Monday, September 1, 2008

Page Rank - My Version!

So what is 'Page Rank'?
There are a little too many answers to the question. Seems like everyone has a 'similar' version. In other terms, its the same style of cooking but a different , leading to a distinction in taste. I would not really try to define the ranking algorithm that was finely designed by the 2 Stanford University PhD students (and as one of the thought also goes, in the name of one of the designers of the algo).
I would like to have my thought on the algo added to the already existing ones, though this one is like a more generic one. I'd say that PageRank of a page(or a document), is the probability of a web searcher ending his search at the particular page looking for 'X' keywords assuming that he/she had infinite amount of time at his/her disposal.
Now the question is, how do we (or a machine) calculate that probability?
There are a lot of things that would have some relationship with this calculation. Starting off with the most heard of, 'back links'. Probability and logic have it, the more the links from other page to this page, the more are the chances of someone reaching the page. Thats logic and the probability is math.
The odds of terminating the search at a particular website also is a factor of the reliability factor of the page (or the website). The credibility is as important as anything because no one would want to call it a day with all the wrong information, would you in that position terminate it there? I'm sure 'No' is the answer (unless you have a deadline, meeting which is the priority for now as compared to reliability :P ). This is the reason why .gov, .edu, .google.co* and wikipedia are given the boost they are given while google ranks pages for your search.
Also, having spoken about the back linking of websites, there is a damping factor associated for each hop so that having traversed through 2 edges to reach a page would weigh less as compared to a single hop. The damping factor supposedly being 0.85.
Next question that would come to mind would be, what about the newbies? In that case, there's a default value (Supposedly 0.15) for them.
Those are the important components of page rank.
All said and done, we would still be concerned with the question of 'where do we start' and how do we get a back link etc rank until we first build the network? This boils down to the chicked and hen problem.. right!!! :)
Well, it is supposed to start off with default values at the start and then the process is repeated 'n' times (where 'n' is a very high number) to fine tune the ranks to get closer to their real values.

This is my version of the way the best 'generic' search engine on the planet works (and there's a lot more than this that is used for fine tuning search results and which is beyond the scope of this entry).

No comments: