Tuesday, 19 June 2007

Understanding PageRank

PageRank plays a central role in many of Google's web search tools, which, as I understand it, rest on a search algorithm and a network of thousands of desktop PCs whose owners are probably unaware of it. One of them might even be mine for that matter, not that anybody should complain about it. It follows Google's stated philosophy that searches are designed to give end-users helpful and accurate results, which we can accept in general terms.
It uses the web's vast link structure to determine the value of an individual webpage. A link from page A to page B is regarded as a vote by page A for page B. That may explain the great length of blog pages, where the indicator in the scroll bar almost shrinks into obscurity: a blog gathers more links as the number of posts grows with time, and the more posts, the more links. Moreover, all those cute little links from the bookmarking companies at the end of each post add to the number of links built up over time. In the end, that link count probably reflects the effort each blogger has put in, painstakingly built bit by bit.
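To make the "link as a vote" idea concrete, here is a minimal sketch that simply counts inbound links. The four-page graph and the `count_votes` helper are made up for illustration; this is only the raw tally, not what Google actually computes.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def count_votes(links):
    """Return the number of inbound links (raw votes) each page receives."""
    votes = Counter({page: 0 for page in links})
    for source, targets in links.items():
        # A page votes at most once for any given target.
        for target in set(targets):
            votes[target] += 1
    return dict(votes)

votes = count_votes(links)
print(votes)
```

In this toy graph page C collects three votes, A and B one each, and D none, so a page that piles up links over time would look "valuable" on this count alone.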
The next thing is the weight attributed to the "votes" that A pages cast for B pages. Namely, votes from pages with a high PageRank, which are therefore "important" (the quotes are Google's, probably because it feels guilty that its pledge of loyalty to end-users is being watered down a bit), weigh more heavily than votes from pages with a lower rank. It all boils down to the fact that the whole enterprise is primarily commercial. We need to make money; if we don't, the whole thing will not run at all. There won't be any web at all if the profits do not roll in, though we do need both the money and the knowledge that the WWW spreads around, and we can only hope, as far as I can tell, that the people who run Google and other services do not get greedy.
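The weighting of votes can be sketched with the textbook power-iteration form of PageRank. This is an assumption-laden illustration, not Google's production system, which is not public: the toy graph, the `pagerank` function, the damping factor of 0.85 (the value usually quoted from the original published description) and the handling of pages without outgoing links are all choices made for this example.

```python
# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def pagerank(links, damping=0.85, iterations=50):
    """Textbook power iteration: each page splits its rank among its links."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

Pages A and B each receive exactly one link here, yet A ends up well ahead of B, because its single vote comes from the highly ranked page C while B's vote comes from A, which shares its rank between two links.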
The next lines should be analysed carefully, because they determine how Google is supposed to serve end-users, the public, and they help us understand its philosophy, namely the PageRank algorithm. Google claims to employ sophisticated text-matching techniques (sophistry?) to find pages that are both important and relevant to your search. What can I say? Sophistry usually resembles philosophy in form, though not in substance. The text-matching techniques most likely give the impression that the results are relevant to the end-user, but whether they are important is a matter for a wishing well. The end-user is the sole judge of whether a search returned important results, and the vague criteria for relevance introduce an uncertainty so large that no algorithm could calculate it. This claim by Google is therefore a dead letter, devoid of meaning.
So suppose the text-matching is based primarily upon the number of times a term appears on a page. What does that mean? That a term is repeated on a page, and it is the term the end-user has queried? Can that repetition be mechanical, beyond the meaning conferred by the surrounding passage? According to Google, no. Google says it goes far beyond the number of times a term appears on a page: it examines dozens of aspects of the page's content (and the content of the pages linking to it) to determine whether it is a good match for the query. Oh well, that is quite a vague statement, which verges on claiming that this algorithm of theirs somehow has human qualities and can read your mind.
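The worry about mechanical repetition is easy to demonstrate with a naive term counter. The two sample texts and the `term_frequency` helper below are invented for this sketch; it shows only the crude count that Google says it goes beyond, not any technique Google is known to use.

```python
import re
from collections import Counter

def term_frequency(text, term):
    """Count how many times a term appears as a whole word, ignoring case."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)[term.lower()]

# Hypothetical page texts: one stuffed with the term, one written naturally.
stuffed = "cheap flights cheap flights cheap flights book cheap flights now"
natural = "We compare fares so that you can find cheap flights to Athens."

print(term_frequency(stuffed, "cheap"))
print(term_frequency(natural, "cheap"))
```

On a raw count the stuffed page scores four against one for the natural sentence, which is exactly why a count on its own says nothing about meaning.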
It all boils down to this: Google's rise is the result of winning a competition on strictly commercial criteria, and in doing so it has neglected, and continues to neglect, individual enterprise, an aspect that has quickly been taken up by services such as Technorati and the like. It is a matter of not being able to have your fingers in too many pies.
