
Features I Want But Will Never Have: First In A Series

What I Want: Google has a 'relatedness' algorithm that, presumably, assigns a number to every pair of web pages depending on how similar those pages are. I'm not interested for the moment in the workings of the algorithm or how accurate it is. What I want™ is an interface to the other end of this algorithm; I want to see which pages are least similar to a given page.

Feasibility Study That Ignores The Real Problems: The web is, for all intents and purposes, connected (I don't think there are, e.g., two large groups of pages such that you can't get from one group to the other via hypertext links), so even if your algorithm goes by links you can get a nonzero relatedness number for any two pages. The chaotic nature of the web would ensure that most sites would not have thousands of ties for 'least relevant site'. (I think this undesirable outcome is more likely for bigger sites, because the standard deviation of the link distance to a site is much smaller for larger sites: any given site is about as relevant to Yahoo as is any other site. But more complex algorithms would reduce the importance of mere link distance.)
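To make the link-distance idea concrete, here is a minimal sketch that treats "relatedness" as nothing more than the number of clicks between two pages and then asks for the page at the far end. Everything in it is an assumption made for illustration: the `toy_web` graph, the page names, and the functions `link_distances` and `least_related` are hypothetical, and this is certainly not how Google's algorithm works.

```python
from collections import deque

def link_distances(links, start):
    """Breadth-first search: number of clicks from start to each reachable page."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in dist:
                dist[target] = dist[page] + 1
                queue.append(target)
    return dist

def least_related(links, start):
    """Pages tied for the greatest link distance from start."""
    dist = link_distances(links, start)
    farthest = max(dist.values())
    return sorted(page for page, d in dist.items() if d == farthest)

# Hypothetical six-page web: each page maps to the pages it links to.
toy_web = {
    "home": ["news", "blog"],
    "news": ["home", "sports"],
    "blog": ["home", "recipes"],
    "sports": ["news", "scores"],
    "recipes": ["blog"],
    "scores": ["sports"],
}

print(least_related(toy_web, "home"))
```

Because the toy web is connected, every page gets a finite distance from every other page, which is the property the feasibility argument leans on. Note that the search has to visit every reachable page before it can say which one is farthest.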

Why I'll Never Have It: The problems are threefold: first, you probably don't have infinite precision, so thousands of sites would get rounded down to zero relevance. Second, it's a lot faster to find close nodes in a graph than it is to find far nodes, so the algorithm would have to use a lot of extra index space or take a long time to run. Third, this idea is completely useless (I could be wrong; come up with a good use for this feature and win a valueless crummy.com prize!).
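The precision problem can be sketched in a few lines. This assumes, purely for illustration, that relatedness halves with every extra link of distance and is stored to three decimal places; the `stored_score` function, the decay rate, and the page names are all hypothetical.

```python
def stored_score(distance, decay=0.5, digits=3):
    """Hypothetical relatedness: shrinks with link distance, kept at fixed precision."""
    return round(decay ** distance, digits)

# Hypothetical link distances from some starting page.
distances = {"near": 1, "mid": 2, "far": 11, "farther": 15, "farthest": 20}

scores = {page: stored_score(d) for page, d in distances.items()}

# Every page whose score rounded down to zero is now tied for 'least relevant'.
tied_at_zero = sorted(page for page, s in scores.items() if s == 0)
print(tied_at_zero)
```

The three distant pages are genuinely at different distances, but once their scores are rounded they are indistinguishable, so "least relevant" stops being a single answer.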


Unless otherwise noted, all content licensed by Leonard Richardson
under a Creative Commons License.