Sunday, August 16, 2009

Search engine indexes can beat URL-shortener-induced link rot

I've been reading about the recent fiasco surrounding tr.im's announcement that it would shut down its service, and the resulting furor over the link rot that would follow, to the point of fingers being pointed at URL-shortening services in general (more coverage here, here, here, and here). Although tr.im subsequently backtracked, the whole episode has given us a much-needed warning.

I was thinking about easy ways to prevent or solve the problem of link rot that results when a URL shortener shuts down its service. Among many ideas (such as Facebook/Google/Microsoft/Mozilla/Wikimedia/Yahoo launching industrial-strength URL-shortening services, which would be far more likely to stay alive than services from cash-crunched startups), one idea looked particularly easy and doable - search engines saving the mappings of short-URLs in their indexes, for future use.

The way this idea is supposed to work is simple:
  1. Currently, when Google crawls the Web (including Twitter posts), it supposedly indexes only the content as it appears (e.g., it indexes only the short-URLs present in tweets, not the destinations they point to)
  2. Whenever Google encounters short-URLs (recognized via a human-maintained list of these services), it should follow them and save the short-URL-to-destination mappings in its database
  3. These mappings will now be available for reuse by Google Toolbar users
  4. When a user surfs the Web, Google Toolbar will actively look for any short-URLs on the current webpage (the same way it currently looks for mappable addresses, etc.)
  5. If it encounters any, it will query Google's short-URL database to fetch their destinations (the Toolbar may also feed Google's index with as-yet-unindexed short-URLs, thus acting somewhat like a distributed crawler that helps build the index)
  6. If the original service isn't available to redirect a short-URL, the Toolbar will offer to redirect the user to the correct destination
This method should work for short-URLs that are public - the ones that can be accessed and indexed by search engines. To also record mappings of short-URLs that users generate and use privately, the Toolbar should offer an optional feature that monitors the user's usage of URL-shorteners and, upon detecting any private use of such a service (such as in Gmail/Hotmail), automatically saves the mapping to the user's Google Account.
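To make the public-URL flow above a bit more concrete, here is a rough Python sketch of how it might look. Everything in it is my own illustration and not anything Google actually exposes: the shortener list, the `MappingStore` class, and the `crawl_page` and `toolbar_destination` functions are hypothetical names, and the `resolve` argument stands in for "ask the shortening service where this URL goes" (passed in as a function so the sketch runs without a network connection).

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical hand-maintained list of shortener domains (step 2).
KNOWN_SHORTENERS = {"tr.im", "bit.ly", "tinyurl.com"}

# Matches http(s) URLs and captures the host so it can be checked against the list.
URL_PATTERN = re.compile(r"https?://([^/\s]+)/[^\s<>)]+")

# A resolver takes a short URL and returns its destination,
# or None if the shortening service cannot answer.
Resolver = Callable[[str], Optional[str]]


def find_short_urls(text: str) -> List[str]:
    """Return the URLs in `text` whose host is a known shortener (steps 2 and 4)."""
    found = []
    for match in URL_PATTERN.finditer(text):
        if match.group(1).lower() in KNOWN_SHORTENERS:
            # Drop sentence punctuation that may trail the URL.
            found.append(match.group(0).rstrip(".,;:!?"))
    return found


class MappingStore:
    """In-memory stand-in for the search engine's short-URL database."""

    def __init__(self) -> None:
        self._mappings: Dict[str, str] = {}

    def record(self, short_url: str, destination: str) -> None:
        self._mappings[short_url] = destination

    def lookup(self, short_url: str) -> Optional[str]:
        return self._mappings.get(short_url)


def crawl_page(text: str, store: MappingStore, resolve: Resolver) -> None:
    """Crawler side: follow each short URL found and save the mapping (step 2)."""
    for short_url in find_short_urls(text):
        destination = resolve(short_url)
        if destination is not None:
            store.record(short_url, destination)


def toolbar_destination(short_url: str, store: MappingStore,
                        resolve: Resolver) -> Optional[str]:
    """Toolbar side: use the live service if it answers, else the saved mapping (steps 5 and 6)."""
    destination = resolve(short_url)
    if destination is not None:
        store.record(short_url, destination)  # feed as-yet-unindexed mappings back
        return destination
    return store.lookup(short_url)
```

In this sketch a shortener that has shut down is simply a `resolve` function that returns `None`, at which point the Toolbar falls back to whatever the crawler recorded earlier. The private, per-account mappings are not covered here.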

Providing such a feature would help a search engine differentiate itself from rivals in one more way, and adding it to a Toolbar would make the Toolbar more desirable and useful.

Update: Just read this story on Ars, about Google's possible plans to add microblog search to its service. To understand a microblog post (and its context) more deeply, it makes sense to follow any short-URL present in that post and to analyze a digest of the destination. My current idea only requires recording these mappings, for reuse.
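As a small illustration of how recorded mappings could be reused when analyzing a post, here is a minimal Python sketch. The `expand_post` function, the pattern, and the three shortener domains in it are my own assumptions for the example, not part of any real search engine's pipeline.

```python
import re
from typing import Dict

# Hypothetical pattern covering a few shortener domains, for illustration only.
SHORT_URL = re.compile(r"https?://(?:tr\.im|bit\.ly|tinyurl\.com)/\w+")


def expand_post(post: str, mappings: Dict[str, str]) -> str:
    """Replace each recorded short URL in `post` with its destination.

    Short URLs with no recorded mapping are left untouched.
    """
    return SHORT_URL.sub(lambda m: mappings.get(m.group(0), m.group(0)), post)
```

With the destinations substituted in, whatever analyzes the post can see where its links actually point, even if the shortener has since gone away.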

