Exploring a ‘Deep Web’ That Google Can’t Grasp

Monday, February 23rd, 2009

WWW's "historical" logo, created by ...
Image via Wikipedia

One day last summer, Google’s search engine trundled quietly past a milestone. It added the one trillionth address to the list of Web pages it knows about. But as impossibly big as that number may seem, it represents only a fraction of the entire Web.

Beyond those trillion pages lies an even vaster Web of hidden data: financial information, shopping catalogs, flight schedules, medical research and all kinds of other material stored in databases that remain largely invisible to search engines.

The challenges that the major search engines face in penetrating this so-called Deep Web go a long way toward explaining why they still can’t provide satisfying answers to questions like “What’s the best fare from New York to London next Thursday?” The answers are readily available — if only the search engines knew how to find them.

Now a new breed of technologies is taking shape that will extend the reach of search engines into the Web’s hidden corners. When that happens, it will do more than just improve the quality of search results — it may ultimately reshape the way many companies do business online.

Search engines rely on programs known as crawlers (or spiders) that gather information by following the trails of hyperlinks that tie the Web together. While that approach works well for the pages that make up the surface Web, these programs have a harder time penetrating databases that are set up to respond to typed queries.

“The crawlable Web is the tip of the iceberg,” says Anand Rajaraman, co-founder of Kosmix (www.kosmix.com), a Deep Web search start-up whose investors include Jeffrey P. Bezos, chief executive of Amazon.com. Kosmix has developed software that matches searches with the databases most likely to yield relevant information, then returns an overview of the topic drawn from multiple sources.

“Most search engines try to help you find a needle in a haystack,” Mr. Rajaraman said, “but what we’re trying to do is help you explore the haystack.”

That haystack is infinitely large. With millions of databases connected to the Web, and endless possible permutations of search terms, there is simply no way for any search engine — no matter how powerful — to sift through every possible combination of data on the fly.

Read more . . .

Reblog this post [with Zemanta]

Random Posts:

Link To This Post
1. Click inside the codebox
2. Right-Click then Copy
3. Paste the HTML code into your webpage
codebox
powered by Linkubaitor

Related posts:

  1. Google Unveils News-by-Topic Service
  2. The invention machine
  3. V.C. Nation – A Post-Google Fraternity of Investors
  4. Google Options Make Masseuse a Multimillionaire
  5. Google Plans a PC Operating System

Tags: , , , , , , , ,

Leave a comment

Get Adobe Flash playerPlugin by wpburn.com wordpress themes

Innovation Search

Translator

English flagItalian flagKorean flagChinese (Simplified) flagChinese (Traditional) flagPortuguese flagGerman flagFrench flag
Spanish flagJapanese flagArabic flagRussian flagGreek flagDutch flagBulgarian flagCzech flag
Croatian flagDanish flagFinnish flagHindi flagPolish flagRomanian flagSwedish flagNorwegian flag
Catalan flagFilipino flagHebrew flagIndonesian flagLatvian flagLithuanian flagSerbian flagSlovak flag
Slovenian flagUkrainian flagVietnamese flagAlbanian flagEstonian flagGalician flagMaltese flagThai flag
Turkish flagHungarian flag      
By N2H

Categories

19 visitors online now
19 guests, 0 members
Max visitors today: 44 at 04:38 am EDT
This month: 44 at 03-18-2010 04:38 am EDT
This year: 70 at 01-17-2010 12:44 pm EST
All time: 113 at 12-03-2009 10:18 pm EST