Google 1 Trillion Web Pages in its Index - Still Crawling ~ My Google Online - SEO Tips
Search Engine Optimization SEO - Google, Yahoo, Msn Optimization - Madanlal

Google 1 Trillion Web Pages in its Index - Still Crawling

Google vs Internet

We know that the Internet is huge. Google started its index in 1998 with 26 million pages, and by 2000 the index had reached around 1 billion pages. Over the past eight years Google has seen a great deal of content and information appear on the web. Recently Google's engineers, astonished at how far the web has grown, proudly announced that Google's systems had reached a milestone of 1 trillion (as in 1,000,000,000,000) unique URLs found on the web.

Google Crawling for Information

Google engineers explain how they arrived at this number. They start with a set of initial pages and follow the links on each of those pages to new pages, then follow the links on those new pages, and so on. In fact they have found more than 1 trillion links, but not all of them lead to unique pages, because some pages are reachable through two or more URLs. After discarding those duplicates they arrived at 1 trillion unique URLs, and they continue to process billions of new pages and links every day.
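The link-following process described above can be sketched as a breadth-first traversal that remembers which URLs it has already seen. This is only a minimal illustration, not Google's actual crawler: the `normalize` rules, the `crawl` function, and the in-memory `TOY_WEB` with its `example.com` addresses are all assumptions made up for the example, and no real pages are downloaded.

```python
from collections import deque
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Collapse trivially different spellings of one URL (illustrative rules only)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Lowercase the scheme and host, drop the #fragment, keep path and query.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def crawl(seeds, fetch_links, max_pages=1000):
    """Visit pages breadth-first, starting from the seeds and following links."""
    seen = set()
    queue = deque()
    for seed in seeds:
        url = normalize(seed)
        if url not in seen:
            seen.add(url)
            queue.append(url)

    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            target = normalize(link)
            if target not in seen:  # skip URLs that were already discovered
                seen.add(target)
                queue.append(target)
    return visited


# Hypothetical three-page web: each URL maps to the links found on that page.
TOY_WEB = {
    "http://example.com/": ["http://example.com/a", "HTTP://EXAMPLE.COM/b#top"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/"],
    "http://example.com/b": ["http://example.com/a#section"],
}


def toy_fetch_links(url):
    """Stand-in for downloading a page and extracting its links."""
    return TOY_WEB.get(url, [])


pages = crawl(["http://example.com"], toy_fetch_links)
print(pages)
```

Here seven link spellings collapse into three unique pages. A real crawler would also have to detect different URLs that serve the same content, which simple URL normalization like this cannot do.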

Infinite Internet Web Pages

Strictly speaking, the web is effectively infinite, so Google cannot crawl every page, but it does its best to crawl all the unique ones.

The engineers also admit that not all of the trillion pages are truly distinct: a web calendar, for example, can show a new page for every day, each with mostly the same content, and such cases cannot be avoided entirely. Still, they are proud to have the most comprehensive index of any search engine, and their goal is to index all of the world's information and make it available to searchers.

Some of the important points from Googlers:

“To keep up with this volume of information, Google systems have come a long way since the first set of web data Google processed to answer queries. Back then, we did everything in batches: one workstation could compute the PageRank graph on 26 million pages in a couple of hours, and that set of pages would be used as Google's index for a fixed period of time. Today, Google downloads the web continuously, collecting updated page information and re-processing the entire web-link graph several times per day. This graph of one trillion URLs is similar to a map made up of one trillion intersections. So multiple times every day, we do the computational equivalent of fully exploring every intersection of every road in the United States. Except it'd be a map about 50,000 times as big as the U.S., with 50,000 times as many roads and intersections.
As you can see, our distributed infrastructure allows applications to efficiently traverse a link graph with many trillions of connections, or quickly sort petabytes of data, just to prepare to answer the most important question: your next Google search.”
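As a quick check on the map analogy in the quote, the two figures it gives (one trillion URLs and a map 50,000 times as big as the U.S.) can be divided to see how many intersections the comparison implies for the U.S. road network. The variable names below are made up for illustration, and the result is only what the quote's own numbers imply, not an independent road statistic.

```python
TOTAL_URLS = 1_000_000_000_000  # one trillion URLs, from the quote
MAP_SCALE = 50_000              # "about 50,000 times as big as the U.S."

# Intersections the analogy implicitly assumes for the U.S. road map.
implied_us_intersections = TOTAL_URLS // MAP_SCALE
print(f"{implied_us_intersections:,}")  # 20,000,000
```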


From this we can conclude that Google is the leader among search engines, and that it will go on searching for more information and indexing trillions upon trillions of pages to answer its users' queries.
