Tuesday, September 02, 2008

Google's Real Problems

As search engine optimisers, we get the impression that a search engine's biggest nightmare is the work we carry out. Not only is this egotistical but it is also untrue. Most optimisers try to give the sites they work with the best possible rankings for what they have and, in so doing, help to show the sites that are serious about their web presence, and make sites more accessible to the search engines and to people with all abilities and disabilities.

'Black Hat' search engine marketers might cause Google to miss a step occasionally as they find ways to fool the search engines, but these efforts are short-lived and the brainpower of Google's 100 Doctors of Philosophy is either a step ahead or ready to react quickly.

In truth, Google's problems are much more mundane than industrial espionage or seditious optimisation practices: the main problem is that of scale.

Scale has been a constraint for all kinds of organisations and empires throughout time. The Romans were hampered by their limited mathematical abilities and could not grow beyond the limits they reached around 100 AD, and the British Empire found itself becoming the world's policeman, a title that came at a tremendous cost. Likewise, Google's size gives it tremendous problems. Here are a few that you may not have considered.

Breakdowns: It is estimated that Google runs 500,000 servers. At any one time there are going to be several machines that have gone down. Mechanical problems mount up and even if only 1 in 1000 servers fails completely in a given year, that's 500 to be changed. In our experience with computers, things go wrong all the time & Google might expect 1000 servers to be down at any one time.
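To make the arithmetic above explicit, here is a rough sketch in Python. The server count, the failure rate and the "1000 down" figure are the estimates quoted in this post, not measured values, and the function names are made up for illustration.

```python
# Estimates quoted in the post; none of these are measured values.
TOTAL_SERVERS = 500_000
ANNUAL_FAILURE_RATE = 1 / 1000   # "1 in 1000 servers fails completely in a given year"
SERVERS_DOWN_AT_ONCE = 1000      # the post's guess for machines down at any one time


def expected_annual_failures(servers: int, rate: float) -> float:
    """Expected number of complete failures per year for a fleet of this size."""
    return servers * rate


def share_of_fleet(servers_down: int, servers: int) -> float:
    """Fraction of the fleet represented by a given number of downed machines."""
    return servers_down / servers


failures = expected_annual_failures(TOTAL_SERVERS, ANNUAL_FAILURE_RATE)
down_share = share_of_fleet(SERVERS_DOWN_AT_ONCE, TOTAL_SERVERS)

print(f"Expected complete failures per year: {failures:.0f}")
print(f"Share of fleet down at any one time: {down_share:.2%}")
```

Even on these guesses, 1000 machines out of action is only a fraction of a percent of the fleet; the headache is the constant replacement work rather than lost capacity.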

Power: Google has 3 main datacentres, each of which chews through power at around $5000 per hour. Such massive power usage causes massive headaches. Google has recently invested in solar power and put time & money into RE
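As a back-of-the-envelope check, the hourly power figure above can be turned into a yearly bill. This sketch assumes all 3 datacentres run flat out every hour of a 365-day year at the quoted $5000 per hour, which is a simplification of my own.

```python
# Figures quoted in the post; running at a constant rate all year is an assumption.
DATACENTRES = 3
COST_PER_HOUR_USD = 5000
HOURS_PER_YEAR = 24 * 365


def annual_power_cost(sites: int, cost_per_hour: int) -> int:
    """Yearly power cost in dollars if every site runs at a constant hourly cost."""
    return sites * cost_per_hour * HOURS_PER_YEAR


total = annual_power_cost(DATACENTRES, COST_PER_HOUR_USD)
print(f"Estimated annual power bill: ${total:,}")
```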

Revenues: Search is a costly area, with an estimated 85% of searches being non-commercial and thus difficult to wrap advertising around. Google only has one tried-and-tested means of making money - advertising. Without it, they do not have a business. With the advent of social networking sites like Facebook and the rise of other media like video, more and more people are interacting with the Internet without needing search engines; they can get their information and products by interacting with friends rather than by asking a search engine a limited or clunky question. Google has to try hard to preserve its revenue stream - it has grown fat on search revenues and shareholders will demand that they continue to grow and do not get syphoned off by other Internet companies.

Costs: There are massive costs involved in search. Apart from the employment costs (of Google's 100+ doctorates), the costs of machinery, power and research are limiting.

Data: Google's index has grown to 20 billion pages, a gigantic amount. It takes 6 weeks for them to index the sites on the Internet. This amount of data, however, is dwarfed by the 1 trillion pages that they know about. If your dataset is 2% of the data that is on the Internet then you have to be VERY good at indexing the right data or else searchers will go elsewhere. Google has to be sure that it provides the results that people will find useful very quickly or else users will go elsewhere - too much information and there will be spam and drivel in the index and too little and searchers will miss out on the Internet's variety.
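The 2% figure above comes straight from dividing the index size by the number of known pages. The sketch below also works out a rough indexing rate, on the assumption (mine, not Google's) that one 6-week cycle covers the whole 20 billion page index at a steady pace.

```python
# Page counts and cycle length as quoted in the post.
INDEXED_PAGES = 20_000_000_000
KNOWN_PAGES = 1_000_000_000_000
INDEX_CYCLE_WEEKS = 6

SECONDS_PER_WEEK = 7 * 24 * 3600

# Share of known pages that actually make it into the index.
coverage = INDEXED_PAGES / KNOWN_PAGES

# Assumption: the whole index is rebuilt once per cycle at a constant rate.
pages_per_second = INDEXED_PAGES / (INDEX_CYCLE_WEEKS * SECONDS_PER_WEEK)

print(f"Index coverage of known pages: {coverage:.0%}")
print(f"Implied indexing rate: {pages_per_second:,.0f} pages per second")
```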

Furthermore, the Internet is growing massively. With Web 2.0 applications like blogs that allow people to quickly & easily publish their information on the Internet (this blog is just one example), the Internet has grown exponentially. Around 140 million people were online in 1998 when Google Inc started, compared to 1.4 billion today according to internetworldstats.com, and all of them have access to publishing tools that were not available 2 or 3 years ago. The problem is that the amount of information online might grow faster than Moore's Law (the rule of thumb set out in 1965 by Gordon Moore, who went on to co-found Intel, and commonly quoted as the number of transistors on an integrated circuit doubling every 18 months). This means that if the amount of information on the Internet doubles in less than 18 months, Google either takes longer to compile and analyse the data or it must invest in more & more computing power.
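The doubling argument above can be sketched in a few lines. The user figures are the ones quoted from internetworldstats.com; the 12-month doubling time for online information is purely a hypothetical number chosen for illustration, since this post does not give one.

```python
import math


def doubling_time_months(start: float, end: float, years: float) -> float:
    """Months needed to double, assuming steady exponential growth from start to end."""
    return years * 12 * math.log(2) / math.log(end / start)


def relative_load(months: float, data_doubling: float, compute_doubling: float) -> float:
    """How much faster data has grown than computing power after a number of months."""
    return 2 ** (months / data_doubling) / 2 ** (months / compute_doubling)


# Internet users: 140 million in 1998 to 1.4 billion in 2008 (figures from the post).
user_doubling = doubling_time_months(140e6, 1.4e9, 10)

# Hypothetical: information doubles every 12 months, computing power every 18.
gap_after_3_years = relative_load(36, data_doubling=12, compute_doubling=18)

print(f"Users doubled roughly every {user_doubling:.0f} months")
print(f"Data outgrows computing power by a factor of {gap_after_3_years:.1f} in 3 years")
```

On the user numbers alone the doubling time is well over 18 months, so the worry only bites if each user publishes more and more as the tools get easier.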

Even more troubling is the problem of infinite scalability. I do not know the limits of technology but there must be significant software problems in co-ordinating the data on 500,000 servers and there must be a limit to the complexity of such a system. With a rapidly growing dataset with which to compute, Google is at the limits of the possible as far as computing is concerned.

These problems of scale mean that Google spends more and more time and money worrying about physical constraints rather than giving the best search results, the ones that bring in the most advertising revenue. Will the company be able to continue to grow beyond these constraints and keep searchers & shareholders happy? Will it be able to keep trying to take market share from companies like Microsoft in areas such as browsers, which do not earn it any money? Time will tell.




Thanks to Leslie Rhode & Andy Edmonds for their opinions.
