Hadoop - Yahoo's generous gift


The Apache Hadoop Core is an open source platform enabling developers to write and run applications across clusters of commodity computers and process vast amounts of data. As a primary investor and developer of Hadoop, today Yahoo released their own tested distribution of the code that powers their search engines, ad systems and webmail services. There is a growing move toward design patterns that leverage the parallelism inherent in distributed systems such as Hadoop and Google's MapReduce. Applications can be developed on single servers then deployed on massively distributed cloud infrastructure without knowing the details of such distribution. Once these applications are deployed they can act as their own service provider to other systems. Indeed we see that scenario with Amazon's EC2 (with native Hadoop support), IBM's Blue Cloud Initiative and Google today. This type of database is not a relational engine like Oracle or SQL Server. Hadoop enables large scale data mining for useful applications such as fraud detection and rich media indexing. I think this release is significant because it allows developers to take advantage of all the work put in to improve Hadoop over the years. Yahoo's change log file has over 8,400 lines and contains a wealth of knowledge gained by real production experience. Can Yahoo gain cloud credibility by giving it away? I think so; it gives everyone a living benchmark.