Hadoop Beyond MapReduce: Introducing Kitten

CrackSmokeRepublican · June 29, 2012, 02:34:15 AM

Keep an eye open... --CSR

Quote: Hadoop Beyond MapReduce, Part 1: Introducing Kitten

by Josh Wills
June 26, 2012

This week, a team of researchers at Google will be presenting a paper describing a system they developed that can learn to identify objects, including the faces of humans and cats, from an extremely large corpus of unlabeled training data. It is a remarkable accomplishment, both in terms of the system's performance (a 70% improvement over the prior state of the art) and its scale: the system runs on over 16,000 CPU cores and was trained on 10 million 200×200-pixel images extracted from YouTube videos.

Doug Cutting has described Apache Hadoop as "the kernel of a distributed operating system." Until recently, Hadoop has been an operating system that was optimized for running a certain class of applications: the ones that could be structured as a short sequence of MapReduce jobs. Although MapReduce is the workhorse programming framework for distributed data processing, there are many difficult and interesting problems (including combinatorial optimization problems, large-scale graph computations, and machine learning models that identify pictures of cats) that can benefit from a more flexible execution environment.
