“Big data” is clearly the buzzword of the year and will soon knock “cloud computing” from its leading position, at least according to Google Trends. Big data has appeared in the news, the US government recently announced a Big Data Research and Development Initiative (see http://tinyurl.com/85oytkj), and hundreds of startups are working on big data solutions while established providers try to catch up with the new trend. The Economist has even declared that we live in the age of big data (www.economist.com/node/15557487). But how did it come to this?

First, mobile sensors, social media services, genomic sequencing, and astronomy are among the myriad applications that have generated an explosion of data. For example, the smartphone’s success has created the biggest sensor deployment in the world, with millions of users constantly taking pictures and reporting their movements and activities. Social networks from Twitter to Facebook let users continuously generate and share content with friends, producing even more data about what we do and how we interact with each other.

Second, storage capacity has roughly doubled every 14 months over the past 30 years, making it cheaper than ever to store data. Storage is now so cheap that it’s often easier to buy more than to determine what to delete.

Third, and most importantly, recent advances in machine learning and information retrieval let us convert previously useless data into knowledge. For example, Netflix’s very profitable streaming service offers a large but rather low-quality and slightly outdated collection of movies and TV shows. However, its movie recommendation system helps users find the few movies in this huge collection that they might actually enjoy watching. Key to this system is a relatively new machine-learning technique called alternating least squares, a form of collaborative filtering that lets Netflix compare users with each other at scale to make individual recommendations (a small code sketch of the idea appears at the end of this article).

Success stories like this have created a compulsion in companies to record and collect everything possible. For example, Facebook collects more than 500 Tbytes of data every day, of which more than 130 Tbytes are just log records. Most of this data might not be useful today, but it could be tomorrow. Companies now regard data as one of their biggest assets, and the urge to collect more seems to know no bounds.

This urge imposes huge challenges on hardware infrastructure and software; thus, it isn’t surprising that various new systems have appeared on the market to address these challenges, with Hadoop being the most prominent. At the same time, famous researchers argue that there’s nothing new about these systems, especially Hadoop, and that many of them represent a huge step backward.1 In the rest of this article, I’ll try to shed some light on this discussion.
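To make the alternating least squares idea concrete, the following is a minimal sketch in Python with NumPy. It is not Netflix’s actual system; the ratings matrix, the number of latent factors, the regularization parameter, and the function name are all illustrative assumptions. The core idea is genuine ALS: alternately fix the item factors and solve a small regularized least-squares problem for each user, then fix the user factors and do the same for each item.

import numpy as np

def als(R, k=10, lam=0.1, iters=10):
    """Factor a ratings matrix R (users x items, 0 = unrated)
    into user factors U and item factors V so that R ~ U @ V.T."""
    m, n = R.shape
    U = np.random.rand(m, k)
    V = np.random.rand(n, k)
    mask = R > 0  # use observed ratings only
    for _ in range(iters):
        # Fix V; solve a regularized least-squares problem per user.
        for u in range(m):
            idx = mask[u]
            if idx.any():
                A = V[idx].T @ V[idx] + lam * np.eye(k)
                b = V[idx].T @ R[u, idx]
                U[u] = np.linalg.solve(A, b)
        # Fix U; solve the symmetric problem per item.
        for i in range(n):
            idx = mask[:, i]
            if idx.any():
                A = U[idx].T @ U[idx] + lam * np.eye(k)
                b = U[idx].T @ R[idx, i]
                V[i] = np.linalg.solve(A, b)
    return U, V

# Toy example: five users, four items, zeros mark missing ratings.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)
U, V = als(R, k=2)
print(np.round(U @ V.T, 1))  # dense predictions, including unrated cells

The property that makes this attractive at scale is that, with one factor matrix fixed, every row of the other has a closed-form solution that is independent of all other rows, so the per-user and per-item updates can run in parallel across many machines.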