| By Jnan Dash | Article Rating: |
|
| October 10, 2012 05:16 PM EDT | Reads: |
2,029 |
Hadoop traces its origins to Google where two early projects GFS (Google File System) and GMR (Google Map Reduce) were written besides Big Table, to manage large volumes of data. These systems are great at crunching large volumes of data in a distributed computing environment (with commodity servers) in batch mode. Any changes to the data requires streaming over the entire data-set and thus big latency. So it is good for “Data in Rest” or static data.
Now Google finds itself limited by its own invention of GFS/GMR/BigTable. Hence they have been working on the post-Hadoop set of data crunching tools – Percolator, Dremel, and Pregel. Here is a brief narration of each of these tools.
Percolator is a system for incrementally processing updates to a large data set. By replacing a batch-based indexing system with one on incremental processing with Percolator, you significantly speed up the process and reduce analysis time. Percolator’s architecture provides horizontal scalability and resilience. The best candidates for this is large indexes where the performance improvement factor can be 100. The big advantage of Percolator is that the indexing time is now proportional to the size of the page, not to the size of the index.
Dremel is for ad-hoc analytics. It is a scalable, interactive ad-hoc query system for analysis of read-only nested data. By combining multi-level execution trees and columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds. Dremel claims to be about 100 times faster than MapReduce. It’s architecture is similar to Pig and Hive, but instead of MapReduce, it’s engine is based on aggregator trees.
Pregel is a system for large-scale graph processing and graph data analysis. It is designed to execute graph algorithms faster and API is easy to use. As to be expected Pregel is architected for efficient, scalable, and fault-tolerant implementation on clusters of thousands of commodity computers. Graphs are everywhere – social networks, computer network topologies, games among soccer teams, citations among scientific papers, and the most pervasive graph is the web itself. Pregel is a scalable infrastructure to mine a wide range of graphs and programs are expressed as a sequence of iterations. Google has been using Pregel internally for some time now.
Besides Google, Facebook and Twitter are also working on new innovations. Recently Twitter released its Storm project to the Apache open source. One key trend is “Data in Motion”, or how to deal with data that is moving. This is the velocity aspect of Big Data.
Read the original blog entry...
Published October 10, 2012 Reads 2,029
Copyright © 2012 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Jnan Dash
Jnan Dash is Senior Advisor at EZShield Inc., Advisor at ScaleDB and Board Member at Compassites Software Solutions. He has lived in Silicon Valley since 1979. Formerly he was the Chief Strategy Officer (Consulting) at Curl Inc., before which he spent ten years at Oracle Corporation and was the Group Vice President, Systems Architecture and Technology till 2002. He was responsible for setting Oracle's core database and application server product directions and interacted with customers worldwide in translating future needs to product plans. Before that he spent 16 years at IBM. He blogs at http://jnandash.ulitzer.com.
- Cloud People: A Who's Who of Cloud Computing
- NIST to Sponsor FFRDC Widespread Adoption of Integrated CyberSecurity
- Cloud Business Solutions, Social Media, and Platform Systems of Engagement Market Shares, Strategies, and Forecasts, Worldwide, 2013 to 2019
- MicroStrategy Announces General Availability of MicroStrategy 9.3.1
- MicroStrategy Announces General Availability of MicroStrategy 9.3.1
- Cloud Expo New York | Big Data: What It Means for Legal & Risk Management
- Altova Announces General Availability of RaptorXML
- Reflections on the Future of Platform as a Service (PaaS)
- Big Data Will Revolutionize Learning
- Cloud Expo New York: Getting to the Promise of Big Data
- 2013 - 2016 : solutions stabilisées, usages innovants généralisés
- Cloud Expo New York: Cloud Architecture and Engineering
- Cloud People: A Who's Who of Cloud Computing
- Portable Experimenter’s Platform, Powered by Raspberry Pi
- Predixion Software Announces General Availability of the Latest Version of its Predictive Analytics Platform
- Cloud Expo New York: Real-Time Analytics Using an In-Memory Data Grid
- Cloud Expo New York: The Big Challenge of Big Data & Hadoop Integration
- NIST to Sponsor FFRDC Widespread Adoption of Integrated CyberSecurity
- Cloud Business Solutions, Social Media, and Platform Systems of Engagement Market Shares, Strategies, and Forecasts, Worldwide, 2013 to 2019
- Agile Solutions for Cloud, Big Data, Mobility Services
- MicroStrategy Announces General Availability of MicroStrategy 9.3.1
- Cloud Computing: Cutting Costs, Boosting Profits
- AMAX Launches StorMax(TM) CFS, powered by IBM(R) General Parallel File System(TM) (GPFS(TM))
- Benefits of Cloud Computing
- The Top 250 Players in the Cloud Computing Ecosystem
- Web Services Using ColdFusion and Apache CXF
- Cloud People: A Who's Who of Cloud Computing
- Red Hat Named "Platinum Sponsor" of Virtualization Conference & Expo
- Cloud Expo New York Call for Papers Now Open
- Eclipse "Pollinate" Project to Integrate with Apache Beehive
- An Introduction to Ant
- Cloud Expo 2011 East To Attract 10,000 Delegates and 200 Exhibitors
- Beehive Code Now Available in Apache
- 4th International Cloud Computing Conference & Expo Starts Today
- Apache's Tomcat 5.5 is First Release Ever to Use Eclipse JDT Java Compiler
- "Beehive" Now Officially an Open Source Project: Apache Beehive























