Apache Drill’s Self-Service Capabilities By @MapR | @CloudExpo [#BigData]

Help Yourself: Leveraging Apache Drill's Self-Service Capabilities

Small data management solutions don't work in our brave new Big Data world. Back in the small data days, we talked proudly about having gigabytes of structured data that had been carefully denormalized to reduce latency as much as possible. Today's data is measured in petabytes, and it is dynamic, complex, and wildly varied in structure.

Small data was a nicely planned garden, but Big Data is a jungle: rich with resources, abundant in growth, but also a bit overwhelming and easy to get lost in.

Exploring that jungle requires solutions that enable interactive, self-service ways to work with historical as well as near real-time data. Hadoop and NoSQL on Hadoop solved a significant number of Big Data access and availability problems. Add Apache Drill and SQL-on-Hadoop to the mix and you have a solution designed to enable easy analysis of complex data structures and datasets using well-known SQL semantics.

If you want to blaze a path through the Big Data jungle, you want Apache Drill in your solution set.

Dig Deep with Drill
Apache Drill is a SQL query engine that works with numerous underlying data formats and sources. As a standalone query engine that supports multiple data sources, it works with the Hadoop and NoSQL database solutions that an organization may already have in place.

Apache Drill excels in demanding situations that require low-latency performance, such as data exploration, data discovery, ad hoc business intelligence (BI) queries, and Day Zero analytics. It enables efficient analytics operations ranging from a fast overview of a specific dataset to an extended, explorative analysis of a very large data pool. Apache Drill supports interactive queries, rather than batch-oriented requests. It scales easily from a single laptop to a large cluster of servers.

And it's user-friendly. With minimal IT involvement, Apache Drill enables data to be queried in its native formats, including nested data, schema-less data and dynamic data. There is no need to explicitly define and maintain schemas; Drill can automatically leverage the structure embedded in the data. This enables self-service data exploration. Analysts can work with live data as it arrives, with no need to prepare a schema and massage the data into a query-ready form. They can also change data sources on the fly without getting hung up waiting for DBA services to structure newly requested data.
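To make the idea of "structure embedded in the data" concrete, here is a minimal Python sketch of schema discovery over self-describing JSON records. It is only an illustration of the concept, not Drill's implementation; the records, field names, and the `discover_columns` helper are all hypothetical.

```python
import json


def discover_columns(records, prefix=""):
    """Collect the dotted column paths seen across self-describing records."""
    columns = set()
    for record in records:
        for key, value in record.items():
            path = f"{prefix}{key}"
            if isinstance(value, dict):
                # Nested map: descend and record paths such as "customer.name".
                columns |= discover_columns([value], prefix=f"{path}.")
            else:
                columns.add(path)
    return columns


# Hypothetical input: the second record carries a field the first one lacks,
# and no schema was declared for either of them in advance.
raw_lines = [
    '{"id": 1, "customer": {"name": "Ann", "city": "Oslo"}}',
    '{"id": 2, "customer": {"name": "Bo"}, "coupon": "SPRING"}',
]
records = [json.loads(line) for line in raw_lines]
columns = sorted(discover_columns(records))
print(columns)  # ['coupon', 'customer.city', 'customer.name', 'id']
```

The point of the sketch is that the set of queryable columns falls out of the records themselves, including nested fields and fields that appear only in some records, so nobody has to define a table before the first query.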

Analysts can also leverage their existing SQL skills and BI tools to directly query self-describing data and process complex data types. This closes the gap that had existed between solutions built for efficient use of Big Data, such as Hadoop-based systems, and the need for standard SQL compatibility to access structured databases.
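As a rough sketch of what querying self-describing data with plain SQL can look like, the Python below builds a query against a JSON file and wraps it in the request body for Drill's REST query endpoint. Several things here are assumptions rather than facts from this article: a Drill instance running locally with its web port at the default 8047, a `dfs` storage plugin that can see the hypothetical file `/tmp/orders.json`, and the field names used in the query.

```python
import json
import urllib.request

# Assumption: a local Drill instance exposing its REST API on the default port.
DRILL_URL = "http://localhost:8047/query.json"


def build_query_payload(sql):
    """Wrap a SQL string in the JSON body sent to Drill's REST query endpoint."""
    return {"queryType": "SQL", "query": sql}


def run_query(sql, url=DRILL_URL):
    """POST the query to Drill and return the decoded JSON response."""
    body = json.dumps(build_query_payload(sql)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical file path and field names. The table alias `t` is used to
# reach the nested `customer` fields; no table or schema is declared first.
SQL = (
    "SELECT t.id, t.customer.name AS customer_name "
    "FROM dfs.`/tmp/orders.json` t "
    "WHERE t.customer.city = 'Oslo'"
)
payload = build_query_payload(SQL)
```

With a Drill instance available, calling `run_query(SQL)` would submit the statement; the sketch keeps the network call inside a function so the query construction can be inspected on its own.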

While we may quietly pride ourselves on the glorious Bigness of our Big Data, we all know that the data in itself is of little value. It's the knowledge that can be gained from it that is priceless. Apache Drill is an essential tool to knock down the wall that had kept businesses from fully harnessing the power of Big Data.

Wait, what about...?
You may be wondering why such a fuss is being made in business and technology circles right now about Apache Drill. After all, there are dozens of other proprietary and open source projects providing SQL or SQL-comparable features on Hadoop.

The problem is that many of these solutions were designed with a "backwardly compatible" mindset. The intent was to take technology designed for small data and engineer it to work in a Big Data world. Useful tools were developed, but it's now time to develop solutions that are designed specifically to support the new ways that we use data.

While Apache Drill was initially inspired by Google's Dremel project, it is now a vehicle that can be used to bring forward-looking technologies to Big Data. Hadoop continues to gain popularity rapidly, and Apache Drill is the ideal interactive SQL engine for it because Drill fully supports the flexibility and agility of Hadoop (and HBase). Apache Drill is the only SQL engine for Hadoop that doesn't require schemas to be created and maintained, or data to be transformed, before it can be queried.

Validated and Approved
The open source community has greatly refined the original features of Google's Dremel. Enhancements include an extensible architecture, overall agility, support for full SQL, optional schema handling, and the ability to handle nested data (in formats such as JSON, Protobuf, and Parquet).

The Apache Software Foundation announced in December 2014 that it had promoted Drill to a top-level project at Apache, where it joins other illustrious projects such as Apache Hadoop and httpd (the world's most popular Web server).

Drill's promotion to a top-level project demonstrates that Drill has a strong community of users and developers. Users can be confident that the project has proven itself and has a viable roadmap for its development. The community will continue to advance Apache Drill's key technologies and performance.

It's time to stop looking to the past for answers and begin driving the future. If you're ready to test-drive Drill, you can do so using the MapR Sandbox for Hadoop, which runs on PC, Mac and Linux platforms. MapR Technologies is the provider of the top-ranked distribution for Apache Hadoop.

You can also view a tutorial on analyzing real-world data using Drill.

More Stories By Nitin Bandugula

As a Sr. Product Marketing Manager at MapR, Nitin brings his engineering, business and management skills together to market technology products. At MapR, Nitin focuses on SQL, batch and in-memory frameworks and streaming technologies on Hadoop. Prior to MapR, Nitin worked for enterprise companies and startups in various roles including Engineering, Product Management and Management Consulting. Nitin holds a Master's degree in Computer Science from the Illinois Institute of Technology and an MBA from the Johnson School at Cornell University.


