|By Bob Gourley||
|November 27, 2012 07:05 AM EST||
On October 24, 2012 Cloudera announced the release of Cloudera Impala and the commercial support subscription service of Cloudera Enterprise Real Time Query (RTQ). During the Hadoop World/STRATA Conference in NYC, I was invited over to see a demonstration. Impala is a SQL based Real Time Query/Ad Hoc query engine built on top of HDFS or Hbase. As I watched the demonstration unfold, I wondered if one of the remaining technology gaps in the NOSQL arsenal had been closed. What gap you ask? Near Real Time Analytics on a NOSQL stack. Working with customers across the Cyber Security customer space, not only do they face the familiar BIGDATA horsemen of the apocalypse: Volume, Velocity and Variety but one more large challenge crept in: Time (V3T). The Near Real Time Analysis/Near Real Time Analytic capability that Cloudera Impala provides is essential in many high value use cases associated with Cyber Security: comparing current activity with observed historical norms, correlation of many disparate data sources/enrichment and automated threat detection algorithms.
When the demonstration concluded, the Cloudera representatives and I discussed the potential of performing an informal independent evaluation of Cloudera Impala against some of the common Real Time/Near Real Time use cases in Cyber Security. I agreed to step up and perform an independent evaluation as well as developing a demonstration platform for FedCyber 2012 (almost three weeks hence for inquiring minds). So let us set the field: a new BETA technology, NO prior exposure to the technology or documentation, a vendor making promises, addressing a large technology gap and three weeks to implement, seemed straight forward; no pressure.
The day after I returned from the STRATA Conference, I returned to my office and provisioned four Virtual Machines in order to build the Impala demonstration. As a committer/contributor for SherpaSurfing an open source Cyber Security solution, I have an abundance of data sets, enrichment sources, Hive data structures and services. Given the amount of time and the audience for FedCyber 2012, I decided to focus on some Intrusion Detection and Netflow related use cases for the demonstration. The data sets for the demonstration included base data sets: 20 million Netflow events, 8 million Intrusion Detection System events and enrichment: Geographic, Blacklist, Whitelist and Protocol related information. Each of the selected uses cases for this demonstration is critical to the Perform Near-Real Time Network Analysis domain in Cyber Security. The name for the demonstration system was decided to be the Impala Mission Demonstration Platform (IMDP). The IMDP was implemented based on vendor recommendations with no tuning or optimization.
The IMDP effort provided me with my first opportunity to work with Cloudera Manager. Although this post is focused on Cloudera Impala I would be remiss not to mention Cloudera Manager. I have worked with Hadoop since 1.0 and built more than a few clusters over the years. I used the installation and configuration guides provided with Cloudera Impala and followed the recommendations. One of the first recommendations was use of the Cloudera Manager. Using the Cloudera Manager (CDH 4.1), I was able to roll out a four node cluster in two hours. I was able to discover the hosts, manage services and provision them in accordance with the IMDP deployment plan. The deployment plan consisted of:
- node 1 – hbase, hdfs, impala, mapreduce
- node2 – hbase, hdfs, impala, mapreduce
- node3 – hbase(region server, master), hdfs(namenode), impala(impalad, statestore), mapreduce(job tracker, tasktracker) , hue, oozie and zookeeper
- node4 – Application Tier, Cloudera Manager
The Cloudera Manager saved at least two days of effort in deploying the cluster, the tight integration with the support portal, comprehensive help and one place to work with all properties of the entire cluster and view space consumption metrics; verdict on Cloudera Manager: Cloudera masterful, bold stroke, thumbs up.
Now that the cluster build-out completed; I shifted attention to deploying and configuring the Cloudera Impala service. Using Cloudera Manager, I deployed Impala on three nodes: three instances of Impalad and one impala state store, in a matter of minutes. I completed the deployment and configuration of the Hive MetaStore. Keeping in mind this is a BETA; the documentation was complete, but fragmented on deployment and configuration (HIVE MetaStore portion); verdict on impala deployment and configuration: solid for a BETA (needs an example hive-site.xml, configuration guide needs better flow).
At this point all configuration and deployment was completed, attention turned to building data structures and loading data. I took the Data Definition Language (DDL) scripts or data structures for ten data sources and enrichment; ported them over to Hive and tested them in less than four hours. It is worthy of mention that the data sources for this demonstration are large flat tables: netflow and intrusion detection system. Cloudera Impala uses HIVE as an Extract Transform Load (ETL) engine, using Hive I defined all of the data structures in source files which were sourced using hive shell: created a database (Sherpa). Hive was then used to load data into the tables that were just created. Creating data structures in Hive was simple as usual and loading data sets was quick (20 million netflow events in 57 seconds). Logging into impala-shell, issued a refresh of the MetaStore and I was working with data. I performed verification of the data load, all data loaded and no issues were revealed. One area of potential improvement would be more comprehensive messages on load failure. Defining the data structures and loading data using Hive was nothing new; verdict: really good; easy to use, easy to load, but need to improve failed load messages.
Finally, we moved on to the most interesting stage which is using Cloudera Impala in a series of Real Time Query (RTQ) scenarios that are common across the Cyber Security customer space. The real world scenarios selected come from the perform netflow analysis set of use case(s). In each of these scenarios, the exact same queries were executed on the same cluster using Hive and then Impala against the same data structures (database and tables). In the Hive approach, we traverse the batch processing stack and with Impala we traverse the Real Time Query (RTQ) stack performing a series of analytics. In the first use case, I ran a five tuple (sip, sport, dip, dport, protocol) summary covering bytes per packet, summing bytes and packets for a 20 million event set resulted in: identical result sets, Hive 82 seconds – Impala 6 seconds. In the second use case, I performed a summary of destination ports where the source port is 80 which resulted in: identical result sets, Hive 57 seconds, Impala 5 seconds. In the third use case, I performed correlation between netflow and intrusion detection systems, correlating netflow with intrusion detection events for several hours which resulted in: identical result sets, Hive 40 seconds, Impala sub-second. Finally, for FedCyber 2012, I developed a java based situational awareness dashboard which connected to Cloudera Impala via ODBC and executed analytics performing: correlation of blacklists, Intrusion Detection, Netflow, statistical cubes for ten hours with a refresh of every five seconds without failure or issue. The ODBC implementation easily provided the ability to export data to desktop tools (using ODBC) and common BI tools as advertised. Developing and Using Cloudera Impala verdict: This is as advertised; easy to use, easy to implement on, very fast, very flexible and more than capable of running real time analytics. The Impala shell is limited but much of the demonstration work was done using result sets so it was not an impediment.
In summation, I have worked for over a decade across the vast BIGDATA technology space covering Legacy Relational Database, Data Warehouse, and NOSQL; Cloudera Impala proved more than capable of running near real time analytics and providing mission relevance to customers with a Near Real Time (NRT) requirement. Based on my initial review Cloudera Impala appears to be a bold step in closing the gap of near real time analytics on a NOSQL stack. I did encounter some minor problems, but the few problems and limitations that were encountered in this demonstration were documented and published in the known issues document so they will not be shared; none were show stoppers.
The notes, details and all of the lessons learned, data structures and the configuration guide from the demonstration are being published out on Github under SherpaSurfing in the coming days. These documents cover everything in detail and will enable developers to replicate the demonstration platform and get a jump start on Cloudera Impala. Finally, I would like to thank two contributors: Hanh Le, Robert Webb and Six3 Systems for helping me pull this off.
SYS-CON Events announced today that StorPool Storage will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. StorPool is distributed storage software that allows service providers, enterprises and other cloud builders to run data storage on standard x86 servers, instead of using expensive and inefficient storage arrays (SAN).
Apr. 26, 2015 04:00 AM EDT Reads: 2,249
GENBAND has announced that SageNet is leveraging the Nuvia platform to deliver Unified Communications as a Service (UCaaS) to its large base of retail and enterprise customers. Nuvia’s cloud-based solution provides SageNet’s customers with a full suite of business communications and collaboration tools. Two large national SageNet retail customers have recently signed up to deploy the Nuvia platform and the company will continue to sell the service to new and existing customers. Nuvia’s capabilities include HD voice, video, multimedia messaging, mobility, conferencing, Web collaboration, deskt...
Apr. 26, 2015 03:30 AM EDT Reads: 1,923
Sonus Networks introduced the Sonus WebRTC Services Solution, a virtualized Web Real-Time Communications (WebRTC) offer, purpose-built for the Cloud. The WebRTC Services Solution provides signaling from WebRTC-to-WebRTC applications and interworking from WebRTC-to-Session Initiation Protocol (SIP), delivering advanced real-time communications capabilities on mobile applications and on websites, which are accessible via a browser.
Apr. 26, 2015 02:45 AM EDT Reads: 2,004
SYS-CON Events announced today that Site24x7, the cloud infrastructure monitoring service, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Site24x7 is a cloud infrastructure monitoring service that helps monitor the uptime and performance of websites, online applications, servers, mobile websites and custom APIs. The monitoring is done from 50+ locations across the world and from various wireless carriers, thus providing a global perspective of the end-user experience. Site24x7 supports monitoring H...
Apr. 26, 2015 02:30 AM EDT Reads: 2,091
SYS-CON Events announced today that Intelligent Systems Services will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Established in 1994, Intelligent Systems Services Inc. is located near Washington, DC, with representatives and partners nationwide. ISS’s well-established track record is based on the continuous pursuit of excellence in designing, implementing and supporting nationwide clients’ mission-critical systems. ISS has completed many successful projects in Healthcare, Commercial, Manufacturing, ...
Apr. 26, 2015 02:15 AM EDT Reads: 2,819
“With easy-to-use SDKs for Atmel’s platforms, IoT developers can now reap the benefits of realtime communication, and bypass the security pitfalls and configuration complexities that put IoT deployments at risk,” said Todd Greene, founder & CEO of PubNub. PubNub will team with Atmel at CES 2015 to launch full SDK support for Atmel’s MCU, MPU, and Wireless SoC platforms. Atmel developers now have access to PubNub’s secure Publish/Subscribe messaging with guaranteed ¼ second latencies across PubNub’s 14 global points-of-presence. PubNub delivers secure communication through firewalls, proxy ser...
Apr. 26, 2015 02:00 AM EDT Reads: 3,883
SYS-CON Events announced today that B2Cloud, a provider of enterprise resource planning software, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. B2cloud develops the software you need. They have the ideal tools to help you work with your clients. B2Cloud’s main solutions include AGIS – ERP, CLOHC, AGIS – Invoice, and IZUM
Apr. 26, 2015 02:00 AM EDT Reads: 3,587
SYS-CON Events announced today that Tufin, the market-leading provider of Security Policy Orchestration Solutions, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. As the market leader of Security Policy Orchestration, Tufin automates and accelerates network configuration changes while maintaining security and compliance. Tufin's award-winning Orchestration Suite™ gives IT organizations the power and agility to enforce security policy across complex, multi-vendor enterprise networks. With more than 1...
Apr. 26, 2015 01:45 AM EDT Reads: 3,625
VoxImplant has announced full WebRTC support in the newest versions of its Android SDK and iOS SDK. The updated SDKs, which enable audio and video calls on mobile devices, are now compatible with the WebRTC standard to allow any mobile app to communicate with WebRTC-enabled browsers, including Google Chrome, Mozilla Firefox, Opera, and, when available, Microsoft Spartan. The WebRTC-updated SDKs represent VoxImplant's continued leadership in simplifying the development of real-time communications (RTC) services for app developers. VoxImplant (built by Zingaya, the real-time communication servi...
Apr. 26, 2015 01:45 AM EDT Reads: 2,214
SYS-CON Events announced today that Cloudian, Inc., the leading provider of hybrid cloud storage solutions, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Cloudian, Inc., is a Foster City, California - based software company specializing in cloud storage software. The main product is Cloudian, an Amazon S3-compliant cloud object storage platform, the bedrock of cloud computing systems, that enables cloud service providers and enterprises to build reliable, affordable and scalable cloud storage solu...
Apr. 26, 2015 01:00 AM EDT Reads: 2,756
SYS-CON Events announced today that Gridstore™, the leader in hyper-converged infrastructure purpose-built to optimize Microsoft workloads, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Gridstore™ is the leader in hyper-converged infrastructure purpose-built for Microsoft workloads and designed to accelerate applications in virtualized environments. Gridstore’s hyper-converged infrastructure is the industry’s first all flash version of HyperConverged Appliances that include both compute and storag...
Apr. 26, 2015 12:45 AM EDT Reads: 4,665
SYS-CON Events announced today that IDenticard will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. IDenticard™ is the security division of Brady Corp (NYSE: BRC), a $1.5 billion manufacturer of identification products. We have small-company values with the strength and stability of a major corporation. IDenticard offers local sales, support and service to our customers across the United States and Canada. Our partner network encompasses some 300 of the world's leading systems integrators and security s...
Apr. 26, 2015 12:00 AM EDT Reads: 5,277
SYS-CON Events announced today the IoT Bootcamp – Jumpstart Your IoT Strategy, being held June 9–10, 2015, in conjunction with 16th Cloud Expo and Internet of @ThingsExpo at the Javits Center in New York City. This is your chance to jumpstart your IoT strategy. Combined with real-world scenarios and use cases, the IoT Bootcamp is not just based on presentations but includes hands-on demos and walkthroughs. We will introduce you to a variety of Do-It-Yourself IoT platforms including Arduino, Raspberry Pi, BeagleBone, Spark and Intel Edison. You will also get an overview of cloud technologies s...
Apr. 25, 2015 07:15 PM EDT Reads: 3,039
“In the past year we've seen a lot of stabilization of WebRTC. You can now use it in production with a far greater degree of certainty. A lot of the real developments in the past year have been in things like the data channel, which will enable a whole new type of application," explained Peter Dunkley, Technical Director at Acision, in this SYS-CON.tv interview at @ThingsExpo, held Nov 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA.
Apr. 25, 2015 04:00 PM EDT Reads: 4,492
The best mobile applications are augmented by dedicated servers, the Internet and Cloud services. Mobile developers should focus on one thing: writing the next socially disruptive viral app. Thanks to the cloud, they can focus on the overall solution, not the underlying plumbing. From iOS to Android and Windows, developers can leverage cloud services to create a common cross-platform backend to persist user settings, app data, broadcast notifications, run jobs, etc. This session provides a high level technical overview of many cloud services available to mobile app developers, includi...
Apr. 25, 2015 04:00 PM EDT Reads: 1,429
SYS-CON Events announced today that Ciqada will exhibit at SYS-CON's @ThingsExpo, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Ciqada™ makes it easy to connect your products to the Internet. By integrating key components - hardware, servers, dashboards, and mobile apps - into an easy-to-use, configurable system, your products can quickly and securely join the internet of things. With remote monitoring, control, and alert messaging capability, you will meet your customers' needs of tomorrow - today! Ciqada. Let your products take flight. For more inform...
Apr. 25, 2015 03:00 PM EDT Reads: 1,934
Containers and microservices have become topics of intense interest throughout the cloud developer and enterprise IT communities. Accordingly, attendees at the upcoming 16th Cloud Expo at the Javits Center in New York June 9-11 will find fresh new content in a new track called PaaS | Containers & Microservices Containers are not being considered for the first time by the cloud community, but a current era of re-consideration has pushed them to the top of the cloud agenda. With the launch of Docker's initial release in March of 2013, interest was revved up several notches. Then late last...
Apr. 25, 2015 03:00 PM EDT Reads: 2,892
Health care systems across the globe are under enormous strain, as facilities reach capacity and costs continue to rise. M2M and the Internet of Things have the potential to transform the industry through connected health solutions that can make care more efficient while reducing costs. In fact, Vodafone's annual M2M Barometer Report forecasts M2M applications rising to 57 percent in health care and life sciences by 2016. Lively is one of Vodafone's health care partners, whose solutions enable older adults to live independent lives while staying connected to loved ones. M2M will continue to gr...
Apr. 25, 2015 03:00 PM EDT Reads: 1,535
Dave will share his insights on how Internet of Things for Enterprises are transforming and making more productive and efficient operations and maintenance (O&M) procedures in the cleantech industry and beyond. Speaker Bio: Dave Landa is chief operating officer of Cybozu Corp (kintone US). Based in the San Francisco Bay Area, Dave has been on the forefront of the Cloud revolution driving strategic business development on the executive teams of multiple leading Software as a Services (SaaS) application providers dating back to 2004. Cybozu's kintone.com is a leading global BYOA (Build Your O...
Apr. 25, 2015 02:00 PM EDT Reads: 1,579
While not quite mainstream yet, WebRTC is starting to gain ground with Carriers, Enterprises and Independent Software Vendors (ISV’s) alike. WebRTC makes it easy for developers to add audio and video communications into their applications by using Web browsers as their platform. But like any market, every customer engagement has unique requirements, as well as constraints. And of course, one size does not fit all. In her session at WebRTC Summit, Dr. Natasha Tamaskar, Vice President, Head of Cloud and Mobile Strategy at GENBAND, will explore what is needed to take a real time communications ...
Apr. 25, 2015 02:00 PM EDT Reads: 1,801