Click here to close now.

Welcome!

Apache Authors: Liz McMillan, Dana Gardner, Pat Romanski, Carmen Gonzalez, Plutora Blog

Related Topics: Cloud Expo, Java, SOA & WOA, Open Source, Web 2.0, Apache

Cloud Expo: Article

Big Data Security for Apache Hadoop

Ten Tips for HadoopWorld attendees

Big Data takes center stage today at the Strata Conference & Hadoop World in New York, the world’s largest gathering of the Apache Hadoop™ community. A key conversation topic will be how organizations can improve data security for Hadoop and the applications that run on the platform. As you know, Hadoop and similar data stores hold a lot of promise for organizations to finally gain some value out of the immense amount of data they're capturing. But HDFS, Hive and other nascent NoSQL technologies were not necessarily designed with comprehensive security in mind. Often what happens as big data projects grow is sensitive data like HIPAA data, PII and financial records get captured and stored. It's important this data remains secure at rest.

I polled my fellow co-workers at Gazzang last week, and asked them to come up with a top ten list for securing Apache Hadoop. Here's what they delivered. Enjoy:

Think about security before getting started – You don’t wait until after a burglary to put locks on your doors, and you should not wait until after a breach to secure your data. Make sure a serious data security discussion takes place before installing and feeding data into your Hadoop cluster.

Consider what data may get stored – If you are using Hadoop to store and run analytics against regulatory data, you likely need to comply with specific security requirements. If the stored data does not fall under regulatory jurisdiction, keep in mind the risks to your public reputation and potential loss of revenue if data such as personally identifiable information (PII) were breached.

Encrypt data at rest and in motion – Add transparent data encryption at the file layer as a first step toward enhancing the security of a big data project. SSL encryption can protect big data as it moves between nodes and applications.

As Securosis analyst Adrian Lane wrote in a recent blog, “File encryption addresses two attacker methods for circumventing normal application security controls. Encryption protects in case malicious users or administrators gain access to data nodes and directly inspect files, and it also renders stolen files or disk images unreadable. It is transparent to both Hadoop and calling applications and scales out as the cluster grows. This is a cost-effective way to address several data security threats.”

Store the keys away from the encrypted data – Storing encryption keys on the same server as the encrypted data is akin to locking your house and leaving the key in your front door. Instead, use a key management system that separates the key from the encrypted data.

Institute access controls – Establishing and enforcing policies that govern which people and processes can access data stored within Hadoop is essential for keeping rogue users and applications off your cluster.

Require multi-factor authentication - Multi-factor authentication can significantly reduce the likelihood of an account being compromised or access to Hadoop data being granted to an unauthorized party.

Use secure automation – Beyond data encryption, organizations should look to DevOps tools such as Chef or Puppet for automated patch and configuration management.

Frequently audit your environment – Project needs, data sets, cloud requirements and security risks are constantly changing. It’s important to make sure you are closely monitoring your Hadoop environment and performing frequent checks to ensure performance and security goals are being met.

Ask tough questions of your cloud provider – Be sure you know what your cloud provider is responsible for. Will they encrypt your data? Who will store and have access to your keys? How is your data retired when you no longer need it? How do they prevent data leakage?

Centralize accountability – Centralizing the accountability for data security ensures consistent policy enforcement and access control across diverse organizational silos and data sets.

Did we miss anything? If so, please comment below, and enjoy Strata +HadoopWorld.

More Stories By David Tishgart

After spending years at large corporations including Dell, AMD and BMC, David Tishgart joined the startup ranks leading product marketing for Gazzang. Focused on security for big data, he helps communicate the benefits and challenges that big data can present, offering practical solutions. When not ranting about encryption and key management, you can find David clamoring for a big data application that can fine tune his fantasy football team.

@ThingsExpo Stories
Even as cloud and managed services grow increasingly central to business strategy and performance, challenges remain. The biggest sticking point for companies seeking to capitalize on the cloud is data security. Keeping data safe is an issue in any computing environment, and it has been a focus since the earliest days of the cloud revolution. Understandably so: a lot can go wrong when you allow valuable information to live outside the firewall. Recent revelations about government snooping, along with a steady stream of well-publicized data breaches, only add to the uncertainty
As organizations shift toward IT-as-a-service models, the need for managing and protecting data residing across physical, virtual, and now cloud environments grows with it. CommVault can ensure protection &E-Discovery of your data – whether in a private cloud, a Service Provider delivered public cloud, or a hybrid cloud environment – across the heterogeneous enterprise. In his session at 16th Cloud Expo, Randy De Meno, Chief Technologist - Windows Products and Microsoft Partnerships, will discuss how to cut costs, scale easily, and unleash insight with CommVault Simpana software, the only si...
The explosion of connected devices / sensors is creating an ever-expanding set of new and valuable data. In parallel the emerging capability of Big Data technologies to store, access, analyze, and react to this data is producing changes in business models under the umbrella of the Internet of Things (IoT). In particular within the Insurance industry, IoT appears positioned to enable deep changes by altering relationships between insurers, distributors, and the insured. In his session at @ThingsExpo, Michael Sick, a Senior Manager and Big Data Architect within Ernst and Young's Financial Servi...
PubNub on Monday has announced that it is partnering with IBM to bring its sophisticated real-time data streaming and messaging capabilities to Bluemix, IBM’s cloud development platform. “Today’s app and connected devices require an always-on connection, but building a secure, scalable solution from the ground up is time consuming, resource intensive, and error-prone,” said Todd Greene, CEO of PubNub. “PubNub enables web, mobile and IoT developers building apps on IBM Bluemix to quickly add scalable realtime functionality with minimal effort and cost.”
The Workspace-as-a-Service (WaaS) market will grow to $6.4B by 2018. In his session at 16th Cloud Expo, Seth Bostock, CEO of IndependenceIT, will begin by walking the audience through the evolution of Workspace as-a-Service, where it is now vs. where it going. To look beyond the desktop we must understand exactly what WaaS is, who the users are, and where it is going in the future. IT departments, ISVs and service providers must look to workflow and automation capabilities to adapt to growing demand and the rapidly changing workspace model.
Sensor-enabled things are becoming more commonplace, precursors to a larger and more complex framework that most consider the ultimate promise of the IoT: things connecting, interacting, sharing, storing, and over time perhaps learning and predicting based on habits, behaviors, location, preferences, purchases and more. In his session at @ThingsExpo, Tom Wesselman, Director of Communications Ecosystem Architecture at Plantronics, will examine the still nascent IoT as it is coalescing, including what it is today, what it might ultimately be, the role of wearable tech, and technology gaps stil...
With several hundred implementations of IoT-enabled solutions in the past 12 months alone, this session will focus on experience over the art of the possible. Many can only imagine the most advanced telematics platform ever deployed, supporting millions of customers, producing tens of thousands events or GBs per trip, and hundreds of TBs per month. With the ability to support a billion sensor events per second, over 30PB of warm data for analytics, and hundreds of PBs for an data analytics archive, in his session at @ThingsExpo, Jim Kaskade, Vice President and General Manager, Big Data & Ana...
In the consumer IoT, everything is new, and the IT world of bits and bytes holds sway. But industrial and commercial realms encompass operational technology (OT) that has been around for 25 or 50 years. This grittier, pre-IP, more hands-on world has much to gain from Industrial IoT (IIoT) applications and principles. But adding sensors and wireless connectivity won’t work in environments that demand unwavering reliability and performance. In his session at @ThingsExpo, Ron Sege, CEO of Echelon, will discuss how as enterprise IT embraces other IoT-related technology trends, enterprises with i...
When it comes to the Internet of Things, hooking up will get you only so far. If you want customers to commit, you need to go beyond simply connecting products. You need to use the devices themselves to transform how you engage with every customer and how you manage the entire product lifecycle. In his session at @ThingsExpo, Sean Lorenz, Technical Product Manager for Xively at LogMeIn, will show how “product relationship management” can help you leverage your connected devices and the data they generate about customer usage and product performance to deliver extremely compelling and reliabl...
The Internet of Things (IoT) is causing data centers to become radically decentralized and atomized within a new paradigm known as “fog computing.” To support IoT applications, such as connected cars and smart grids, data centers' core functions will be decentralized out to the network's edges and endpoints (aka “fogs”). As this trend takes hold, Big Data analytics platforms will focus on high-volume log analysis (aka “logs”) and rely heavily on cognitive-computing algorithms (aka “cogs”) to make sense of it all.
Hadoop as a Service (as offered by handful of niche vendors now) is a cloud computing solution that makes medium and large-scale data processing accessible, easy, fast and inexpensive. In his session at Big Data Expo, Kumar Ramamurthy, Vice President and Chief Technologist, EIM & Big Data, at Virtusa, will discuss how this is achieved by eliminating the operational challenges of running Hadoop, so one can focus on business growth. The fragmented Hadoop distribution world and various PaaS solutions that provide a Hadoop flavor either make choices for customers very flexible in the name of opti...
HP and Aruba Networks on Monday announced a definitive agreement for HP to acquire Aruba, a provider of next-generation network access solutions for the mobile enterprise, for $24.67 per share in cash. The equity value of the transaction is approximately $3.0 billion, and net of cash and debt approximately $2.7 billion. Both companies' boards of directors have approved the deal. "Enterprises are facing a mobile-first world and are looking for solutions that help them transition legacy investments to the new style of IT," said Meg Whitman, Chairman, President and Chief Executive Officer of HP...
One of the biggest impacts of the Internet of Things is and will continue to be on data; specifically data volume, management and usage. Companies are scrambling to adapt to this new and unpredictable data reality with legacy infrastructure that cannot handle the speed and volume of data. In his session at @ThingsExpo, Don DeLoach, CEO and president of Infobright, will discuss how companies need to rethink their data infrastructure to participate in the IoT, including: Data storage: Understanding the kinds of data: structured, unstructured, big/small? Analytics: What kinds and how responsiv...
Since 2008 and for the first time in history, more than half of humans live in urban areas, urging cities to become “smart.” Today, cities can leverage the wide availability of smartphones combined with new technologies such as Beacons or NFC to connect their urban furniture and environment to create citizen-first services that improve transportation, way-finding and information delivery. In her session at @ThingsExpo, Laetitia Gazel-Anthoine, CEO of Connecthings, will focus on successful use cases.
Sensor-enabled things are becoming more commonplace, precursors to a larger and more complex framework that most consider the ultimate promise of the IoT: things connecting, interacting, sharing, storing, and over time perhaps learning and predicting based on habits, behaviors, location, preferences, purchases and more. In his session at @ThingsExpo, Tom Wesselman, Director of Communications Ecosystem Architecture at Plantronics, will examine the still nascent IoT as it is coalescing, including what it is today, what it might ultimately be, the role of wearable tech, and technology gaps stil...
The true value of the Internet of Things (IoT) lies not just in the data, but through the services that protect the data, perform the analysis and present findings in a usable way. With many IoT elements rooted in traditional IT components, Big Data and IoT isn’t just a play for enterprise. In fact, the IoT presents SMBs with the prospect of launching entirely new activities and exploring innovative areas. CompTIA research identifies several areas where IoT is expected to have the greatest impact.
Wearable devices have come of age. The primary applications of wearables so far have been "the Quantified Self" or the tracking of one's fitness and health status. We propose the evolution of wearables into social and emotional communication devices. Our BE(tm) sensor uses light to visualize the skin conductance response. Our sensors are very inexpensive and can be massively distributed to audiences or groups of any size, in order to gauge reactions to performances, video, or any kind of presentation. In her session at @ThingsExpo, Jocelyn Scheirer, CEO & Founder of Bionolux, will discuss ho...
Cloud data governance was previously an avoided function when cloud deployments were relatively small. With the rapid adoption in public cloud – both rogue and sanctioned, it’s not uncommon to find regulated data dumped into public cloud and unprotected. This is why enterprises and cloud providers alike need to embrace a cloud data governance function and map policies, processes and technology controls accordingly. In her session at 15th Cloud Expo, Evelyn de Souza, Data Privacy and Compliance Strategy Leader at Cisco Systems, will focus on how to set up a cloud data governance program and s...
Containers and microservices have become topics of intense interest throughout the cloud developer and enterprise IT communities. Accordingly, attendees at the upcoming 16th Cloud Expo at the Javits Center in New York June 9-11 will find fresh new content in a new track called PaaS | Containers & Microservices Containers are not being considered for the first time by the cloud community, but a current era of re-consideration has pushed them to the top of the cloud agenda. With the launch of Docker's initial release in March of 2013, interest was revved up several notches. Then late last...
Roberto Medrano, Executive Vice President at SOA Software, had reached 30,000 page views on his home page - http://RobertoMedrano.SYS-CON.com/ - on the SYS-CON family of online magazines, which includes Cloud Computing Journal, Internet of Things Journal, Big Data Journal, and SOA World Magazine. He is a recognized executive in the information technology fields of SOA, internet security, governance, and compliance. He has extensive experience with both start-ups and large companies, having been involved at the beginning of four IT industries: EDA, Open Systems, Computer Security and now SOA.