Big Data Top Ten

What do you get when you combine Big Data technologies, like Pig and Hive? A flying pig?

No, you get a “Logical Data Warehouse”.

My general prediction is that Cloudera and Hortonworks are both aggressively moving toward a vision that looks a lot like Gartner’s “Logical Data Warehouse”: namely, “the next-generation data warehouse that improves agility, enables innovation and responds more efficiently to changing business requirements.”

In 2012, Infochimps (now CSC) leveraged its early use of stream processing, NoSQLs, and Hadoop to create a design pattern that combined real-time, ad-hoc, and batch analytics. This concept of combining best-of-breed Big Data technologies will continue to advance across the industry until the entire legacy (and proprietary) data infrastructure stack is replaced with a new (and open) one.

As this is happening, I predict that the following 10 Big Data events will occur in 2014.

1. Consolidation of NoSQLs begins

A few projects have strong commercialization companies backing them. These companies have reached “critical mass” and include DataStax with Cassandra, 10gen with MongoDB, and Couchbase with CouchDB. Leading open source projects like these will pull further and further away from the pack of 150+ other NoSQLs, which are either fighting for the same value propositions (with a lot less traction) or solving small niche use cases (and markets).

2. The Hadoop Clone wars end

The industry will begin standardizing on two distributions. Everyone else will become less relevant. (It’s Intel vs. AMD. Let’s not forget the other x86 vendors like IBM, UMC, NEC, NexGen, National, Cyrix, IDT, Rise, and Transmeta.) If you are a Hadoop vendor, you’re either the Intel or the AMD. Otherwise, you had better be acquired or get out of the business by the end of 2014.

3. Open source business model is acknowledged by Wall Street

Because the open source, scale-out, commodity approach to Big Data is fundamental to the new breed of Big Data technologies, open source is now the clear antithesis of proprietary, scale-up, our-hardware-only, take-it-or-leave-it solutions. Unfortunately for the incumbent vendors, their promises of international expansion, improved traction from sales force expansion, and new products and alliances will all fall on the deaf ears of Wall Street analysts. Time to short the platform RDBMS and Enterprise Data Warehouse stocks.

4. Big Data and Cloud really means private cloud

Many claimed that 2013 was the “year of Big Data in the Cloud”. However, what really happened is that the Global 2000 immediately began their bare-metal projects under tight control. Now that those projects are underway, 2014 will bring the next phase: Big Data on virtualized platforms. Open source projects like Serengeti for vSphere, Savanna for OpenStack, and Ironfan for AWS, OpenStack, and VMware combined, or venture-backed proprietary solutions like BlueData, will enable virtualized Big Data private clouds.

5. 2014 starts the era of analytic applications

Enterprises are becoming savvy to the new reference architecture that combines legacy and new-generation IT data infrastructure. Now it’s time to develop a new generation of applications that take advantage of both to solve business problems. System integrators will shift resources, hire data scientists, and guide enterprises in developing data-driven applications. This, of course, realizes concepts like the 360-degree view, the Internet of Things, and marketing to one.

6. Search-based business intelligence tools will become the norm with Big Data

A “Google-like” interface that lets users explore structured and unstructured data with little formal training is where the new generation is going. Just look at Splunk for searching machine data. Imagine a marketer being able to simply “Google search” for insights on their customers.

7. Real-time in-memory analytics, complex event processing, and ETL combine

The days of ETL in its pure form are numbered. It’s either ‘E’, then ‘L’, then ‘T’ with Hadoop, or it’s EAL (extract, apply analytics, and load) with new real-time stream-processing frameworks. Now that high-speed social data streams are the norm, so are processing frameworks that combine streaming data with micro-batch and batch data, perform complex processing on that data, and feed applications with sub-second response times.
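To make the EAL idea more concrete, here is a minimal Python sketch. It assumes a hypothetical stream of social-media events, each carrying a `topic` field. The sketch extracts events into micro-batches and applies a simple analytic (running topic counts) before anything is stored. Only then does it load a result snapshot for downstream applications. The function names are illustrative and are not the API of any particular stream-processing framework. A real framework would also handle distribution, windowing, and fault tolerance.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List

# Hypothetical event shape: each social-media event is a dict with a "topic" key.
Event = Dict[str, str]


def micro_batches(stream: Iterable[Event], size: int) -> Iterator[List[Event]]:
    """Extract: group an (in principle unbounded) event stream into micro-batches."""
    batch: List[Event] = []
    for event in stream:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled batch
        yield batch


def apply_analytics(batch: List[Event], running: Counter) -> Counter:
    """Apply analytics: update running topic counts before anything is stored."""
    running.update(event["topic"] for event in batch)
    return running


def load(sink: List[Dict[str, int]], running: Counter, top_n: int = 3) -> None:
    """Load: append a snapshot of the current top topics for downstream apps."""
    sink.append(dict(running.most_common(top_n)))


def run_eal(stream: Iterable[Event], batch_size: int = 2) -> List[Dict[str, int]]:
    """Run extract -> apply analytics -> load once per micro-batch."""
    running: Counter = Counter()
    sink: List[Dict[str, int]] = []
    for batch in micro_batches(stream, batch_size):
        load(sink, apply_analytics(batch, running))
    return sink


if __name__ == "__main__":
    events = [{"topic": t} for t in ["hadoop", "nosql", "hadoop", "hive", "hadoop"]]
    for snapshot in run_eal(events):
        print(snapshot)
```

The analytics step runs on each micro-batch as it arrives, so applications can read an up-to-date snapshot without waiting for a later batch transform. That is the contrast with classic ETL that the prediction draws.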

8. Prescriptive analytics become more mainstream

After descriptive and predictive analytics comes prescriptive. Prescriptive analytics automatically synthesizes big data, multiple disciplines of the mathematical and computational sciences, and business rules to make predictions, and then suggests decision options that take advantage of those predictions. We will begin seeing powerful use cases of this in 2014. Business users want specific courses of action recommended to them, along with the likely outcome of each decision.
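As a rough illustration of that predict-then-recommend loop, the sketch below uses a hypothetical predictive model, passed in as `predict`, to estimate the benefit of each candidate action. Business rules are expressed as simple predicates. Options that pass every rule are ranked by predicted benefit minus cost. The names, figures, and scoring rule are illustrative assumptions. They do not describe any specific prescriptive-analytics product.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Option:
    name: str
    cost: float  # hypothetical cost of taking this course of action


Predictor = Callable[[Option], float]  # predictive step: estimated benefit
Rule = Callable[[Option], bool]        # business rule: is this option allowed?


def prescribe(options: Sequence[Option], predict: Predictor,
              rules: Sequence[Rule]) -> List[Tuple[str, float]]:
    """Recommend allowed options, ranked by predicted benefit minus cost."""
    scored: List[Tuple[str, float]] = []
    for option in options:
        if all(rule(option) for rule in rules):
            scored.append((option.name, predict(option) - option.cost))
    # Highest predicted net outcome first, so users see the best action and its outcome.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Illustrative numbers only: a stand-in for a trained model's predictions.
    predicted_lift = {"discount": 120.0, "email": 40.0, "call": 90.0}
    options = [Option("discount", 50.0), Option("email", 5.0), Option("call", 70.0)]
    within_budget: Rule = lambda o: o.cost <= 60.0
    for name, outcome in prescribe(options, lambda o: predicted_lift[o.name], [within_budget]):
        print(f"{name}: predicted net outcome {outcome:.1f}")
```

In this example, the budget rule excludes the call option. The discount ranks first because its predicted net outcome (120 − 50 = 70) is the highest, and each recommendation is shown alongside that outcome.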

9. MDM will provide the dimensions for big data facts

With Big Data, master data management will cover both the internal data that the organization has been managing for years (like customer, product, and supplier data) and the Big Data flowing into the organization from external sources (like social media, third-party data, and web-log data) and from internal sources (such as unstructured content in documents and email). MDM will support polyglot persistence.

10. Security in Big Data won’t be a big issue

Peter Sondergaard, Gartner’s senior vice president of research, will say that when it comes to big data and security, “You should anticipate events and headlines that continuously raise public awareness and create fear.” I’m not dismissing the fact that with MORE data come more responsibilities, and perhaps liabilities, for those that harbor the data. However, in terms of the infrastructure security itself, I believe 2014 will end with a clear understanding of how to apply those familiar best practices to your new Big Data platform, including trusted Kerberos, LDAP integration, Active Directory integration, encryption, and overall policy administration.

More Stories By Jim Kaskade

Jim Kaskade is Vice President and General Manager, Big Data & Analytics, at CSC. Prior to that he was CEO of Infochimps. Before that he served as SVP and General Manager at SIOS Technology, a publicly traded firm in Japan, where he led a business unit focused on developing a private cloud Platform as a Service targeted at Fortune 500 enterprises. He has been heavily involved in all aspects of cloud, meeting with prominent CIOs, CISOs, and datacenter architects of Fortune 100 companies to better understand their cloud computing needs. He also has hands-on cloud domain knowledge from his experience as founder and CEO of a SaaS company, which secured the digital media assets of over 10,000 businesses, including Fortune 100 customers such as Lucasfilm, the NBA, Sony BMG, News Corp, Viacom, and IAC. Kaskade is also one of the Top 100 bloggers on Cloud Computing selected by the Cloud Computing Journal.
