The 'Big' Fallacy of Big Data

Why companies are luring you into the Big Data Trap

Unless you've been living under a rock for the past couple of years, you've been hearing about Big Data nonstop. Big Data promises fortune and power to those who can wield its somewhat mystical and often nebulous capabilities. Unfortunately for the rest of us mere mortals, Big Data is built on an outright lie that is both pernicious and unfortunate. It's hiding right there in plain sight in the name itself: the word BIG.

The Fallacy of Big Data is that you have to have a lot of data for it to be relevant. The common catchphrase is: "More data = more insights". There is a nugget of truth to this: in some cases, a lot of data is needed to establish valid patterns and create real insight into the activity the data represents. More often than not, however, it creates a significant challenge for those responsible for performing analytics: sifting through a mountain of data to find the parts that actually matter. Recent studies have shown that fully 80% of the time spent on data analysis goes to just tinkering with the data to get it into a usable format. So more data creates a massive data curation issue and leaves us with more work to do before we can even start experimenting with our data, much less monetizing it.

The reality of "Big Data" is that it was invented by those with no skin in the game. Analytics, open source, digital transformation, and Cloud are the technologies that enable comprehensive data analysis. When minimal infrastructure, commodity hardware, and free or nearly free software are enough to store and analyze data and, more importantly, to drive value from it, the big infrastructure players are left out in the cold with nothing to offer. Enter "Big Data": if you are going to try to manage petabytes of data you need good storage, and tens of thousands of servers are awful to manage. So the Fallacy is born:

"In order to get real results from data, you cannot rely on just a little bit of it, or just the relevant data; you need every set of data imaginable. Therefore (and here's where things get squidgy), you need to bring all that data in-house (because the cloud is too expensive to store it) and you need a lot of manageable and flexible enterprise-grade gear to do it with (because free stuff is not enterprise ready)."

You can see how this is built around some nuggets of truth. I was asked recently, "How would you move a petabyte of data to Amazon cloud storage?" and I answered as truthfully as I could: "Very slowly." Cloud does get expensive when used for a lot of infrastructure, but as one part of the overall solution it is an important tool. Likewise, the thought of managing a massive Hadoop cluster of 1,000 "exactly the same" servers sounds like the hell of IT in the pre-VM days, but it is not really an accurate picture of the Hadoop landscape. The vast majority of analytics clusters top out around 50 servers, which is far more manageable (and less expensive) than huge enterprise gear. To be fair, there are organizations where a massive-scale, enterprise-platform approach makes sense, but the unfortunate side effect of legacy vendors pushing this approach is that they have made the solution itself the barrier to entry.
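
To put a rough number on "very slowly", here is a back-of-the-envelope sketch. The link speeds are illustrative assumptions, not figures from any real deployment, and the calculation assumes the link is fully saturated around the clock with no protocol overhead, which real transfers never achieve.

```python
PETABYTE_BYTES = 10**15  # decimal petabyte (assumption: 1 PB = 10^15 bytes)
SECONDS_PER_DAY = 86_400


def transfer_days(size_bytes: int, link_gbps: float, efficiency: float = 1.0) -> float:
    """Days needed to push size_bytes over a link of link_gbps gigabits per second.

    efficiency is the fraction of the nominal link speed actually sustained
    (1.0 is the idealized best case).
    """
    bits = size_bytes * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / SECONDS_PER_DAY


# Hypothetical sustained uplink speeds, chosen only for illustration.
for gbps in (0.1, 1, 10):
    days = transfer_days(PETABYTE_BYTES, gbps)
    print(f"{gbps:>5} Gbps: {days:8.1f} days")
```

Even in the idealized case, a dedicated 1 Gbps link needs about three months to move a single petabyte, which is why the amount of data you actually need matters far more than the amount you could collect.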

The problem is that "Big Data" has now made it into the vernacular and, worse yet, has become synonymous with Data Analytics. Every company, organization, or even individual on earth can benefit from analyzing their relevant data for new insights. Take a very simple example: look at your budget to identify where you overspend (too many meals out, for example). That is personal analytics; it does not require anything complex, and there are numerous ways to do it with free or nearly free tools. Now scale that up to the bank that wants to offer new digital, data-driven products to customers. They already have a lot of that data in-house, and they already have a lot of analytical tools. Why would they need, per se, to include every data set under the sun? They may want some more sets of data (social media to identify trends that might lead to investment opportunities), but they don't HAVE to store it in-house to use it - it is all offered free-to-use via serialized APIs. Even in the rare case where they did decide to store it all in-house, we are not talking about tens of PB of data. It is more like adding a few tens to hundreds of TB for the data in question, because again - you don't download all of Twitter, just the stuff that is relevant to you. Analytic data is also largely transient, meaning that it is used for the analysis and then discarded (especially true in the real-time world), so where is the need for massive infrastructure to support that initiative?
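
The budget example really is as small as it sounds. The sketch below uses only the Python standard library; the transactions, categories, and budget limits are made-up illustrative values, and `overspend` is a hypothetical helper name, not part of any particular tool.

```python
from collections import defaultdict

# Hypothetical month of transactions: (category, amount spent)
transactions = [
    ("groceries", 82.40),
    ("meals out", 45.00),
    ("meals out", 62.50),
    ("transport", 60.00),
    ("groceries", 96.10),
    ("meals out", 38.25),
    ("meals out", 71.00),
]

# Hypothetical monthly limit per category
budget = {"groceries": 400.00, "meals out": 150.00, "transport": 80.00}


def overspend(transactions, budget):
    """Return {category: amount over budget} for every category above its limit."""
    totals = defaultdict(float)
    for category, amount in transactions:
        totals[category] += amount
    return {
        category: round(totals[category] - limit, 2)
        for category, limit in budget.items()
        if totals[category] > limit
    }


print(overspend(transactions, budget))  # {'meals out': 66.75}
```

A handful of rows and a dozen lines of code are enough to answer the question, which is the point: the value comes from looking at the relevant data, not from the volume of it.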

I have spoken a lot about "Big Data" and the Fallacy and trap of paying too much attention to the word BIG. Data is important to everyone and it can have value for anyone. In my most recent speaking sessions I have shown how you can do a simple social analysis for free in a matter of minutes. You don't need a massive infrastructure to make that production ready either. It just takes some willingness to see through the noise to the actual value of what the "Big Data" message is trying to say. Analytics is important and valuable for everyone. You don't have to be a Fortune 100 company to create value from the data you already have, and to bring in new data for analytics. Everyone can do it.

More Stories By Christopher Harrold

As an Agent of IT Transformation, I have over 20 years of experience in the field. I started off as the IT Ops guy and followed the trends of the DevOps movement wherever I went. I want to shake up accepted ways of thinking and develop new models and designs that push the boundaries of technology and of the accepted status quo. There is no greater reward for me than seeing something that was once dismissed as "impossible" become the new normal, and I have been richly rewarded throughout my career with this result. In my last role as CTO at EMC Corporation, I worked tirelessly with a small group of engineers and product managers to build a market-leading, innovative platform for data analytics, combining best-of-breed storage, analytics, and visualization solutions to enable the Data as a Service model for enterprise and mid-sized companies globally.
