Apache Authors: John Mertic, Pat Romanski, Liz McMillan, Elizabeth White, Janakiram MSV

Related Topics: @BigDataExpo, Java IoT, Microservices Expo, @CloudExpo, Apache, SDN Journal

@BigDataExpo: Article

Happiness Is… a Handhold on Hadoop

For a Hadoop solution do we look inside or outside?

This post is sponsored by The Business Value Exchange and HP Enterprise Services

As we know, the subject of Big Data and the ‘space race' to produce software application development functions that will enable us to extract insight and (therefore) value from the Big Data mountain remains one of the most discussed issues in information technology today.

Increasingly prevalent and popular, if not quite as ‘predominant' as some would have us believe, in this arena is Apache Hadoop. This software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.

But there's a problem, because Hadoop is drastically underutilized in two respects:

  • Full-blown implementations of Hadoop are argued to be extremely technically difficult to pull off.
  • Implementations that do exist are argued to only take advantage of a fraction of what might be represented in a complete deployment in terms of data management and sheer number crunching power.

What's the answer?

Do we look inside (@ logs) or outside (@ architecture)?
For a Hadoop solution do we look inside or outside? That is to say, do we look inside at logs and logfiles as we tinker around to perfect our Hadoop installation? Or do we look at higher level and look at the architectural considerations that should be governing any individual instance of Hadoop to get some greater insight into what should be working?

Looking inside at logs and logfiles - these are files that record "events" occurring throughout an operating system or software application or data management environment such as Apache Hadoop.

If we look at how our logs and logfiles are performing, then we can get information on hidden: errors, anomalies, problems and patterns... and these are the sorts of reports that can help guide DevOps (developer-operations) pros as they attempt to being a Hadoop project online.

The HP System Management Homepage (SMH) software function provides this kind of information to users working directly with the firm's own dedicated software for particular hardware. Elsewhere there are products such as XpoLog Augmented Search 5.0, which brings XpoLog's troubleshooting capabilities to the Hadoop platform. Put simply, it's a big expanding market.

... and then outside (@ architecture)?
The converse approach (actually it should be corollary and complementary one) here is to focus more closely on the outside, i.e., the architecture inside which an instance of Hadoop is created. HP provides its own Reference Architectures for Hadoop and this is available for each of the three leading distributions (Cloudera, Hortonworks and MapR).

This sponsored HP commentary has highlighted the firm's own product initially, but thankfully HP is big and bold enough not to shirk away from us being able to mention other vendors in this space (most of which will be key partners anyway) - so yes indeed competing products do exist from Cisco, Dell, IBM and others.

Ways to Improve the RDBMS with Hadoop
In a comprehensive sub-headed piece entitled Ten Ways To Improve the RDBMS with Hadoop to be found on Business Process Management (BPM) website http://www.ebizq.net/ you can read the following opinion why a good Hadoop installation can help improve the scalability of applications:

"Very low cost commodity hardware can be used to power Hadoop clusters since redundancy and fault resistance is built into the software instead of using expensive enterprise hardware or software alternatives with proprietary solutions. This makes adding more capacity (and therefore scale) easier to achieve and Hadoop is an affordable and very granular way to scale out instead of up. While there can be cost in converting existing applications to Hadoop, for new applications it should be a standard option in the software selection decision tree."

There is much to gain from intelligent implementation of Hadoop, but it's not easy and we need to look both inside and out (and back to front) in terms of where we can get guidance on best practice and efficiency in our implementation.

More Stories By Adrian Bridgwater

Adrian Bridgwater is a freelance journalist and corporate content creation specialist focusing on cross platform software application development as well as all related aspects software engineering, project management and technology as a whole.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.

@ThingsExpo Stories
The Internet of Things (IoT), in all its myriad manifestations, has great potential. Much of that potential comes from the evolving data management and analytic (DMA) technologies and processes that allow us to gain insight from all of the IoT data that can be generated and gathered. This potential may never be met as those data sets are tied to specific industry verticals and single markets, with no clear way to use IoT data and sensor analytics to fulfill the hype being given the IoT today.
@ThingsExpo has been named the Top 5 Most Influential M2M Brand by Onalytica in the ‘Machine to Machine: Top 100 Influencers and Brands.' Onalytica analyzed the online debate on M2M by looking at over 85,000 tweets to provide the most influential individuals and brands that drive the discussion. According to Onalytica the "analysis showed a very engaged community with a lot of interactive tweets. The M2M discussion seems to be more fragmented and driven by some of the major brands present in the...
DevOps is being widely accepted (if not fully adopted) as essential in enterprise IT. But as Enterprise DevOps gains maturity, expands scope, and increases velocity, the need for data-driven decisions across teams becomes more acute. DevOps teams in any modern business must wrangle the ‘digital exhaust’ from the delivery toolchain, "pervasive" and "cognitive" computing, APIs and services, mobile devices and applications, the Internet of Things, and now even blockchain. In this power panel at @...
If you had a chance to enter on the ground level of the largest e-commerce market in the world – would you? China is the world’s most populated country with the second largest economy and the world’s fastest growing market. It is estimated that by 2018 the Chinese market will be reaching over $30 billion in gaming revenue alone. Admittedly for a foreign company, doing business in China can be challenging. Often changing laws, administrative regulations and the often inscrutable Chinese Interne...
Enterprise IT has been in the era of Hybrid Cloud for some time now. But it seems most conversations about Hybrid are focused on integrating AWS, Microsoft Azure, or Google ECM into existing on-premises systems. Where is all the Private Cloud? What do technology providers need to do to make their offerings more compelling? How should enterprise IT executives and buyers define their focus, needs, and roadmap, and communicate that clearly to the providers?
SYS-CON Events announced today that Streamlyzer will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Streamlyzer is a powerful analytics for video streaming service that enables video streaming providers to monitor and analyze QoE (Quality-of-Experience) from end-user devices in real time.
You have great SaaS business app ideas. You want to turn your idea quickly into a functional and engaging proof of concept. You need to be able to modify it to meet customers' needs, and you need to deliver a complete and secure SaaS application. How could you achieve all the above and yet avoid unforeseen IT requirements that add unnecessary cost and complexity? You also want your app to be responsive in any device at any time. In his session at 19th Cloud Expo, Mark Allen, General Manager of...
SYS-CON Events announced today that Pulzze Systems will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Pulzze Systems, Inc. provides infrastructure products for the Internet of Things to enable any connected device and system to carry out matched operations without programming. For more information, visit http://www.pulzzesystems.com.
One of biggest questions about Big Data is “How do we harness all that information for business use quickly and effectively?” Geographic Information Systems (GIS) or spatial technology is about more than making maps, but adding critical context and meaning to data of all types, coming from all different channels – even sensors. In his session at @ThingsExpo, William (Bill) Meehan, director of utility solutions for Esri, will take a closer look at the current state of spatial technology and ar...
Cloud based infrastructure deployment is becoming more and more appealing to customers, from Fortune 500 companies to SMEs due to its pay-as-you-go model. Enterprise storage vendors are able to reach out to these customers by integrating in cloud based deployments; this needs adaptability and interoperability of the products confirming to cloud standards such as OpenStack, CloudStack, or Azure. As compared to off the shelf commodity storage, enterprise storages by its reliability, high-availabil...
The IoT industry is now at a crossroads, between the fast-paced innovation of technologies and the pending mass adoption by global enterprises. The complexity of combining rapidly evolving technologies and the need to establish practices for market acceleration pose a strong challenge to global enterprises as well as IoT vendors. In his session at @ThingsExpo, Clark Smith, senior product manager for Numerex, will discuss how Numerex, as an experienced, established IoT provider, has embraced a ...
In past @ThingsExpo presentations, Joseph di Paolantonio has explored how various Internet of Things (IoT) and data management and analytics (DMA) solution spaces will come together as sensor analytics ecosystems. This year, in his session at @ThingsExpo, Joseph di Paolantonio from DataArchon, will be adding the numerous Transportation areas, from autonomous vehicles to “Uber for containers.” While IoT data in any one area of Transportation will have a huge impact in that area, combining sensor...
In the next forty months – just over three years – businesses will undergo extraordinary changes. The exponential growth of digitization and machine learning will see a step function change in how businesses create value, satisfy customers, and outperform their competition. In the next forty months companies will take the actions that will see them get to the next level of the game called Capitalism. Or they won’t – game over. The winners of today and tomorrow think differently, follow different...
“Media Sponsor” of SYS-CON's 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. CloudBerry Backup is a leading cross-platform cloud backup and disaster recovery solution integrated with major public cloud services, such as Amazon Web Services, Microsoft Azure and Google Cloud Platform.
Ask someone to architect an Internet of Things (IoT) solution and you are guaranteed to see a reference to the cloud. This would lead you to believe that IoT requires the cloud to exist. However, there are many IoT use cases where the cloud is not feasible or desirable. In his session at @ThingsExpo, Dave McCarthy, Director of Products at Bsquare Corporation, will discuss the strategies that exist to extend intelligence directly to IoT devices and sensors, freeing them from the constraints of ...
SYS-CON Events announced today that SoftNet Solutions will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. SoftNet Solutions specializes in Enterprise Solutions for Hadoop and Big Data. It offers customers the most open, robust, and value-conscious portfolio of solutions, services, and tools for the shortest route to success with Big Data. The unique differentiator is the ability to architect and ...
The Internet of Things will challenge the status quo of how IT and development organizations operate. Or will it? Certainly the fog layer of IoT requires special insights about data ontology, security and transactional integrity. But the developmental challenges are the same: People, Process and Platform and how we integrate our thinking to solve complicated problems. In his session at 19th Cloud Expo, Craig Sproule, CEO of Metavine, will demonstrate how to move beyond today's coding paradigm ...
Fifty billion connected devices and still no winning protocols standards. HTTP, WebSockets, MQTT, and CoAP seem to be leading in the IoT protocol race at the moment but many more protocols are getting introduced on a regular basis. Each protocol has its pros and cons depending on the nature of the communications. Does there really need to be only one protocol to rule them all? Of course not. In his session at @ThingsExpo, Chris Matthieu, co-founder and CTO of Octoblu, walk you through how Oct...
A completely new computing platform is on the horizon. They’re called Microservers by some, ARM Servers by others, and sometimes even ARM-based Servers. No matter what you call them, Microservers will have a huge impact on the data center and on server computing in general. Although few people are familiar with Microservers today, their impact will be felt very soon. This is a new category of computing platform that is available today and is predicted to have triple-digit growth rates for some ...
Everyone knows that truly innovative companies learn as they go along, pushing boundaries in response to market changes and demands. What's more of a mystery is how to balance innovation on a fresh platform built from scratch with the legacy tech stack, product suite and customers that continue to serve as the business' foundation. In his General Session at 19th Cloud Expo, Michael Chambliss, Head of Engineering at ReadyTalk, will discuss why and how ReadyTalk diverted from healthy revenue an...