Analytics in Decision-Making Workflow

Big Data shouldn’t be restricted to data scientists

Putting Analytics into the Decision-Making Workflow with Apache Spark

Data-driven businesses use analytics to inform and support their decisions. In many companies, marketing, sales, finance, and operations departments tend to be the earliest adopters of data analytics, with the rest of the business lagging behind. The goal for many organizations now is to make analytics a natural part of the daily workflow of most, if not all, employees. Achieving that objective typically requires a shift in the corporate culture and ready access to user-friendly data analytics tools.

Big Data Shouldn't Be Restricted to Data Scientists
Big Data experts, when discussing the process of integrating data analysis into the workflow across an enterprise, often talk blithely about how users can easily leverage their SQL skills to query data. The problem is that not everyone has SQL skills, or even knows what SQL is.

Companies that plan to transform themselves into data-driven, lean businesses should keep in mind that not every employee needs to be a data scientist. Focus the majority of training efforts (including how to run basic SQL queries, if necessary) on the employees whose jobs involve fact-based decision-making.

Making employees wait for IT to manage schemas and set up ETL tasks is counterproductive. In a busy company, by the time data is prepped for analysis, it may have lost some of its actionable relevance. Instead, provide robust self-service data analysis tools, such as Apache Drill, to enable users to extract the most value possible from data stored in Hadoop. This frees employees to work with data in its native formats (schema-less data, nested data, and data with rapidly evolving schemas) with little or no IT involvement.
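To make "schema-less" and "nested" concrete, here is a minimal sketch in plain Python. It is not Drill, which users would query with SQL; it only illustrates the idea of discovering structure at read time instead of waiting for someone to define a schema first. The call-center records, field names, and helper functions are all invented for illustration.

```python
import json
from collections import defaultdict

# Hypothetical call-center records as they might sit in a raw JSON-lines file.
# The records deliberately do not share one fixed schema.
RAW_LINES = [
    '{"customer": {"id": 1, "region": "west"}, "minutes": 12}',
    '{"customer": {"id": 2, "region": "east"}, "minutes": 7, "channel": "chat"}',
    '{"customer": {"id": 3}, "minutes": 5}',
    '{"customer": {"id": 4, "region": "west"}, "minutes": 9, "tags": ["billing"]}',
]


def get_path(record, path, default=None):
    """Walk a dotted path such as 'customer.region' through nested dicts."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def total_by(lines, group_path, value_path):
    """Sum value_path grouped by group_path, reading structure from each record."""
    totals = defaultdict(int)
    for line in lines:
        record = json.loads(line)
        group = get_path(record, group_path, default="unknown")
        totals[group] += get_path(record, value_path, default=0)
    return dict(totals)


minutes_by_region = total_by(RAW_LINES, "customer.region", "minutes")
print(minutes_by_region)
```

Records with a missing region or with extra fields are still usable, which is the practical point: nobody had to remodel the data before the first question could be asked.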

Self-service data tools also enable exploratory queries. Users can explore the data directly and extend their analysis without waiting for IT to prep additional data sets. Analysis can then extend past known, structured data to semi-structured and unstructured data, such as call center logs, videos, spreadsheets, social media data, clickstream data, web log files, and external data (such as publicly available industry data), allowing a business to gain big-picture, actionable insights on the fly.

Apache Spark: Bringing New Efficiencies to Big Data Analysis
Agile companies that rely on data analysis performed in near real time and real time also need solutions that can rapidly process large data sets. Apache Spark, an in-memory data processing framework, is increasingly the solution of choice.

Spark is a framework for parallel, distributed data processing. It can be deployed on Apache Hadoop via YARN, on Apache Mesos, or with its own standalone cluster manager. It can serve as a foundation for other data processing frameworks, and supports programming languages including Scala, Java, and Python. Data can be accessed in HDFS, Cassandra, HBase, Hive, Tachyon, and any Hadoop data source.
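In practice, the choice of cluster manager mostly shows up as the master URL passed to spark-submit through its --master option. The sketch below builds that command line in Python for each deployment option named above. The host names, ports, and the helper function are placeholders for illustration, and real submissions usually carry more options than shown here.

```python
# Master URL formats for spark-submit's --master option.
# Host names and ports are placeholders, not real endpoints.
MASTER_URLS = {
    "yarn": "yarn",
    "mesos": "mesos://mesos-master.example.com:5050",
    "standalone": "spark://spark-master.example.com:7077",
    "local": "local[*]",  # single machine, handy for trying things out
}


def spark_submit_command(cluster_manager, app_path):
    """Return the spark-submit argument list for the chosen cluster manager."""
    if cluster_manager not in MASTER_URLS:
        raise ValueError(f"unknown cluster manager: {cluster_manager}")
    return ["spark-submit", "--master", MASTER_URLS[cluster_manager], app_path]


# 'daily_report.py' is a hypothetical application file.
for manager in MASTER_URLS:
    print(" ".join(spark_submit_command(manager, "daily_report.py")))
```

The application itself stays the same across all four; only the master URL changes.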

Data sets can be pinned in memory with Spark, which noticeably boosts the performance of applications that reuse the same data. Spark also provides speed improvements for applications running on disk, and it extends the MapReduce model to support interactive queries and stream processing far more efficiently.
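Why pinning helps is easiest to see by counting how often the underlying source gets read. The following is a toy model in plain Python, not Spark's API: the Dataset class, the loader, and the order amounts are all invented, and a real cluster adds partitioning and eviction that this sketch ignores.

```python
class Dataset:
    """Toy stand-in for a distributed data set. This is not Spark's API."""

    def __init__(self, loader):
        self._loader = loader  # callable that re-reads the source (the slow path)
        self._pinned = False
        self._rows = None

    def cache(self):
        """Ask for the rows to be kept in memory after the first read."""
        self._pinned = True
        return self

    def rows(self):
        if self._pinned and self._rows is not None:
            return self._rows          # served from memory
        data = list(self._loader())    # read from the source
        if self._pinned:
            self._rows = data
        return data

    def count(self):
        return len(self.rows())

    def total(self):
        return sum(self.rows())


source_reads = {"n": 0}


def load_order_amounts():
    """Hypothetical loader that counts how often the source is read."""
    source_reads["n"] += 1
    return [120, 80, 45, 300]


# Two actions on an unpinned data set: the source is read for each one.
unpinned = Dataset(load_order_amounts)
unpinned.count()
unpinned.total()
reads_without_cache = source_reads["n"]

# The same two actions on a pinned data set: the source is read once.
source_reads["n"] = 0
pinned = Dataset(load_order_amounts).cache()
pinned.count()
pinned.total()
reads_with_cache = source_reads["n"]

print(reads_without_cache, reads_with_cache)
```

The more often an analysis returns to the same data, as interactive and iterative work tends to do, the more those avoided reads matter.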

Spark also eliminates the need for separate distributed systems to handle different workloads, such as batch applications, interactive queries, iterative algorithms, and streaming. With Spark, all of these processing types are supported by the same engine, reducing management chores and making the processes easier to combine.
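The benefit of one engine is that the same business logic can serve both a nightly batch job and a live stream. The plain-Python sketch below illustrates only that idea: one counting function is reused for a full data set and for a sequence of small batches. It is not Spark code, and the event data and function names are invented for illustration.

```python
from collections import Counter

# Hypothetical clickstream events.
EVENTS = [
    {"type": "click"}, {"type": "click"}, {"type": "purchase"},
    {"type": "click"}, {"type": "purchase"},
]


def count_by_type(events):
    """Shared logic: count events by type."""
    return Counter(event["type"] for event in events)


def run_batch(events):
    """Batch style: process the whole data set at once."""
    return dict(count_by_type(events))


def run_streaming(micro_batches):
    """Streaming style: apply the same logic to each small batch as it
    arrives and keep a running total, recording a snapshot each time."""
    running = Counter()
    snapshots = []
    for batch in micro_batches:
        running.update(count_by_type(batch))
        snapshots.append(dict(running))
    return snapshots


batch_result = run_batch(EVENTS)
stream_snapshots = run_streaming([EVENTS[:2], EVENTS[2:4], EVENTS[4:]])

print(batch_result)
print(stream_snapshots[-1])
```

Because both paths call the same function, the final streaming snapshot matches the batch result, and a change to the counting rule only has to be made in one place.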

Businesses can count on Spark's benefits over the long term. Spark, initially conceived as a project at UC Berkeley, moved to the Apache Software Foundation in 2013 and became a top-level project in 2014. Top-level status, which is also held by projects such as Hadoop and httpd, is a designation indicating that a project has strong community backing from developers and users, and has proved its worth. More than 50 companies currently list themselves on Spark's "Powered By" page.

Putting Data-Driven Intelligence to Work
Big Data encompasses multiple processes (collection, cleansing, integration, management, governance, security, analysis, and decision-making), all of which need to be in place before a company can consider itself data-driven. Oddly, the decision-making process itself tends to get the least attention.

Gaining real ROI from a Big Data project requires more than fast tools and a solid plan to enable users to incorporate analysis-driven decision-making into their workflow. Quick discovery of exciting new insights in data has no benefit if a company doesn't have a process that enables an equally speedy and effective response to that new intelligence. When devising (or revising) your Big Data project, ensure that you build in an implementation process that enables analysis to be transformed into action.

And finally, a word of warning about real-time analysis: It's easy to lose sight of long-range goals when you're immersed in the moment. Ensure that business goals are aligned with data analysis activities, and establish KPIs to monitor the success of data-driven initiatives. Big Data should provide a company with a sustainable competitive edge.

To explore more of what Spark has to offer, jump over to Getting Started with Apache Spark: From Inception to Production, a free interactive ebook by James A. Scott.

More Stories By Jim Scott

Jim has held positions running Operations, Engineering, Architecture and QA teams in the Consumer Packaged Goods, Digital Advertising, Digital Mapping, Chemical and Pharmaceutical industries. Jim has built systems that handle more than 50 billion transactions per day and his work with high-throughput computing at Dow Chemical was a precursor to more standardized big data concepts like Hadoop.
