


@BigDataExpo: Blog Feed Post

All the Apache Streaming Projects: An Exploratory Guide


The speed at which data is generated, consumed, processed, and analyzed is increasing at an unbelievably rapid pace. Verticals such as social media, the Internet of Things, ad tech, and gaming are struggling to cope with the sheer size of their data sets. These industries demand data processing and analysis in near real time. Traditional Big Data frameworks such as Apache Hadoop, which process data in large batches, are not well suited to these use cases.

As a result, several open source projects have been started in the last few years to handle streaming data. All of them are designed to process a never-ending sequence of records originating from multiple sources. From Kafka to Beam, there are over a dozen Apache projects in various stages of maturity.

The current Apache streaming projects overlap heavily and address similar scenarios, so users often find it confusing to choose the right open source stack for implementing a real-time stream processing solution. This article attempts to help customers navigate the complex maze of Apache streaming projects by calling out the key differentiators of each. We will discuss the use cases and key scenarios addressed by Apache Kafka, Apache Storm, Apache Spark, Apache Samza, Apache Beam, and related projects.

Apache Flume
Apache Flume is one of the oldest Apache projects designed to collect, aggregate, and move large data sets, such as web server logs, to a centralized location. It belongs to the data collection and single-event processing family of stream processing solutions. Flume is based on an agent-driven architecture: agents collect the events generated by clients and stream them directly to Apache Hive, HBase, or other data stores.

A Flume agent is configured with three components: a source, a channel, and a sink. The source can be anything from Syslog to the Twitter stream to an Avro endpoint. The channel buffers events between the source and the sink, and its type determines how those events are held in transit; valid options include Memory, JDBC, Kafka, and File, among others. The sink determines the destination where the stream is delivered. Flume supports many sinks, such as HDFS, Hive, HBase, Elasticsearch, Kafka, and others.
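The interplay between the three components can be illustrated with a small Python model in which a source pushes events into a bounded channel and a sink drains them in batches. This is a conceptual sketch under simplifying assumptions, not Flume's API: Flume is written in Java, its components are wired together through configuration rather than code, its sources and sinks run independently and wrap every put and take in a channel transaction, and none of that is modeled here. `BoundedChannel` and `run_agent` are hypothetical names chosen for illustration.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List


class BoundedChannel:
    """Hypothetical stand-in for a Flume memory channel: a bounded FIFO buffer."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._events: Deque[str] = deque()

    def put(self, event: str) -> bool:
        """Accept an event unless the buffer is already full."""
        if len(self._events) >= self.capacity:
            return False
        self._events.append(event)
        return True

    def take(self, max_batch: int) -> List[str]:
        """Remove and return up to max_batch events in arrival order."""
        batch: List[str] = []
        while self._events and len(batch) < max_batch:
            batch.append(self._events.popleft())
        return batch


def run_agent(source: Iterable[str], channel: BoundedChannel,
              sink: Callable[[List[str]], None], batch_size: int) -> int:
    """Synchronously move events source -> channel -> sink; return the number dropped."""
    dropped = 0
    for event in source:
        if not channel.put(event):
            # In this model a full channel makes the sink drain one batch,
            # after which the event is retried once and dropped if it still fails.
            drained = channel.take(batch_size)
            if drained:
                sink(drained)
            if not channel.put(event):
                dropped += 1
    # Flush anything still buffered once the source is exhausted.
    batch = channel.take(batch_size)
    while batch:
        sink(batch)
        batch = channel.take(batch_size)
    return dropped


if __name__ == "__main__":
    delivered: List[List[str]] = []
    log_lines = [f"GET /page/{i}" for i in range(5)]
    lost = run_agent(log_lines, BoundedChannel(capacity=3),
                     delivered.append, batch_size=2)
    print(lost, delivered)
```

With a capacity of 3 and a batch size of 2, the five sample log lines reach the sink in three batches and none is dropped. Choosing a durable channel type such as File or JDBC instead of Memory trades some throughput for the ability to survive an agent restart; this toy model only captures the in-memory case.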

Apache Flume is ideal for scenarios where the client infrastructure supports installing agents. The most popular use case is to stream logs from multiple sources to a central, persistent data store for further processing and analysis.
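To make the log-collection scenario concrete, the following Python sketch renders a Flume agent properties file that wires two log sources (a Syslog TCP listener and an Avro endpoint) into one durable file channel and an HDFS sink. The key layout (`<agent>.sources`, `<agent>.sources.<name>.channels`, `<agent>.sinks.<name>.channel`) and the property names used in the example follow the conventions of the Flume User Guide, but they should be checked against the documentation for the Flume version in use. The helper `render_flume_config`, the agent and component names, the ports, and the directory paths are all hypothetical placeholders, not part of Flume.

```python
from typing import Dict, List

Props = Dict[str, str]


def render_flume_config(agent: str, sources: Dict[str, Props],
                        channel: Props, sink: Props) -> str:
    """Hypothetical helper: render N sources -> 1 channel -> 1 sink as Flume properties."""
    channel_name, sink_name = "c1", "k1"
    lines: List[str] = [
        f"{agent}.sources = {' '.join(sources)}",
        f"{agent}.channels = {channel_name}",
        f"{agent}.sinks = {sink_name}",
    ]
    for name, props in sources.items():
        for key, value in props.items():
            lines.append(f"{agent}.sources.{name}.{key} = {value}")
        # A source can feed several channels, hence the plural key.
        lines.append(f"{agent}.sources.{name}.channels = {channel_name}")
    for key, value in channel.items():
        lines.append(f"{agent}.channels.{channel_name}.{key} = {value}")
    for key, value in sink.items():
        lines.append(f"{agent}.sinks.{sink_name}.{key} = {value}")
    # A sink drains exactly one channel, hence the singular key.
    lines.append(f"{agent}.sinks.{sink_name}.channel = {channel_name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Ports and paths below are illustrative placeholders.
    print(render_flume_config(
        agent="a1",
        sources={
            "syslog": {"type": "syslogtcp", "host": "0.0.0.0", "port": "5140"},
            "app": {"type": "avro", "bind": "0.0.0.0", "port": "4141"},
        },
        channel={"type": "file",
                 "checkpointDir": "/var/flume/checkpoint",
                 "dataDirs": "/var/flume/data"},
        sink={"type": "hdfs", "hdfs.path": "/flume/logs",
              "hdfs.fileType": "DataStream"},
    ))
```

The generated file would typically be handed to the Flume launcher, for example `flume-ng agent --conf conf --conf-file agent.conf --name a1`, where the `--name` value must match the agent prefix used in the properties.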

Sample Use Case: Streaming logs from multiple sources capable of running a JVM.

Read the article at The New Stack.

Janakiram MSV is an analyst, advisor, and architect. Follow him on Twitter, Facebook and LinkedIn.


More Stories By Janakiram MSV

Janakiram MSV heads the Cloud Infrastructure Services at Aditi Technologies. He was the founder and CTO of Get Cloud Ready Consulting, a niche Cloud Migration and Cloud Operations firm that was recently acquired by Aditi Technologies. In his current role, he leads a highly talented engineering team that focuses on migrating and managing applications deployed on Amazon Web Services and Microsoft Windows Azure Infrastructure Services.
Janakiram is an industry analyst with a deep understanding of Cloud services. Through his speaking, writing, and analysis, he helps businesses take advantage of emerging technologies. He draws on his experience of engaging with the industry to develop informative and practical research, analysis, and authoritative content that informs, influences, and guides decision makers. He analyzes market trends, new products and features, announcements, industry happenings, and the impact of executive transitions.
Janakiram is one of the first few Microsoft Certified Professionals on Windows Azure in India. Demystifying The Cloud, an eBook authored by Janakiram, was downloaded more than 100,000 times within its first few months. He is the Chief Editor of a popular portal on Cloud called www.CloudStory.in that covers the latest trends in Cloud Computing. Janakiram is an analyst with the GigaOM Pro analyst network, where he analyzes the Cloud Services landscape. He is a guest faculty member at the International Institute of Information Technology, Hyderabad (IIIT-H), where he teaches Big Data and Cloud Computing to students enrolled in the Master's course. As a passionate speaker, he has chaired the Cloud Computing track at premier events in India.
He has been the keynote speaker at many premier conferences, and his seminars are attended by thousands of architects, developers, and IT professionals. His sessions are rated among the best at every conference he participates in.
Janakiram has worked at world-class product companies, including Microsoft Corporation, Amazon Web Services, and Alcatel-Lucent. Joining as the first employee of Amazon Web Services in India, he was the AWS Technology Evangelist. Prior to that, Janakiram spent 10 years at Microsoft Corporation, where he was involved in selling, marketing, and evangelizing the Microsoft Application Platform and Tools.
