The CTOvision Big Data Top Enterprise Tech List: The most important data technologies to accelerate into your infrastructure

By Bob Gourley

We produced this list as an aid to the enterprise CTO seeking information on the most capable mission-enabling infrastructure technologies. This is a companion piece to our list of the top analytical technologists on our Analyst One website. Our methodologies are at the bottom of this list.

We trust you will find this list interesting and informative. Have some new technologies to suggest for the list? Let us know at our Contact Page.

 

The CTOvision Big Data Top Enterprise Technologies List

Aerospike
Aerospike: Aerospike delivers the first flash-optimized in-memory database and the most reliable NoSQL database for revenue critical, real-time big data applications. The database of choice in advertising, Aerospike is the user store and system of engagement for Internet-scale, interaction platforms, such as AppNexus, Bluekai, eXelate, The Trade Desk and [x+1], predictably processing terabytes of data and billions of transactions per day, with 10x better performance, 10x fewer servers and zero downtime. Developers in mobile, video, gaming, social, ecommerce, retail and more can create the most compelling interactions extending Aerospike to fit their applications. Aerospike is headquartered in Silicon Valley; investors include Alsop Louie, Draper Associates and NEA.

Appfluent
Appfluent: Appfluent provides IT organizations with visibility into usage and performance of data warehouse and business intelligence systems. IT decision makers can view exactly which enterprise data is being used or not used, determine how business intelligence is performing and identify causes of database performance issues. With Appfluent, customers can address exploding data growth and start the smart move to Hadoop and Big Data.

Arista Networks
Arista Networks: Arista Networks was founded to deliver networking solutions for large data center and HPC environments. It delivers a portfolio of Gigabit and 10GbE switches that redefine network architectures, bring extensibility to networking and dramatically change the price/performance of data center networks. At the core of Arista’s platform is the Extensible Operating System (EOS™), a ground-breaking network operating system with single-image consistency across hardware platforms and a modern core architecture enabling in-service upgrades and application extensibility.

Azul Systems
Azul Systems: Azul Zing™ is essential technology for Big Data applications that are critical to business results. Zing is the only Java performance solution that delivers both very low latency and high sustained throughput for real-time analytics and self-service business intelligence. With Zing your Big Data applications can utilize massive in-memory datasets while delivering predictable performance, allowing reports to be run on more live data with faster results. Zing even reduces or eliminates the need for extra caching applications.

Basho Technologies
Basho Technologies: Basho Technologies is the creator and developer of Riak, an open-source distributed database, providing extreme high-availability, fault-tolerance, and operational simplicity even at scale. Riak has rapidly gained adoption throughout the Fortune 100 and has become foundational to many of the world’s fastest-growing Web-based, mobile and social applications.

Cloudera
Cloudera: Cloudera pioneered the business case for Hadoop with CDH, the world’s most comprehensive, tested and widely deployed distribution of Hadoop. Its Platform for Big Data, Cloudera Enterprise, empowers enterprises to Ask Bigger Questions™ and gain rich, actionable insights from all their data to derive real business value and competitive advantage. As the top contributor to the Apache open source community and leading educator of data professionals, with tens of thousands of nodes under management and hundreds of customers across diverse markets, Cloudera is the category leader that sets the standard for Hadoop in the enterprise.

Couchbase
Couchbase: Couchbase is a leading provider of NoSQL database technology and the company behind the Couchbase open source project. Couchbase Server, the company’s flagship product, is a NoSQL document-oriented database with production deployments at AOL, Cisco, Concur, LinkedIn, Orbitz, Salesforce.com, Shuffle Master, Zynga and hundreds of other household names worldwide. It is particularly well suited for interactive applications, providing easy scalability, consistent high performance, 24×365 availability, and a flexible data model for ease of development.

Data Direct Networks
Data Direct Networks: DDN is the world’s largest privately held data storage infrastructure provider. With a unique and exacting focus on the requirements of today’s massive unstructured data generators, DDN has innovated a comprehensive product portfolio for Big Data applications, optimized for the world’s most data-intensive environments. hScaler, the world’s first truly unified analytics appliance, is factory configured and solutions ready, and it can be deployed in hours and answer queries in seconds. hScaler can put you in charge of your big data while truly lowering your TCO.

Dataguise, Inc.
Dataguise, Inc.: Dataguise provides data privacy protection and risk assessment analytics allowing organizations to safely leverage and share enterprise data. Their solutions simplify governance by automatically protecting the data (masking or encryption) and providing actionable compliance intelligence. These capabilities simplify risk management and reduce regulatory compliance costs.

GridGain
GridGain: GridGain develops scalable, distributed, in-memory data platform technology for real time data processing. The company’s Java-based middleware products enable development of applications and services that can instantly access terabytes to petabytes of information from any data source or file system, distribute computational tasks across any number of machines, and produce results orders of magnitude faster than traditionally architected systems. GridGain’s customers include innovative web and mobile businesses, leading Fortune 500 companies, and top government agencies. The company is headquartered in Foster City, California.

Hadapt
Hadapt: Hadapt has developed the industry’s only Big Data analytic platform natively integrating SQL with Apache Hadoop. The unification of these traditionally segregated platforms enables customers to analyze all of their data (structured, semi-structured and unstructured) in a single platform, with no connectors, complexities or rigid structure. The company’s core technology began as research in the Yale Computer Science department under co-founders Dr. Daniel Abadi and Ph.D. student Kamil Bajda-Pawlikowski. In 2011, led by co-founder and CEO Justin Borgman, Hadapt raised a $9.5MM Series A round of funding from Bessemer Venture Partners and Norwest Venture Partners. The company is headquartered in Cambridge, MA.

Jaspersoft
Jaspersoft: Jaspersoft empowers millions of people every day to make faster decisions by bringing them timely, actionable data inside their apps and business processes. Its embeddable, cost-effective reporting and analytics platform allows anyone to quickly self-serve and get the answers they need and scales architecturally and economically to reach everyone.

Kognitio
Kognitio: Kognitio is an in-memory analytical platform that can be tightly integrated with Hadoop for high-performance advanced analytics that make Big Data more consumable for enterprises, especially those with mature BI environments or engrained tools. An MPP platform itself, it enables ad-hoc queries in real-time, wrapped in industry-standard SQL for easy dissemination without MapReduce. Parallelizing standard binary languages like R and Python to run statistical and algorithmic functions in-memory, it is used by Data Scientists, BI professionals and Systems/Database Administrators to give fast access to data that persists in Hadoop and other data storage layers, enabling a Logical Data Warehouse model.

LucidWorks
LucidWorks: LucidWorks, the trusted name in Search, Discovery and Analytics, transforms the way people access information to enable data-driven decisions. Leveraging both structured and unstructured data built on the power of Apache Lucene/Solr open source search, LucidWorks delivers unmatched stability, scalability, and time-to-delivery for search applications. LucidWorks Search provides ease of use development to access up to billions of documents with sub-second query and faceting response time. LucidWorks Big Data tightly integrates key Apache projects needed to build and deploy applications providing ubiquitous access to the data trapped inside Hadoop.

Mellanox Technologies
Mellanox Technologies: Mellanox Technologies (NASDAQ: MLNX, TASE: MLNX) is a leading supplier of end-to-end InfiniBand and Ethernet interconnect solutions and services for servers and storage. Mellanox interconnect solutions increase data center efficiency by providing the highest throughput and lowest latency, delivering data faster to applications and unlocking system performance capability. Mellanox offers a choice of fast interconnect products: adapters, switches, software and silicon that accelerate application runtime and maximize business results for a wide range of markets including high performance computing, enterprise data centers, Web 2.0, cloud, storage and financial services. More information is available at www.mellanox.com. Founded in 1999, Mellanox Technologies is headquartered in Sunnyvale, California and Yokneam, Israel.

MemSQL
MemSQL: MemSQL is a distributed database for real-time analytics. Data scientists, analysts, and developers can query high velocity workloads and historical data simultaneously, all through a convenient SQL interface. By combining significant speed and throughput advantages with complex analytics, an enterprise can gain instant insight into its business and stay competitive in a fast-moving environment.

MetaScale
MetaScale: An early adopter of big data and legacy modernization initiatives, MetaScale provides cutting-edge technologies, Hadoop training and technology solutions to its customers. As a subsidiary of Sears Holdings Corporation, we understand the value of heritage and the need for constant innovation to drive growth. Through this heritage, we offer a deep understanding of employing complex big data tools to solve traditional business problems in the enterprise. Our team brings extensive experience in the migration of workloads off mainframe, large-scale private open-source cloud computing, Hadoop for big data BI and legacy infrastructure modernization.

MongoDB
MongoDB: MongoDB (from humongous) is reinventing data management and powering big data as the leading NoSQL database. Designed for how we build and run applications today, it empowers organizations to be more agile and scalable. MongoDB enables new types of applications, better customer experience, faster time to market and lower costs. It has a thriving global community with over 4 million downloads, 100,000 online education registrations, 20,000 user group members and 20,000 MongoDB Days attendees. The company has more than 600 customers, including many of the world’s largest organizations.

Optensity
Optensity: Optensity provides AppSymphony, a platform that enables businesses and government organizations to exploit big data sources while leveraging scalable computing environments and their current workforce. AppSymphony’s execution engine runs across a variety of compute environments including Amazon EC2, Rackspace, and Google Compute Engine. Once an analytic workflow, or “App”, has been authored and validated, it is discoverable and usable by anyone else in the enterprise, maximizing the App’s utility to the entire organization.

Pentaho
Pentaho: Pentaho is building the future of business analytics. Pentaho’s open source heritage drives our continued innovation in a modern, integrated, embeddable platform built for accessing all data sources. With support for all of the leading Hadoop distributions, NoSQL databases and high performance analytic databases, Pentaho provides the broadest support for big data analytics, as well as integration and orchestration of big data and traditional sources.

Platfora
Platfora: Platfora’s mission is to empower customers to transform their businesses into fact-based enterprises. Platfora masks the complexity of Hadoop, making it easy for customers to understand all the facts in their business across events, actions, behaviors and time. For more details, visit www.platfora.com or follow @platfora and #FactBased on twitter.

Progress DataDirect
Progress DataDirect: Progress DataDirect is the world leader in data connectivity, offering the most comprehensive software solutions for connecting the world’s most critical applications to data and services, running on any platform, using proven and emerging standards. Progress Software’s DataDirect Cloud product helps you address the challenges associated with cloud data connectivity by providing a managed service offering that delivers standards based SQL connectivity to a broad spectrum of SaaS, Big Data, Social, and NoSQL data sources. With a proven, 20-year history, strong technical leadership and robust product line, software architects worldwide depend on Progress Software’s DataDirect line of products to connect their applications to an unparalleled range of data sources using standard-based interfaces such as ODBC, JDBC, ADO.NET, XQuery and SOAP.

Protegrity
Protegrity: Protegrity, the innovative leader of groundbreaking enterprise data security software, provides high performance, infinitely scalable end-to-end data security solutions for organizations worldwide. Protegrity helps its customers secure all of their sensitive data in Hadoop and across the enterprise, ensuring compliance with all PCI, PHI and Privacy regulations. Protegrity’s solutions give corporations the ability to implement a variety of data protection methods, including vaultless tokenization, strong encryption, masking and monitoring to ensure the protection of their sensitive data.

Rogue Wave Software
Rogue Wave Software: Rogue Wave Software is the largest independent provider of cross-platform software development tools and embedded components for the next generation of HPC applications. Offering a broad portfolio, Rogue Wave enables developers to increase productivity and harness the power of multicore computing while reducing the complexity of developing multi-processor and data-intensive applications. With Rogue Wave’s IMSL Numerical Libraries, businesses and organizations reduce development time, realize a lower total cost of ownership, and improve quality and maintainability. The robust and portable collection of embeddable math and statistical functions available in native C, C++, C#, Fortran, and Java™ provide sophisticated analytics for high-performance, mission-critical applications.

SGI
SGI: SGI, the trusted leader in technical computing, helps customers solve their most demanding business and technology challenges by delivering high performance computing (HPC), Big Data, and data storage solutions that accelerate time to discovery, innovation, and profitability. Delivering extreme speed, scale, and efficiency, SGI server and storage offerings are utilized by scientific, business, and government communities to solve challenging, data-intensive computing and data management problems, typically requiring large amounts of computing power and fast and efficient data movement both within the computing system and to and from large-scale data storage installations.

SiSense
SiSense: SiSense Prism is a Big Data Analytics Solution that provides the benefits of In-Memory without its disadvantages. SiSense In-Memory Columnar Datastore analyzes 100 times more data at 10 times the speed of comparable solutions. No need to set up complex data warehouse systems or OLAP cubes. No need for programming either, regardless of where data comes from or how big it is.

Skytree Inc.
Skytree Inc.: Skytree’s Machine Learning platform gives organizations the power to discover deep analytic insights, predict future trends, make recommendations and reveal untapped markets and customers. Predictive Analytics and Machine Learning are quickly becoming must-have technologies in the age of Big Data, and Skytree provides the Enterprise-grade foundation. Skytree’s flagship product – Skytree Server – is the only general purpose scalable Machine Learning system on the market, built for the highest accuracy at unprecedented speed and scale.

SoftwareAG
SoftwareAG: SoftwareAG provides big data tools and infrastructure including Enterprise Ehcache. Enterprise Ehcache snaps into enterprise applications for a faster, easier, more broadly applicable approach to achieving high-performance scalability. Based on the de facto caching standard for enterprise Java, Enterprise Ehcache is an easy-to-deploy solution for hard-to-solve problems. With just a few config changes, you can achieve a 10-times improvement in application response times; gain headroom for terabytes of data growth; offload slow, expensive databases or mainframes; and save on licensing, administration and hardware costs.

Splunk
Splunk: Splunk Inc. (NASDAQ: SPLK) provides the engine for machine data. Splunk software collects, indexes and harnesses the machine-generated big data coming from the websites, applications, servers, networks and mobile devices that power business. Splunk software enables organizations to monitor, search, analyze, visualize and act on massive streams of real-time and historical machine data. More than 4,800 enterprises, universities, government agencies and service providers in over 80 countries use Splunk Enterprise to gain Operational Intelligence that deepens business and customer understanding, improves service and uptime, reduces cost and mitigates cyber-security risk. Splunk Storm, a cloud-based subscription service, is used by organizations developing applications in the cloud.

Sqrrl
Sqrrl: Sqrrl is a Big Data software company whose employees have dealt with the world’s largest, most complex, and most sensitive datasets for the last decade. Sqrrl’s software product, Sqrrl Enterprise, is the most secure and scalable Big Data platform for building real-time analytical applications and is powered by Apache Accumulo™ and Hadoop. Sqrrl Enterprise extends the capabilities of Accumulo with additional data ingest, security, and real-time analytical features that help unlock the power of Big Data.

Zettaset
Zettaset: Zettaset, the leader in secure Big Data management, automates, accelerates, and simplifies Hadoop deployment for the enterprise. Zettaset Orchestrator™ is the only Big Data management solution designed to address enterprise requirements for security, high availability, manageability and scalability in a distributed computing environment. Orchestrator helps organizations move Hadoop from pilot into production, replacing open source management with a more robust approach that easily fits into existing enterprise security and policy frameworks. Zettaset Orchestrator provides comprehensive fail-over for all critical cluster services, facilitates integration with the most widely adopted ETL and analytics applications, and is compatible with the leading Hadoop distributions.

 

Our Methodologies 

We firmly believe that technologies must be supported by strong companies, so we focus on companies with proven ability to serve in real enterprises. In most cases we select VC-backed firms because those come with staying power. We love open source, but open source solutions should also be supported by a strong firm. We also believe it is important to report only on firms whose products are really available now (no vaporware). Additionally, we believe most firms that have a capability that can make a difference for the modern analyst will be interested in demonstrating that capability at Hadoop World. This last assumption allowed us to get a jumpstart on our first list. We started our process by reviewing the full list of sponsors and exhibitors at the upcoming Hadoop World (for a full list of all exhibitors see here). We then reviewed previous research at our CTOlabs.com and CTOvision.com sites to round out this initial list.

We know our methodology has some holes. But as good analysts we are going to keep our eyes and ears open for other technologies we can report on and will modify this list as required. We also know we have you, dear readers, to check our assumptions and give us feedback on the list. If you have or know of a firm we should consider for this, let us know.

 

 


More Stories By Bob Gourley

Bob Gourley writes on enterprise IT. He is a founder and partner at Cognitio Corp and publisher of CTOvision.com.
