Click here to close now.



Welcome!

Apache Authors: William Schmarzo, Christopher Harrold, Elizabeth White, Talend Inc., Adrian Bridgwater

Blog Feed Post

Pentaho Continues To Innovate: Data Science Pack Operationalizes Use of R and Weka

By

SAN JOSE, Calif.June 3, 2014 /PRNewswire/ – Hadoop Summit – According to the O’Reilly Data Scientist Salary Survey, R is the most-used tool for data scientists, while Weka is a widely used and popular open source collection of machine learning algorithms. Today, Pentaho Corporation announced the Data Science Pack, a toolkit to operationalize these two commonly used technologies. Through Pentaho Data Integration (PDI), data scientists can offload the drudgery of the data flow process with analytic components for R and Weka, so organizations can spend more time on strategic advanced and predictive analytics to achieve a more complete view into customer behavior.

“There was a gap in the market until now and people like myself were piecing together solutions to help with the data preparation, cleansing and orchestration of analytic data sets. The Pentaho Data Science Pack fills that gap to operationalize the data integration process for advanced and predictive analytics,” said Ken Krooner, President at ESRG. “Having embedded Pentaho for over seven years to provide remote and onboard analytics for maritime fleets and ships and several years experience with different data tools, Pentaho Data Integration is critical to my team. Using Weka with PDI, we are now helping clients blend a 360-degree view of all equipment data sources to enable early prediction of potential machinery failure.

According to the Ventana Research Big Data Analytics Benchmark Research, the top two time-consuming big data tasks are solving data quality and consistency issues (46%) and preparing data for integration (52%). Pentaho customer Paytronix manages marketing and loyalty programs in the restaurant sector and their team of data scientists uses R with Pentaho and Hadoop to analyze data to help their customers predict fraud and buying behavior. Saad Khalid, Data Insights Product Manager at Paytronix explains, “Data preparation is an essential, but tedious process. Pentaho Data Integration with R has allowed Paytronix to deliver analytics and insights to our clients much faster. What used to take a few weeks now takes a few minutes.”

“Having built blueprints for the four most popular big data use cases, Pentaho is at the forefront of solving data integration challenges, and we know advanced and predictive analytics are core ingredients for success,” said Christopher Dziekan, EVP and Chief Product Officer at Pentaho. “The highest value of data comes from having foresight blended with hindsight to drive insight and action. The Pentaho Data Science Pack allows organizations to apply their deep domain expertise and improve their customer analytics and predictions.”

The Data Science Pack improves productivity by executing advanced descriptive statistics and machine learning algorithms “at scale” inside of data flow transformations. Included in the pack are:

  • The R Script Executor step enables the more than 5,500 packages in the Comprehensive R Archive Network (CRAN) repository to be used within PDI transformations
  • The Weka Forecasting step uses machine learning techniques to generate forward-looking time-series data sets based on historical observations
  • The Weka Scoring step executes machine learning models to calculate and append probability values to incoming records

Product Availability
The Data Science Pack will be available from Pentaho early this summer.

About Pentaho Corporation

Pentaho is delivering the future of business analytics. The modern, open, and embeddable platform supports the entire big data workflow from data ingestion, preparation and integration, to interactive visualization, analysis and predictive analytics. Companies can access, integrate, orchestrate and blend any data, in any environment. Headquartered in San Francisco andOrlando, with offices throughout Europe, Pentaho has over 1,200 commercial customers today, with over 10,000 production deployments. For more information, please visit www.pentaho.com.

Read the original blog entry...

More Stories By Bob Gourley

Bob Gourley writes on enterprise IT. He is a founder and partner at Cognitio Corp and publsher of CTOvision.com

@ThingsExpo Stories
IoT is rapidly changing the way enterprises are using data to improve business decision-making. In order to derive business value, organizations must unlock insights from the data gathered and then act on these. In their session at @ThingsExpo, Eric Hoffman, Vice President at EastBanc Technologies, and Peter Shashkin, Head of Development Department at EastBanc Technologies, discussed how one organization leveraged IoT, cloud technology and data analysis to improve customer experiences and effi...
Basho Technologies has announced the latest release of Basho Riak TS, version 1.3. Riak TS is an enterprise-grade NoSQL database optimized for Internet of Things (IoT). The open source version enables developers to download the software for free and use it in production as well as make contributions to the code and develop applications around Riak TS. Enhancements to Riak TS make it quick, easy and cost-effective to spin up an instance to test new ideas and build IoT applications. In addition to...
The idea of comparing data in motion (at the sensor level) to data at rest (in a Big Data server warehouse) with predictive analytics in the cloud is very appealing to the industrial IoT sector. The problem Big Data vendors have, however, is access to that data in motion at the sensor location. In his session at @ThingsExpo, Scott Allen, CMO of FreeWave, discussed how as IoT is increasingly adopted by industrial markets, there is going to be an increased demand for sensor data from the outermos...
CenturyLink has announced that application server solutions from GENBAND are now available as part of CenturyLink’s Networx contracts. The General Services Administration (GSA)’s Networx program includes the largest telecommunications contract vehicles ever awarded by the federal government. CenturyLink recently secured an extension through spring 2020 of its offerings available to federal government agencies via GSA’s Networx Universal and Enterprise contracts. GENBAND’s EXPERiUS™ Application...
The cloud market growth today is largely in public clouds. While there is a lot of spend in IT departments in virtualization, these aren’t yet translating into a true “cloud” experience within the enterprise. What is stopping the growth of the “private cloud” market? In his general session at 18th Cloud Expo, Nara Rajagopalan, CEO of Accelerite, explored the challenges in deploying, managing, and getting adoption for a private cloud within an enterprise. What are the key differences between wh...
Presidio has received the 2015 EMC Partner Services Quality Award from EMC Corporation for achieving outstanding service excellence and customer satisfaction as measured by the EMC Partner Services Quality (PSQ) program. Presidio was also honored as the 2015 EMC Americas Marketing Excellence Partner of the Year and 2015 Mid-Market East Partner of the Year. The EMC PSQ program is a project-specific survey program designed for partners with Service Partner designations to solicit customer feedbac...
The IoT is changing the way enterprises conduct business. In his session at @ThingsExpo, Eric Hoffman, Vice President at EastBanc Technologies, discussed how businesses can gain an edge over competitors by empowering consumers to take control through IoT. He cited examples such as a Washington, D.C.-based sports club that leveraged IoT and the cloud to develop a comprehensive booking system. He also highlighted how IoT can revitalize and restore outdated business models, making them profitable ...
There are several IoTs: the Industrial Internet, Consumer Wearables, Wearables and Healthcare, Supply Chains, and the movement toward Smart Grids, Cities, Regions, and Nations. There are competing communications standards every step of the way, a bewildering array of sensors and devices, and an entire world of competing data analytics platforms. To some this appears to be chaos. In this power panel at @ThingsExpo, moderated by Conference Chair Roger Strukhoff, Bradley Holt, Developer Advocate a...
SYS-CON Events has announced today that Roger Strukhoff has been named conference chair of Cloud Expo and @ThingsExpo 2016 Silicon Valley. The 19th Cloud Expo and 6th @ThingsExpo will take place on November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. "The Internet of Things brings trillions of dollars of opportunity to developers and enterprise IT, no matter how you measure it," stated Roger Strukhoff. "More importantly, it leverages the power of devices and the Interne...
The cloud promises new levels of agility and cost-savings for Big Data, data warehousing and analytics. But it’s challenging to understand all the options – from IaaS and PaaS to newer services like HaaS (Hadoop as a Service) and BDaaS (Big Data as a Service). In her session at @BigDataExpo at @ThingsExpo, Hannah Smalltree, a director at Cazena, provided an educational overview of emerging “as-a-service” options for Big Data in the cloud. This is critical background for IT and data profession...
Whether your IoT service is connecting cars, homes, appliances, wearable, cameras or other devices, one question hangs in the balance – how do you actually make money from this service? The ability to turn your IoT service into profit requires the ability to create a monetization strategy that is flexible, scalable and working for you in real-time. It must be a transparent, smoothly implemented strategy that all stakeholders – from customers to the board – will be able to understand and comprehe...
In addition to all the benefits, IoT is also bringing new kind of customer experience challenges - cars that unlock themselves, thermostats turning houses into saunas and baby video monitors broadcasting over the internet. This list can only increase because while IoT services should be intuitive and simple to use, the delivery ecosystem is a myriad of potential problems as IoT explodes complexity. So finding a performance issue is like finding the proverbial needle in the haystack.
Cloud Expo, Inc. has announced today that Andi Mann returns to 'DevOps at Cloud Expo 2016' as Conference Chair The @DevOpsSummit at Cloud Expo will take place on November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. "DevOps is set to be one of the most profound disruptions to hit IT in decades," said Andi Mann. "It is a natural extension of cloud computing, and I have seen both firsthand and in independent research the fantastic results DevOps delivers. So I am excited t...
Apixio Inc. has raised $19.3 million in Series D venture capital funding led by SSM Partners with participation from First Analysis, Bain Capital Ventures and Apixio’s largest angel investor. Apixio will dedicate the proceeds toward advancing and scaling products powered by its cognitive computing platform, further enabling insights for optimal patient care. The Series D funding comes as Apixio experiences strong momentum and increasing demand for its HCC Profiler solution, which mines unstruc...
The Internet of Things will challenge the status quo of how IT and development organizations operate. Or will it? Certainly the fog layer of IoT requires special insights about data ontology, security and transactional integrity. But the developmental challenges are the same: People, Process and Platform and how we integrate our thinking to solve complicated problems. In his session at 19th Cloud Expo, Craig Sproule, CEO of Metavine, will demonstrate how to move beyond today's coding paradigm ...
Internet of @ThingsExpo has announced today that Chris Matthieu has been named tech chair of Internet of @ThingsExpo 2016 Silicon Valley. The 6thInternet of @ThingsExpo will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA.
"We work in the area of Big Data analytics and Big Data analytics is a very crowded space - you have Hadoop, ETL, warehousing, visualization and there's a lot of effort trying to get these tools to talk to each other," explained Mukund Deshpande, head of the Analytics practice at Accelerite, in this SYS-CON.tv interview at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York City, NY.
UAS, drones or unmanned aircraft, no matter what you call them — this was their week. Our news stream was flooded with updates on the newly announced rules and regulations for commercial UAS from the FAA. So, naturally we have dedicated this week’s top news round up to highlight some of our favorite UAS stories.
When people aren’t talking about VMs and containers, they’re talking about serverless architecture. Serverless is about no maintenance. It means you are not worried about low-level infrastructural and operational details. An event-driven serverless platform is a great use case for IoT. In his session at @ThingsExpo, Animesh Singh, an STSM and Lead for IBM Cloud Platform and Infrastructure, will detail how to build a distributed serverless, polyglot, microservices framework using open source tec...
IoT offers a value of almost $4 trillion to the manufacturing industry through platforms that can improve margins, optimize operations & drive high performance work teams. By using IoT technologies as a foundation, manufacturing customers are integrating worker safety with manufacturing systems, driving deep collaboration and utilizing analytics to exponentially increased per-unit margins. However, as Benoit Lheureux, the VP for Research at Gartner points out, “IoT project implementers often ...