Welcome!

Apache Authors: William Schmarzo, Christopher Harrold, Elizabeth White, Talend Inc., Adrian Bridgwater

Blog Feed Post

Reflections on John Chambers’ UserR! 2014 Keynote Address

by Joseph Rickert John Chambers opened UseR! 2014 by describing how the R language grew out of early efforts to give statisticians easier access to high quality statistical software. In 1976 computational statistics was a very active field, but most algorithms were compiled as Fortran subroutines. Building models with this software was not a trivial process. First you had to write a main Fortran program to implement the model and call the right subroutines, and then you had to write the job control language code to submit your job and get it executed. When John and his Bell Labs colleagues sat down on that May afternoon to work on what would become the first implementation of the S language they were thinking about how they could make this process easier. The top half John’s famous diagram from that afternoon schematically indicates their intention to design a software interface so that one could call an arbitrary Fortran subroutine, ABC, by wrapping it in some simplified calling syntax: XABC( ).    The main idea was to bring the best computational facilities to the people doing the analysis. As John phrased it: “combine serious computational challenges with convenience”. In the end, the designers of both S, and its second incarnation, R, did much better than convenience. They built a tool to facilitate “flow”. When you are engaged in any mentally challenging work in (including statistical analysis) at a high level of play, you want to be able to stay in the zone and not get knocked out by peripheral tasks that interrupt your thought processes. As engaging and meaningful as it is in its own right, writing code is not doing statistics. One of the big advantages of working with R is that you can do quite a bit of statistics with just a handful of functions and the simplest syntax. R is a tool that helps you keep moving forward. If you want to see something then plot it. If the data in the wrong format, then mutate it. A second idea that flows from the idea of S as an interface is that S was not intended to be self sufficient. John was explicit that S was designed as an interface to the “best algorithms”, not as a “from the ground up programming language”. The idea of being able to make use of external computational resources is still compelling. There will always be high-quality stuff that we will want to get at. Moreover, as John elaborated: “unlike 38 years ago there are many possible interfaces to languages, to other computing models and to (specialized) hardware”. The challenge is to interface to applications that are “too diverse for one solution to fit them all”, and to do this “without loosing the R that works in ‘ordinary’ circumstances. John offered three examples of R projects that extend the reach of R to leverage other computing environments. Rcpp -  turns C++ in to an R function by generating an interface to C++ with much less programming effort than .Call RLLVM - enables compiling R language code into specialized forms for efficiency and other purposes H2O - provides a compressed, efficient external version of a data frame for running statistical models on large data sets. These examples, chosen to represent each of the three different kinds of interface targets that John called out, also represent projects of different scope and levels of integration. With a total of 226 reverse depends and reverse imports,  Rcpp is already a great success. It is likely that ready access to C++ will form a permanent part of the R programmers mindset. RLLVM is a much more radical and ambitious project that would allow R to be the window to entirely different computing models. As best I understand it, the central idea is to use the R environment as the system interface to “any number of new languages” perhaps languages that have not yet been invented. RLLVM would “Use R syntax for commands to be interpreted in a different interpreter”.  RLLVM seems to be a powerful idea and a direct generalization of the original XABC() idea. The RH2O package is an example of providing R users with transparent access to data sets that are too large to fit into memory. It is one of many efforts underway (including those from Revolution Analytics) to integrate Hadoop, Teradata, Spark and other specialized computing platforms within the R environment. Some of these specialized platforms may indeed be longed lived, but it is not likely that all of them will. From the point of view of doing statistics, it is the R interface that is likely to survive and persist, platforms will come and go. An implication of the willingness of R developers to embrace diversity is that R is likely to always be a work in progress. There will be loose ends, annoying inconsistencies and unimplemented possibilities. I suppose that there are people who will never be comfortable with this state of affairs. It is not unreasonable to prefer a system where there is one best way to do something and where, within the bounds of some pre-established design, there is near perfect consistency. However, the pursuit of uniformity and consistency seems to me to doom designers to be at least one step behind, because it means continually starting over to get things right. So what does this say about the future of R? John closed his talk by stating that “the best future would be one of variety, not uniformity”. I take this to mean that, for the near future anyway, whatever the next big thing is, it is likely that someone will write an R package to talk to it.  Some links regarding S and R History: John Chambers useR! 2006 slides Trevor Hastie's Interview with John Chambers Ross Ihaka: R: Past and Future History New York Times Article

Read the original blog entry...

More Stories By David Smith

David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire) overseeing the product management of S-PLUS and other statistical and data mining products.<

David smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on twitter at @RevoDavid

@ThingsExpo Stories
Whether your IoT service is connecting cars, homes, appliances, wearable, cameras or other devices, one question hangs in the balance – how do you actually make money from this service? The ability to turn your IoT service into profit requires the ability to create a monetization strategy that is flexible, scalable and working for you in real-time. It must be a transparent, smoothly implemented strategy that all stakeholders – from customers to the board – will be able to understand and comprehe...
Increasing IoT connectivity is forcing enterprises to find elegant solutions to organize and visualize all incoming data from these connected devices with re-configurable dashboard widgets to effectively allow rapid decision-making for everything from immediate actions in tactical situations to strategic analysis and reporting. In his session at 18th Cloud Expo, Shikhir Singh, Senior Developer Relations Manager at Sencha, will discuss how to create HTML5 dashboards that interact with IoT devic...
The increasing popularity of the Internet of Things necessitates that our physical and cognitive relationship with wearable technology will change rapidly in the near future. This advent means logging has become a thing of the past. Before, it was on us to track our own data, but now that data is automatically available. What does this mean for mHealth and the "connected" body? In her session at @ThingsExpo, Lisa Calkins, CEO and co-founder of Amadeus Consulting, will discuss the impact of wea...
There is an ever-growing explosion of new devices that are connected to the Internet using “cloud” solutions. This rapid growth is creating a massive new demand for efficient access to data. And it’s not just about connecting to that data anymore. This new demand is bringing new issues and challenges and it is important for companies to scale for the coming growth. And with that scaling comes the need for greater security, gathering and data analysis, storage, connectivity and, of course, the...
Digital payments using wearable devices such as smart watches, fitness trackers, and payment wristbands are an increasing area of focus for industry participants, and consumer acceptance from early trials and deployments has encouraged some of the biggest names in technology and banking to continue their push to drive growth in this nascent market. Wearable payment systems may utilize near field communication (NFC), radio frequency identification (RFID), or quick response (QR) codes and barcodes...
SYS-CON Events announced today that DatacenterDynamics has been named “Media Sponsor” of SYS-CON's 18th International Cloud Expo, which will take place on June 7–9, 2016, at the Javits Center in New York City, NY. DatacenterDynamics is a brand of DCD Group, a global B2B media and publishing company that develops products to help senior professionals in the world's most ICT dependent organizations make risk-based infrastructure and capacity decisions.
SYS-CON Events announced today TMCnet has been named “Media Sponsor” of SYS-CON's 18th International Cloud Expo, which will take place on June 7–9, 2016, at the Javits Center in New York City, NY, and the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Technology Marketing Corporation (TMC) is the world's leading business-to-business and integrated marketing media company, servicing niche markets within the com...
The IoT has the potential to create a renaissance of manufacturing in the US and elsewhere. In his session at 18th Cloud Expo, Florent Solt, CTO and chief architect of Netvibes, will discuss how the expected exponential increase in the amount of data that will be processed, transported, stored, and accessed means there will be a huge demand for smart technologies to deliver it. Florent Solt is the CTO and chief architect of Netvibes. Prior to joining Netvibes in 2007, he co-founded Rift Technol...
Join IBM June 8 at 18th Cloud Expo at the Javits Center in New York City, NY, and learn how to innovate like a startup and scale for the enterprise. You need to deliver quality applications faster and cheaper, attract and retain customers with an engaging experience across devices, and seamlessly integrate your enterprise systems. And you can't take 12 months to do it.
This is not a small hotel event. It is also not a big vendor party where politicians and entertainers are more important than real content. This is Cloud Expo, the world's longest-running conference and exhibition focused on Cloud Computing and all that it entails. If you want serious presentations and valuable insight about Cloud Computing for three straight days, then register now for Cloud Expo.
IoT device adoption is growing at staggering rates, and with it comes opportunity for developers to meet consumer demand for an ever more connected world. Wireless communication is the key part of the encompassing components of any IoT device. Wireless connectivity enhances the device utility at the expense of ease of use and deployment challenges. Since connectivity is fundamental for IoT device development, engineers must understand how to overcome the hurdles inherent in incorporating multipl...
Machine Learning helps make complex systems more efficient. By applying advanced Machine Learning techniques such as Cognitive Fingerprinting, wind project operators can utilize these tools to learn from collected data, detect regular patterns, and optimize their own operations. In his session at 18th Cloud Expo, Stuart Gillen, Director of Business Development at SparkCognition, will discuss how research has demonstrated the value of Machine Learning in delivering next generation analytics to im...
Manufacturers are embracing the Industrial Internet the same way consumers are leveraging Fitbits – to improve overall health and wellness. Both can provide consistent measurement, visibility, and suggest performance improvements customized to help reach goals. Fitbit users can view real-time data and make adjustments to increase their activity. In his session at @ThingsExpo, Mark Bernardo Professional Services Leader, Americas, at GE Digital, will discuss how leveraging the Industrial Interne...
The paradigm has shifted. A Gartner survey shows that 43% of organizations are using or plan to implement the Internet of Things in 2016. However, not just a handful of companies are still using the old-style ad-hoc trial-and-error ways, unaware of the critical barriers, paint points, traps, and hidden roadblocks. How can you become a winner? In his session at @ThingsExpo, Tony Shan will present a methodical approach to guide the holistic adoption and enablement of IoT implementations. This ov...
SYS-CON Events announced today that Stratoscale, the software company developing the next generation data center operating system, will exhibit at SYS-CON's 18th International Cloud Expo®, which will take place on June 7-9, 2016, at the Javits Center in New York City, NY. Stratoscale is revolutionizing the data center with a zero-to-cloud-in-minutes solution. With Stratoscale’s hardware-agnostic, Software Defined Data Center (SDDC) solution to store everything, run anything and scale everywhere...
Angular 2 is a complete re-write of the popular framework AngularJS. Programming in Angular 2 is greatly simplified – now it's a component-based well-performing framework. This immersive one-day workshop at 18th Cloud Expo, led by Yakov Fain, a Java Champion and a co-founder of the IT consultancy Farata Systems and the product company SuranceBay, will provide you with everything you wanted to know about Angular 2.
SYS-CON Events announced today that Men & Mice, the leading global provider of DNS, DHCP and IP address management overlay solutions, will exhibit at SYS-CON's 18th International Cloud Expo®, which will take place on June 7-9, 2016, at the Javits Center in New York City, NY. The Men & Mice Suite overlay solution is already known for its powerful application in heterogeneous operating environments, enabling enterprises to scale without fuss. Building on a solid range of diverse platform support,...
You deployed your app with the Bluemix PaaS and it's gaining some serious traction, so it's time to make some tweaks. Did you design your application in a way that it can scale in the cloud? Were you even thinking about the cloud when you built the app? If not, chances are your app is going to break. Check out this webcast to learn various techniques for designing applications that will scale successfully in Bluemix, for the confidence you need to take your apps to the next level and beyond.
SYS-CON Events announced today that Ericsson has been named “Gold Sponsor” of SYS-CON's @ThingsExpo, which will take place on June 7-9, 2016, at the Javits Center in New York, New York. Ericsson is a world leader in the rapidly changing environment of communications technology – providing equipment, software and services to enable transformation through mobility. Some 40 percent of global mobile traffic runs through networks we have supplied. More than 1 billion subscribers around the world re...
So, you bought into the current machine learning craze and went on to collect millions/billions of records from this promising new data source. Now, what do you do with them? Too often, the abundance of data quickly turns into an abundance of problems. How do you extract that "magic essence" from your data without falling into the common pitfalls? In her session at @ThingsExpo, Natalia Ponomareva, Software Engineer at Google, will provide tips on how to be successful in large scale machine lear...