
Some R Resources for GLMs

by Joseph Rickert

Generalized Linear Models have become part of the fabric of modern statistics, and logistic regression, at least, is a “go to” tool for data scientists building classification applications. The ready availability of good GLM software and the interpretability of its results make logistic regression a good baseline classifier. Moreover, Paul Komarek argues that, with a little bit of tweaking, the basic iteratively reweighted least squares algorithm used to compute the maximum likelihood estimates can be made robust and stable enough to allow logistic regression to challenge specialized classifiers such as support vector machines.

It is relatively easy to figure out how to code a GLM in R. Even a total newcomer to R is likely to discover, within a minute or so of searching, that the glm() function is part of the core R language. Thereafter, though, it gets more difficult to find the other GLM-related material that R has to offer. Here is a far from complete, but hopefully helpful, list of resources.

Online documentation that I have found helpful includes the contributed book by Virasakdi Chongsuvivatwong and the tutorials from Princeton and UCLA. Here is a slick visualization of a Poisson model from the Freakonometrics blog. But finding introductory materials on GLMs is not difficult. Almost all of the many books on learning statistics with R have chapters on the GLM, including the classic Modern Applied Statistics with S, by Venables and Ripley, and one of my favorite texts, Data Analysis and Graphics Using R, by Maindonald and Braun.

It is more of a challenge, however, to sort through the more than 5,000 packages on CRAN to find additional functions that could help with various specialized aspects of, or extensions to, the GLM. So here is a short list of GLM-related packages.
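Before turning to the packages, here is a minimal sketch of the base glm() call mentioned above, fitting a logistic regression. The built-in mtcars data set and the choice of response and predictor are illustrative assumptions, not something taken from the original post.

```r
# Logistic regression with base R's glm(); the mtcars data set ships with R.
# Response: am (0 = automatic, 1 = manual). Predictor: wt (weight in 1000 lbs).
# Both choices are purely illustrative.
fit <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))

summary(fit)     # coefficient table, standard errors, deviance
fit$converged    # did the iteratively reweighted least squares fit converge?
fit$iter         # number of Fisher scoring (IRLS) iterations used

# Predicted probability of a manual transmission for a car with wt = 3
predict(fit, newdata = data.frame(wt = 3), type = "response")
```

Checking fit$converged and fit$iter is a cheap habit: the convergence packages listed below exist largely for the cases where these values look suspicious.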
Packages to help with convergence and improve the fit
- glm2 implements a refinement to the iteratively reweighted least squares algorithm in order to help with convergence issues commonly associated with nonstandard link functions.
- brglm fits binomial response models with a bias reduction method.
- safeBinaryRegression provides a function that overloads glm() to provide a test for the existence of the maximum likelihood estimates for binomial models.
- pscl provides goodness of fit measures for GLMs.

Packages for variable selection and regularization
- bestglm selects a “best” subset of input variables for GLMs using cross validation and various information criteria.
- glmnet provides functions to fit linear regression, binary logistic regression and multinomial regression with convex penalties.
- penalized fits high dimensional logistic and Poisson models with L1 and L2 penalties.

Packages for special models
- mlogit fits multinomial logit models.
- lme4 provides functions to fit mixed-effect GLMs.
- hglm fits hierarchical GLMs with both fixed and random effects.
- glmmML provides functions to fit binomial and Poisson models with clustering.

Bayesian GLMs
- arm provides functions for Bayesian GLMs. (Look here for a discussion of how Bayesian ideas can help with GLM problems.)
- bayesm contains functions for Bayesian GLMs including binary and ordinal probit, multinomial logit, multinomial probit models and more.
- MCMCglmm provides functions to fit mixed GLMs using MCMC techniques.

GLMs for Big Data
- The bigglm() function in the biglm package fits GLMs that are too big to fit into memory.
- The h2o package from 0xdata provides an R wrapper for the h2o.glm function for fitting GLMs on Hadoop and other platforms.
- speedglm fits GLMs to large data sets using an updating procedure.
- RevoScaleR (Revolution R Enterprise) provides parallel external memory algorithms for fitting GLMs on clusters, Hadoop, Teradata and other platforms.

Generalized Additive Models (GAMs) generalize GLMs
- gam provides functions to fit the Generalized Additive Model.
- gamm4 fits mixed GAMs.
- mgcv provides functions to fit GAMs with multiple smoothing methods.
- VGAM provides functions to fit vector GLMs and GAMs.

Beyond the documentation and a list of packages that may be useful, it is also nice to have the benefit of some practical experience. John Mount has written prolifically about logistic regression in his Win-Vector Blog over the past few years. His post, How robust is logistic regression, is an illuminating discussion of convergence issues surrounding Newton-Raphson/Iteratively Reweighted Least Squares. It contains pointers to examples illustrating the trouble caused by complete or quasi-complete separation, as well as links to the academic literature. This post is a classic, but all of the other posts in the series are very much worth the read. Finally, as a reminder of the trouble you can get into interpreting t-values from a GLM, here is another classic, a post from the S-News archives on the Hauck-Donner phenomenon.
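The separation trouble mentioned above is easy to reproduce with base R alone. The following is a small sketch on made-up toy data (the vectors x and y are assumptions chosen so that the classes separate perfectly, not data from any of the posts cited); it captures the warnings glm() raises and shows the inflated estimate and standard error.

```r
# Toy data with complete separation: y is 0 whenever x < 0 and 1 whenever x > 0.
x <- c(-3, -2, -1, 1, 2, 3)
y <- c(0, 0, 0, 1, 1, 1)

# Collect the warnings glm() raises instead of letting them scroll past.
msgs <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

msgs                        # warnings reported by glm.fit for this fit
coef(fit)                   # the slope estimate is very large
summary(fit)$coefficients   # the standard error is larger still
```

Because the standard error grows faster than the estimate, the Wald statistic for x ends up small even though x predicts y perfectly, which is the kind of misleading summary the Hauck-Donner discussion warns about. Packages such as safeBinaryRegression and brglm from the list above are aimed at detecting or mitigating this situation.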


More Stories By David Smith

David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire), overseeing the product management of S-PLUS and other statistical and data mining products.

David Smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on Twitter at @RevoDavid.
