Welcome!

Apache Authors: MC Brown, Amy Lindberg, ChandraShekar Dattatreya, Carmen Gonzalez, Sanjeev Sharma

Blog Feed Post

Some R Resources for GLMs

by Joseph Rickert Generalized Linear Models have become part of the fabric of modern statistics, and logistic regression, at least, is a “go to” tool for data scientists building classification applications. The ready availability of good GLM software and the interpretability of the results logistic regression makes it a good baseline classifier. Moreover, Paul Komarek argues that, with a little bit tweaking, the basic iteratively reweighted least squares algorithm used to evaluate the maximum likelihood estimates can be made robust and stable enough to allow logistic regression to challenge specialized classifiers such as support vector machines.   It is relatively easy to figure how to code a GLM in R. Even a total newcomer to R is likely to figure out that the glm()function is part of the core R language within a minute or so of searching. Thereafter though, it gets more difficult to find other GLM related stuff that R has to offer. Here is a far from complete, but hopefully helpful, list of resources. Online documentation that I have found helpful includes the contributed book by Virasakdi Chongsuvivatwong and the tutorials from Princeton and UCLA. Here is slick visualization of a poisson model from the Freakonometrics blog. But finding introductory materials is on GLMs is not difficult. Almost all of the many books on learning statistics with R have chapters on the GLM including the classic Modern Applied Statistics with S, by Venables and Ripley, and one of my favorite texts, Data Analysis and Graphics Using R, by Maindonald and Braun. It is more of a challenge, however, to sort through the more than 5,000 packages on CRAN to find additional functions that could help with various specialized aspects or extensions to the GLM. So here is a short list of GLM related packages. Packages to help with convergence and improve the fit glm2 implements a refinement to the iteratively reweighted least squares algorithm in order to help with convergence issues commonly associated with nonstandard link functions. brglm fits binomial response models with a bias reduction method safeBinaryRegression provides a function that overloads glm() to provide a test for the existence of the maximum likelihood estimates for binomial models pscl provides goodness of fit measures for GLMs Packages for variable selection and regularization bestglm selects a “best” subset of input variables for GLMs using cross validation and various information criteria. glmnet provides functions to fit linear regression, binary logistic regression and multinomial normal regression with convex penalties. penalized fits high dimensional logistic and poisson models with L1 and L2 penalties Packages for special models mlogit fits multinomial logit models. lme4 provides functions to fit mixed-effect GLMS hglm fits hierarchical GLMs with both fixed and random effects glmmML provides functions to fit binomial and poisson models with clustering. Bayesian GLMs arm provides functions for Bayesian GLMs (Look here for a discussion of how Bayesian ideas can help with GLM problems.) bayesm contains functions for Bayesian GLMs including binary and ordinal probit, multinomial logit, multinomial probit models and more MCMCglmm provides functions to fit mixed GLMs using MCMC techniques GLMs for Big Data The bigglm() function in the biglm package fits GLMs that are too big to fit into memory. H20 package from 0xdata provides an R wrapper for the h2o.glm function for fitting GLMs on Hadoop and other platforms speedglm fits GLMs to large data sets using an updating procedure. RrevoScaleR (Revolution R Enterprise) provides parallel external memory algorithms for fitting GLMs on clusters, Hadoop, Teradata and other platforms Generalized Additive Models, GAMS,generalize GLMs gam provides functions to fit the Generalized Additive Model gamm4 fits mixed GAMs. mgcv provides functions to fit GAMs with muttiple smoothing methods. VGAM provides functions to fit vector GLMs and GAMs. Beyond the documentation and and a list of packages that may be useful, it is also nice to have the benefit of some practical experience. John Mount has written prolifically about logistic regression in his Win-Vector Blog over the past few years. His post, How robust is logistic regression, is an illuminating discussion of convergence issues surrounding Newton-Raphson/Iteratively-Reweighted-Least Squares. It contains pointers to examples illustrating the trouble caused by complete or quasi-complete separation as well as links to the academic literature. This post is a classic, but all of the other posts in the series are very much worth the read. Finally, as a reminder of the trouble you can get into interpreting t-values from a GLM, here is another classic, a post from the S-News archives on the Hauck-Donner phenomenon.

Read the original blog entry...

More Stories By David Smith

David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire) overseeing the product management of S-PLUS and other statistical and data mining products.<

David smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on twitter at @RevoDavid

@ThingsExpo Stories
The definition of IoT is not new, in fact it’s been around for over a decade. What has changed is the public's awareness that the technology we use on a daily basis has caught up on the vision of an always on, always connected world. If you look into the details of what comprises the IoT, you’ll see that it includes everything from cloud computing, Big Data analytics, “Things,” Web communication, applications, network, storage, etc. It is essentially including everything connected online from hardware to software, or as we like to say, it’s an Internet of many different things. The difference ...
Cloud Expo 2014 TV commercials will feature @ThingsExpo, which was launched in June, 2014 at New York City's Javits Center as the largest 'Internet of Things' event in the world.
An entirely new security model is needed for the Internet of Things, or is it? Can we save some old and tested controls for this new and different environment? In his session at @ThingsExpo, New York's at the Javits Center, Davi Ottenheimer, EMC Senior Director of Trust, reviewed hands-on lessons with IoT devices and reveal a new risk balance you might not expect. Davi Ottenheimer, EMC Senior Director of Trust, has more than nineteen years' experience managing global security operations and assessments, including a decade of leading incident response and digital forensics. He is co-author of t...

ARMONK, N.Y., Nov. 20, 2014 /PRNewswire/ --  IBM (NYSE: IBM) today announced that it is bringing a greater level of control, security and flexibility to cloud-based application development and delivery with a single-tenant version of Bluemix, IBM's platform-as-a-service. The new platform enables developers to build ap...

The major cloud platforms defy a simple, side-by-side analysis. Each of the major IaaS public-cloud platforms offers their own unique strengths and functionality. Options for on-site private cloud are diverse as well, and must be designed and deployed while taking existing legacy architecture and infrastructure into account. Then the reality is that most enterprises are embarking on a hybrid cloud strategy and programs. In this Power Panel at 15th Cloud Expo (http://www.CloudComputingExpo.com), moderated by Ashar Baig, Research Director, Cloud, at Gigaom Research, Nate Gordon, Director of T...
Explosive growth in connected devices. Enormous amounts of data for collection and analysis. Critical use of data for split-second decision making and actionable information. All three are factors in making the Internet of Things a reality. Yet, any one factor would have an IT organization pondering its infrastructure strategy. How should your organization enhance its IT framework to enable an Internet of Things implementation? In his session at Internet of @ThingsExpo, James Kirkland, Chief Architect for the Internet of Things and Intelligent Systems at Red Hat, described how to revolutioniz...
Technology is enabling a new approach to collecting and using data. This approach, commonly referred to as the "Internet of Things" (IoT), enables businesses to use real-time data from all sorts of things including machines, devices and sensors to make better decisions, improve customer service, and lower the risk in the creation of new revenue opportunities. In his General Session at Internet of @ThingsExpo, Dave Wagstaff, Vice President and Chief Architect at BSQUARE Corporation, discuss the real benefits to focus on, how to understand the requirements of a successful solution, the flow of ...
The security devil is always in the details of the attack: the ones you've endured, the ones you prepare yourself to fend off, and the ones that, you fear, will catch you completely unaware and defenseless. The Internet of Things (IoT) is nothing if not an endless proliferation of details. It's the vision of a world in which continuous Internet connectivity and addressability is embedded into a growing range of human artifacts, into the natural world, and even into our smartphones, appliances, and physical persons. In the IoT vision, every new "thing" - sensor, actuator, data source, data con...
"BSQUARE is in the business of selling software solutions for smart connected devices. It's obvious that IoT has moved from being a technology to being a fundamental part of business, and in the last 18 months people have said let's figure out how to do it and let's put some focus on it, " explained Dave Wagstaff, VP & Chief Architect, at BSQUARE Corporation, in this SYS-CON.tv interview at @ThingsExpo, held Nov 4-6, 2014, at the Santa Clara Convention Center in Santa Clara, CA.
Focused on this fast-growing market’s needs, Vitesse Semiconductor Corporation (Nasdaq: VTSS), a leading provider of IC solutions to advance "Ethernet Everywhere" in Carrier, Enterprise and Internet of Things (IoT) networks, introduced its IStaX™ software (VSC6815SDK), a robust protocol stack to simplify deployment and management of Industrial-IoT network applications such as Industrial Ethernet switching, surveillance, video distribution, LCD signage, intelligent sensors, and metering equipment. Leveraging technologies proven in the Carrier and Enterprise markets, IStaX is designed to work ac...
"There is a natural synchronization between the business models, the IoT is there to support ,” explained Brendan O'Brien, Co-founder and Chief Architect of Aria Systems, in this SYS-CON.tv interview at the 15th International Cloud Expo®, held Nov 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA.
C-Labs LLC, a leading provider of remote and mobile access for the Internet of Things (IoT), announced the appointment of John Traynor to the position of chief operating officer. Previously a strategic advisor to the firm, Mr. Traynor will now oversee sales, marketing, finance, and operations. Mr. Traynor is based out of the C-Labs office in Redmond, Washington. He reports to Chris Muench, Chief Executive Officer. Mr. Traynor brings valuable business leadership and technology industry expertise to C-Labs. With over 30 years' experience in the high-tech sector, John Traynor has held numerous...
Bit6 today issued a challenge to the technology community implementing Web Real Time Communication (WebRTC). To leap beyond WebRTC’s significant limitations and fully leverage its underlying value to accelerate innovation, application developers need to consider the entire communications ecosystem.
The 3rd International @ThingsExpo, co-located with the 16th International Cloud Expo - to be held June 9-11, 2015, at the Javits Center in New York City, NY - announces that it is now accepting Keynote Proposals. The Internet of Things (IoT) is the most profound change in personal and enterprise IT since the creation of the Worldwide Web more than 20 years ago. All major researchers estimate there will be tens of billions devices - computers, smartphones, tablets, and sensors - connected to the Internet by 2020. This number will continue to grow at a rapid pace for the next several decades.
The Internet of Things is not new. Historically, smart businesses have used its basic concept of leveraging data to drive better decision making and have capitalized on those insights to realize additional revenue opportunities. So, what has changed to make the Internet of Things one of the hottest topics in tech? In his session at @ThingsExpo, Chris Gray, Director, Embedded and Internet of Things, discussed the underlying factors that are driving the economics of intelligent systems. Discover how hardware commoditization, the ubiquitous nature of connectivity, and the emergence of Big Data a...
Almost everyone sees the potential of Internet of Things but how can businesses truly unlock that potential. The key will be in the ability to discover business insight in the midst of an ocean of Big Data generated from billions of embedded devices via Systems of Discover. Businesses will also need to ensure that they can sustain that insight by leveraging the cloud for global reach, scale and elasticity.
SYS-CON Events announced today that Windstream, a leading provider of advanced network and cloud communications, has been named “Silver Sponsor” of SYS-CON's 16th International Cloud Expo®, which will take place on June 9–11, 2015, at the Javits Center in New York, NY. Windstream (Nasdaq: WIN), a FORTUNE 500 and S&P 500 company, is a leading provider of advanced network communications, including cloud computing and managed services, to businesses nationwide. The company also offers broadband, phone and digital TV services to consumers primarily in rural areas.
SYS-CON Events announced today that IDenticard will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. IDenticard™ is the security division of Brady Corp (NYSE: BRC), a $1.5 billion manufacturer of identification products. We have small-company values with the strength and stability of a major corporation. IDenticard offers local sales, support and service to our customers across the United States and Canada. Our partner network encompasses some 300 of the world's leading systems integrators and security s...
IoT is still a vague buzzword for many people. In his session at @ThingsExpo, Mike Kavis, Vice President & Principal Cloud Architect at Cloud Technology Partners, discussed the business value of IoT that goes far beyond the general public's perception that IoT is all about wearables and home consumer services. He also discussed how IoT is perceived by investors and how venture capitalist access this space. Other topics discussed were barriers to success, what is new, what is old, and what the future may hold. Mike Kavis is Vice President & Principal Cloud Architect at Cloud Technology Pa...
Cloud Expo 2014 TV commercials will feature @ThingsExpo, which was launched in June, 2014 at New York City's Javits Center as the largest 'Internet of Things' event in the world. The next @ThingsExpo will take place November 4-6, 2014, at the Santa Clara Convention Center, in Santa Clara, California. Since its launch in 2008, Cloud Expo TV commercials have been aired and CNBC, Fox News Network, and Bloomberg TV. Please enjoy our 2014 commercial.