Stepwise Regression
Understanding and Using Stepwise Regression to get stuff done. Probably the most detailed post I've created thus far. This will be the new standard going forward.
You can find the code on GitHub here.
Table of Contents
Stepwise regression (Actionable knowledge)
Linear Regression model
Complexity vs Performance Issue
Stepwise regression (Actionable Knowledge)
Introduction
In the last two posts on linear regression, we focused on how to transform data in order to squeeze as much value out of it as possible. In this one, we focus on variable selection. That is, we look at the model-building side of things rather than at tweaking the predictor variables and the response variable. Stepwise regression still fits a linear equation though, so what you learned from the linear regression posts still applies here.
The basic idea of stepwise regression is this:
We have our independent variables, and we have our dependent variable.
What if, instead of running linear regression once on the whole data set, we ran it several times and adjusted which independent variables we use based on the feedback we get?
In other words, what if we start with no independent variables in the model, run linear regression on each candidate variable in turn, and then use the feedback to figure out which one to add?
Then rinse and repeat until the model is solid.
The TL;DR is this: what if, instead of using all of the independent variables, we find the ones that are relevant and use only those in our regression analysis?
Well, that is basically what stepwise regression aims to do. However, there are actually three different ways of implementing stepwise regression. We will go over all three, and how to set them up in both R and Python.
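To make the "start with nothing, add one variable at a time" idea concrete, here is a minimal Python sketch of that loop (usually called forward selection). It is an illustration under stated assumptions, not the code from this post's repository: the helper names `solve`, `r_squared` and `forward_select`, the toy data, and the stopping rule (stop when the best candidate raises R² by less than `min_gain`) are all made up for this example. It uses only the standard library so it runs anywhere; in practice you would reach for a numerical library instead of hand-rolled least squares.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def r_squared(columns, y):
    """R^2 of an ordinary least squares fit of y on `columns` plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(XtX, Xty)  # normal equations: (X'X) beta = X'y
    mean_y = sum(y) / n
    rss = sum((y[i] - sum(bj * xj for bj, xj in zip(beta, X[i]))) ** 2
              for i in range(n))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - rss / tss

def forward_select(data, y, min_gain=0.01):
    """Add one variable per round; stop when R^2 improves by less than min_gain.

    The R^2-gain threshold is an assumption made for this sketch.
    """
    selected, best_r2 = [], 0.0  # an intercept-only model has R^2 = 0
    remaining = list(data)
    while remaining:
        # Refit once per candidate, each time adding it to the current model.
        scores = {name: r_squared([data[v] for v in selected + [name]], y)
                  for name in remaining}
        best = max(scores, key=scores.get)
        if scores[best] - best_r2 < min_gain:
            break  # no candidate helps enough, so the model is "solid"
        selected.append(best)
        remaining.remove(best)
        best_r2 = scores[best]
    return selected, best_r2

# Toy data: y depends on x1 and x2 only; x3 is pure noise.
rng = random.Random(0)
n = 200
data = {name: [rng.gauss(0, 1) for _ in range(n)] for name in ("x1", "x2", "x3")}
y = [3 * a + 2 * b + 0.1 * rng.gauss(0, 1)
     for a, b in zip(data["x1"], data["x2"])]

selected, r2 = forward_select(data, y)
print(selected, round(r2, 3))
```

On this toy data the loop should keep `x1` and `x2` and leave out `x3`, because `x3` adds almost nothing once the other two are in the model. Swapping the R² gain for another kind of feedback, such as a p-value or an information criterion, changes only the stopping rule and not the overall loop.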