anthe.sevenants

How to retrieve the formula from a step regression in R

2022-05-18

You can use step regressions to check which predictors of your regression are important. For example, if you have a regression object lm, you can call step on it to check which predictors are significant:

steplm <- step(lm)

Because step is nice to us, it also provides a ready-made formula we can then plug into our original regression, minus the non-significant parameters:

[...]
Model found:
score ~ bound + foc_win + foc_pos + foc_pmi + soc_length + soc_pos

There are two ways to further use this formula:

  1. We output the step object to the R console, then manually copy the formula into a regression (not cool)
  2. We retrieve the formula from the step object and pass it on to another regression automatically (very cool)

I will explain how to retrieve the formula from the step object and give an example of how to plug it into another regression.

You will read online that you only need to call formula on your step regression object to retrieve the regression formula. This, however, did not work for me, and neither did any of the alternatives I found. Therefore, I looked into the source code of the step function and found that the following command does work:

formula(attr(steplm, "model"))
score ~ bound + foc_win + foc_pos + foc_pmi + soc_length + soc_pos
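Whether formula works directly seems to depend on which step function produced the object. The "Model found:" printout and the "model" attribute look like what the step method in lmerTest produces, but that is an assumption on my part. Base R's stats::step returns the fitted model itself, so formula works on it directly. Here is a minimal sketch of a helper that covers both cases. The name get_step_formula is hypothetical, and the example uses the built-in mtcars data rather than the data from this post:

```r
# Hypothetical helper (not part of any package): return the formula of the
# model selected by a step call, wherever that model is stored.
get_step_formula <- function(step_result) {
  # Step objects like the one in this post carry the final model
  # in a "model" attribute
  final_model <- attr(step_result, "model")
  # stats::step returns the fitted model itself, so fall back to the object
  if (is.null(final_model)) {
    final_model <- step_result
  }
  formula(final_model)
}

# Illustration with base R only (stats::step on the built-in mtcars data)
full_fit <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
selected <- step(full_fit, trace = 0)  # trace = 0 silences the step log
step_formula <- get_step_formula(selected)
print(step_formula)
```

If your step object does come from lmerTest, that package also exports get_model(), which returns the final model, so formula(get_model(steplm)) should be equivalent to the attr call above.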

So, if you store the formula in an object, you can then pass it on to another regression and create a chain of regression analyses. Great!

steplm_formula <- formula(attr(steplm, "model"))
lm(steplm_formula, data = df)
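To make the chaining idea concrete, here is a small self-contained sketch that selects predictors on one part of a dataset and refits the selected formula on the rest. It uses base R's stats::step and the built-in mtcars data, both of which are stand-ins for the setup in this post. With stats::step, formula can be called on the result directly. For a step object like steplm above, you would use formula(attr(steplm, "model")) instead.

```r
# Split the built-in mtcars data into two parts (illustrative split only)
set.seed(1)
idx <- sample(nrow(mtcars), 20)
first_part <- mtcars[idx, ]
second_part <- mtcars[-idx, ]

# Step 1: run the selection on the first part
full_fit <- lm(mpg ~ wt + hp + qsec + drat, data = first_part)
selected <- step(full_fit, trace = 0)  # trace = 0 silences the step log

# Step 2: store the selected formula in an object
selected_formula <- formula(selected)

# Step 3: pass the stored formula on to another regression
refit <- lm(selected_formula, data = second_part)
print(selected_formula)
print(coef(refit))
```

Because the formula is an ordinary R object, the same pattern works inside a loop or function, for example to rerun the selection for several datasets without copying anything by hand.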

Now you can always select only significant parameters when running your regressions.