Notes on GAMs in R with a binary dependent variables

A few weeks ago I was working on a project using GAMs to estimate the gradients of the marginal effects of the likelihood of a customer to convert. This was a very cool problem, and I learned a lot, but it took me quite a bit of time to figure out how to get the built-in functions to generate outputs that are interpretable in a regression with a binary dependent variable (e.g., logistic-style regression). Below I’ve included a slightly-cleaned-up version of my notes from my research in case they’re helpful to any of y’all!

If you’re not sure what GAMs are, I’d check out these resources for a good overview:


 

Model Validation

Use gam.check() There’s two big things to check:

  • Does the model converge at all? If not, your results aren’t valid.
  • Check the p-values for the smooths at the bottom — if any of these are significant you should increase the smoothe values (k= in the model specification) — remember the k is a ceiling on the value so erring too high won’t impact your results, but erring too low will!
  • If you’re using a binary model, the QQ plot and other automatically generated plots aren’t useful. However, uou can use DHARMa to get interpretable fit checks

Logistic GAMs

  • Use the `family = “binomial”` option to estimate a logistic-style GAM model
  • Like a logistic regression, the output is on the log scale — we can convert to the probability scale using the plogis() function
    • When plotting, we can convert our output to the probability scale by using the trans argument. The trans argument takes a function that transforms the output. So we can pass the plogis() logistic function to this argument, and all values in plots will be transformed from log-odds to probabilities. plot(binom_mod, pages = 1, trans = plogis)(Source)
  • To get appropriately-scaled marginal effects plots, we can use the following specification: plot(binom_mod, pages = 1, trans = plogis, shift = coef(binom_mod)[1], seWithMean = TRUE)
    • Now our partial effect plots have much more intuitive interpretation. Each partial effect plot can be interpreted as showing the probability of the outcome if all other variables were at their average value. At their own average value, you get only the effect of the intercept. (Source)

The different GAM smoothing options

You can select s(), ti(), or te() for a smooth. ti() and te() work like interaction terms, but they’re crucially different:

  • ti() only works with variables that are on the same scale. It’s commonly used for something like latitude and longitude. (Sometimes the documentation refers to these variables as being “isotropic”)
  • te() works with variables are not on the same scale (they’re “anisotropic”)
  • You have to choose the number of basis functions to use for each smooth, using the k argument of s or te. (Source])
    • Given the penalization process, the exact choice of k isn’t too big of a deal, and can be seen as an upper limit to the flexibility of the term. The actual flexibility is determined via the penalization process. (Source)

Plotting interaction effects

  • In mgcv’s plot() command, interactions are represented with contour plots. A single plot represents both variables and their interaction. In this plot the axes represent values of our predictor variables, x1 and x2. The interior is a topographic map of predicted values. The contour lines represent points of equal predicted values, and they are labeled. The dotted lines show uncertainty in prediction; they represent how contour lines would move if predictions were one standard error higher or lower. (Source)
  • The plots generated by mgcv’s plot() function are partial effect plots. That is, they show the component effect of each of the smooth or linear terms in the model, which add up to the overall prediction. (Source)
  • It’s often useful to plot the standard errors of a partial effect term combined with the standard errors of the model intercept. This is because confidence intervals at the mean value of a variable can be very tiny, and don’t reflect overall uncertainty in our model. Using the seWithMean argument adds in this uncertainty. (Source)
    • To make the plots even more interpretable, it’s useful to shift the scale so that the intercept is included. Using the shift argument, we can shift the scale by value of the intercept, which is the first coefficient of the model. Note how the y-axis has changed. Now, the partial effect plot has a more natural interpretation – it shows us the prediction of the output, assuming other variables are at their average value. For instance, this plot shows that the miles per gallon of a 2000 pound car is about 30, all else being equal. (Source)

References

About the author

michael
By michael