mgcv
package
Don't think about it too hard...😉
Regression models are concerned with explaining one variable, Y, using another, X
y=α+βx+ϵ
y=2+1.5x+ϵ
More complicated mathsy definitions than I can explain; first, let's consider powers / orders:
Squared ($x^2$ or $x \times x$)
Cubed ($x^3$ or $x \times x \times x$)
If we use these in regression, we can get something like:
$y = \alpha + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \dots + \beta_n x^n$
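A minimal sketch of fitting such a polynomial regression in R, assuming a data frame dt with columns X and Y as in the worked example later in these slides:
# Linear fit: y = a + b*x
fit_linear <- lm(Y ~ X, data = dt)
# Cubic polynomial fit: y = a + b1*x + b2*x^2 + b3*x^3
fit_cubic <- lm(Y ~ poly(X, 3), data = dt)
summary(fit_cubic)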
Dodgy fit with increased complexity
Can oscillate wildly, particularly at edges:
Within a model, how many 'parts' are free to vary?
E.g. If we have 3 numbers and we know the average is 5:
If we have 2 and 7, our final number is not free to vary. It must be 6: $\frac{2+7+6}{3}=5$. This means our 'model' is constrained to $n-1$ degrees of freedom
For a regression with $k$ predictors and an intercept, this is $N - k - 1$. Helpful post on CrossValidated
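A small illustration using R's built-in mtcars data (purely as an example): the residual degrees of freedom reported by lm() follow this formula.
fit <- lm(mpg ~ wt + hp, data = mtcars)  # N = 32 observations, k = 2 predictors
df.residual(fit)                         # 32 - 2 - 1 = 29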
When our model fits both the underlying relationship and the 'noise' peculiar to our sample data
You want to fit the relationship, whilst minimising the noise.
This helps 'generalizability': meaning it will predict well on new data.
If we allowed total freedom in our model, e.g. a knot at every data point, what would happen?
Here we will visualise the relationship, view it as a linear regression, and attempt a polynomial fit.
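A possible sketch of this exercise, assuming a data frame dt with columns X and Y:
library(ggplot2)
ggplot(dt, aes(X, Y)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", se = FALSE, colour = "red") +          # linear regression
  geom_smooth(method = "lm", formula = y ~ poly(x, 3), se = FALSE)  # cubic polynomial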
If we could define a set of functions we are happy with:
Figure taken from Noam Ross' GAMs in R course, CC-BY, https://github.com/noamross/gams-in-r-course
Thin flexible strip that bends to curves
Held in place by weights ("ducks"), nails etc.
The tension element is important: the spline flexes minimally
Can be controlled by number of knots (k), or by a penalty γ.
We will now use a spline function to trace the relationship in our data, through a regression, and plot it.
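One way to sketch this, again assuming dt with columns X and Y, is a natural cubic spline basis from the splines package inside an ordinary regression (the choice of df = 4 is illustrative):
library(splines)
spline_fit <- lm(Y ~ ns(X, df = 4), data = dt)   # natural cubic spline basis

plot(Y ~ X, data = dt, col = "grey")
lines(sort(dt$X), predict(spline_fit)[order(dt$X)], col = "blue", lwd = 2)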
y=α+f(x)+ϵ
Or more formally, an example GAM might be (Wood, 2017):
$g(\mu_i) = A_i\theta + f_1(x_{1i}) + f_2(x_{2i}) + f_3(x_{3i}, x_{4i}) + \dots$
Where $\mu_i = E(Y_i)$ and $Y_i$ follows some exponential family distribution; $A_i$ is a row of the model matrix for any parametric terms, with parameter vector $\theta$; and each $f_j$ is a smooth function of one or more covariates.
Can build regression models with smoothers, particularly suited to non-linear, or noisy data
Hastie (1985) used a knot at every data point; Wood (2017) uses a reduced-rank version
Note:
df(λ)=k, when λ=0
df(λ)→0, when λ→∞
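A minimal mgcv sketch of these two extremes (mgcv is introduced on the next slides; dt with columns X and Y is assumed): fixing the smoothing parameter sp, i.e. $\lambda$, at 0 leaves the basis unpenalised, while a very large value shrinks the smooth right down.
library(mgcv)
unpenalised <- gam(Y ~ s(X, bs = "cr", k = 10), data = dt, sp = 0)    # lambda = 0
heavily_pen <- gam(Y ~ s(X, bs = "cr", k = 10), data = dt, sp = 1e6)  # very large lambda

sum(unpenalised$edf)  # total effective df close to the number of coefficients (k)
sum(heavily_pen$edf)  # much smaller: the smooth is shrunk towards a straight line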
mgcv is included in the standard R distribution, and is used by ggplot2's geom_smooth etc.
library(mgcv)
my_gam <- gam(Y ~ s(X, bs="cr"), data=dt)
s() controls smoothers (and other options, te, ti)
bs="cr" tells it to use a cubic regression spline ('basis')
Basis dimension / number of knots can be set (k=10 by default)
The smoothing parameter estimation method can be chosen, e.g. (REML)
summary(my_gam)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## Y ~ s(X, bs = "cr")
##
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  43.9659     0.8305   52.94   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
##        edf Ref.df     F p-value
## s(X) 6.087  7.143 296.3  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) =  0.876   Deviance explained = 87.9%
## GCV = 211.94  Scale est. = 206.93    n = 300
gam.check(my_gam)
##
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 4 iterations.
## The RMS GCV score gradient at convergence was 1.107369e-05 .
## The Hessian was positive definite.
## Model rank =  10 / 10
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
##        k'  edf k-index p-value
## s(X) 9.00 6.09     1.1    0.97
my_lm <- lm(Y ~ X, data=dt)
anova(my_lm, my_gam)
## Analysis of Variance Table
##
## Model 1: Y ~ X
## Model 2: Y ~ s(X, bs = "cr")
##   Res.Df   RSS     Df Sum of Sq      F    Pr(>F)
## 1 298.00 88154
## 2 292.91 60613 5.0873     27540 26.161 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
AIC(my_lm, my_gam)
##              df      AIC
## my_lm  3.000000 2562.280
## my_gam 8.087281 2460.085
We will now fit the same relationship with a GAM using the mgcv package.
We will now look at varying the fit by changing the things that affect the smoothers (splines), such as the degrees of freedom and the penalty. We will also visualise these changes and their effects on the models.
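A sketch of the sort of comparison this exercise involves (the values of k and sp are illustrative; dt with columns X and Y is assumed):
library(mgcv)
gam_k3    <- gam(Y ~ s(X, bs = "cr", k = 3),  data = dt)            # small basis
gam_k20   <- gam(Y ~ s(X, bs = "cr", k = 20), data = dt)            # larger basis
gam_stiff <- gam(Y ~ s(X, bs = "cr", k = 20), data = dt, sp = 10)   # fixed, heavier penalty

par(mfrow = c(1, 3))
plot(gam_k3);  plot(gam_k20);  plot(gam_stiff)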
The name of the model class is 'Generalized Additive Model' (GAM) - like the 'Generalized Linear Model' (GLM)
The main addition here is the family
argument:
my_logistic_gam <- gam(Y ~ s(X, bs="cr"), data=dt, family = "binomial")
As with a GLM, the results are on the scale of the link function, so we have to transform predictions back to the response scale:
predict(my_logistic_gam, newdata, type="response")
Sometimes our predictors are not independent of each other, and you may want to isolate the individual and combined effects.
In a GLM, you would add this with a multiplicative interaction term:
my_interaction_model <- glm(Y ~ X * Z, data=dt, family = "binomial")
You can't do the same thing in an additive model, but you can make multidimensional smoothers:
# Within 's' - assuming they are on the same scale
my_interaction_gam1 <- gam(Y ~ s(X, Z), data=dt, family = "binomial")

# As a tensor product, using te, if on different scales
my_interaction_gam2 <- gam(Y ~ te(X, Z), data=dt, family = "binomial")
We will now generalise to include more than one predictor / smooth function, look at how they might be combined, and see the effects on the models. We will also progress on to a generalised model, using a distribution family.
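A hedged sketch of such a model (the second predictor Z and the binary outcome Y_bin are illustrative names, not columns from the earlier example):
library(mgcv)
multi_gam <- gam(Y_bin ~ s(X, bs = "cr") + s(Z, bs = "cr"),
                 data = dt, family = "binomial")
summary(multi_gam)
predict(multi_gam, type = "response")   # predictions back on the probability scale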
We will now apply what we've learnt to the Framingham cohort study, predicting the binary variable: 'TenYearCHD' using other columns. See how you can use smoothers on the continuous variables to get the best fit possible.
Hint: you will need to use AIC or AUC/ROC to compare models, not R².
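A possible starting point; everything except TenYearCHD is an assumption here (the data frame name framingham and the predictors age, sysBP, glucose are taken from the usual teaching version of the dataset):
library(mgcv)
chd_glm <- glm(TenYearCHD ~ age + sysBP + glucose,
               data = framingham, family = "binomial")
chd_gam <- gam(TenYearCHD ~ s(age) + s(sysBP) + s(glucose),
               data = framingham, family = "binomial")

AIC(chd_glm, chd_gam)   # lower is better

# AUC comparison, e.g. with the pROC package
library(pROC)
auc(framingham$TenYearCHD, as.numeric(predict(chd_gam, type = "response")))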
Regression models are concerned with explaining one variable, y, with another, x
This relationship is assumed to be linear
If your data are not linear, or noisy, a smoother might be appropriate
Splines are ideal smoothers, and are polynomials joined at 'knot' points
GAMs are a framework for regressions using smoothers
mgcv is a great package for GAMs, with various smoothers available
mgcv estimates the required smoothing penalty for you
The gratia or mgcViz packages are good visualization tools for GAMs
HARRELL, F. E., JR. 2001. Regression Modeling Strategies, Springer-Verlag.
HASTIE, T. & TIBSHIRANI, R. 1986. Generalized Additive Models. Statistical Science, 1, 297-310.
HASTIE, T., TIBSHIRANI, R. & FRIEDMAN, J. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer.
WOOD, S. N. 2017. Generalized Additive Models: An Introduction with R, Second Edition, Chapman and Hall/CRC.