Don't think about it too hard...😉
Regression models are concerned with explaining one variable, Y, using another, X
y = α + βx + ϵ
y = 2 + 1.5x + ϵ
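As a concrete sketch (simulated data; variable names are illustrative), fitting this model with `lm()` should recover coefficients close to α = 2 and β = 1.5:

```r
set.seed(42)                      # reproducible simulation
x <- runif(100, 0, 10)            # predictor
y <- 2 + 1.5 * x + rnorm(100)     # y = 2 + 1.5x + noise
my_lm <- lm(y ~ x)
coef(my_lm)                       # estimates should land near 2 and 1.5
```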
More complicated mathsy definitions than I can explain; first, let's consider powers / orders:
Squared (x² or x∗x)
Cubed (x³ or x∗x∗x)
If we use these in regression, we can get something like:
y = a₀ + a₁x + a₂x² + ... + aₙxⁿ
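In R, such polynomial terms can be added with `poly()`; a minimal sketch on simulated data (names and values made up), fitting a quadratic:

```r
set.seed(1)
dt <- data.frame(x = runif(100, 0, 10))
dt$y <- 3 + 0.5 * dt$x^2 + rnorm(100)               # quadratic truth plus noise
fit2 <- lm(y ~ poly(x, 2, raw = TRUE), data = dt)   # y = a0 + a1*x + a2*x^2
coef(fit2)                                          # a2 estimate should be near 0.5
```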
Dodgy fit with increased complexity
Can oscillate wildly, particularly at edges:
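This edge behaviour can be sketched by over-fitting a high-order polynomial to a few noisy points (all values here are made up): with 11 parameters for 12 points, the curve chases the noise rather than the signal.

```r
set.seed(1)
x <- seq(0, 1, length.out = 12)
y <- sin(2 * pi * x) + rnorm(12, sd = 0.1)
fit_hi <- lm(y ~ poly(x, 10))     # degree-10 polynomial: 11 parameters, 12 points
max(abs(resid(fit_hi)))           # tiny residuals: the fit nearly interpolates the data
```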
If we could define a set of functions we are happy with:
Figure taken from Noam Ross' GAMs in R course, CC-BY, https://github.com/noamross/gams-in-r-course
Thin flexible strip that bends to curves
Held in place by weights ("ducks"), nails etc.
The tension element is important: spline flexes minimally
Can be controlled by number of knots (k), or by a penalty γ.
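Base R's `splines` package gives a quick sketch of a knot-based fit (simulated data; names illustrative). The `df` argument controls how many basis functions, and hence knots, are used:

```r
library(splines)
set.seed(2)
dt <- data.frame(x = runif(200, 0, 10))
dt$y <- sin(dt$x) + rnorm(200, sd = 0.2)
fit_bs <- lm(y ~ bs(x, df = 6), data = dt)   # cubic B-spline basis with 6 df
```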
y=α+f(x)+ϵ
Or more formally, an example GAM might be (Wood, 2017):
g(μᵢ) = Aᵢθ + f₁(x₁ᵢ) + f₂(x₂ᵢ) + f₃(x₃ᵢ, x₄ᵢ) + ...
Where:
μᵢ ≡ E(Yᵢ), with Yᵢ following an exponential family distribution
g is the link function
Aᵢ is a row of the model matrix for any strictly parametric components, with parameter vector θ
the fⱼ are smooth functions of the covariates, xₖ
Can build regression models with smoothers, particularly suited to non-linear or noisy data
Hastie (1985) used a knot at every point; Wood (2017) uses a reduced-rank version
mgcv is included in the standard R distribution, and is used by ggplot2's geom_smooth etc.

library(mgcv)
my_gam <- gam(Y ~ s(X, bs = "cr"), data = dt)
s() controls the smoothers (other options include te() and ti() for interactions)
bs = "cr" tells it to use a cubic regression spline ('basis')
The default basis dimension is k = 10

summary(my_gam)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Y ~ s(X, bs = "cr")
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  43.9659     0.8305   52.94   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##        edf Ref.df     F p-value    
## s(X) 6.087  7.143 296.3  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.876   Deviance explained = 87.9%
## GCV = 211.94  Scale est. = 206.93    n = 300
gam.check(my_gam)
## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 4 iterations.
## The RMS GCV score gradient at convergence was 1.107369e-05 .
## The Hessian was positive definite.
## Model rank =  10 / 10 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##        k'  edf k-index p-value
## s(X) 9.00 6.09     1.1    0.97
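If gam.check() had flagged a low k-index with edf close to k', the usual remedy is to refit with a larger basis dimension; a sketch on simulated data (names and values hypothetical):

```r
library(mgcv)
set.seed(3)
dt <- data.frame(X = runif(300, 0, 10))
dt$Y <- sin(dt$X * 2) * 8 + rnorm(300)
g10 <- gam(Y ~ s(X, bs = "cr"), data = dt)           # default basis dimension, k = 10
g20 <- gam(Y ~ s(X, bs = "cr", k = 20), data = dt)   # doubled basis dimension
```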
my_lm <- lm(Y ~ X, data = dt)
anova(my_lm, my_gam)

## Analysis of Variance Table
## 
## Model 1: Y ~ X
## Model 2: Y ~ s(X, bs = "cr")
##   Res.Df   RSS     Df Sum of Sq      F    Pr(>F)    
## 1 298.00 88154                                      
## 2 292.91 60613 5.0873     27540 26.161 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

AIC(my_lm, my_gam)

##              df      AIC
## my_lm  3.000000 2562.280
## my_gam 8.087281 2460.085
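The comparison can be reproduced in spirit on simulated wiggly data (values and names made up): the GAM should beat the straight line on AIC, and plot() shows the fitted smooth.

```r
library(mgcv)
set.seed(7)
dt <- data.frame(X = runif(300, 0, 10))
dt$Y <- sin(dt$X) * 10 + 40 + rnorm(300, sd = 3)   # strongly non-linear signal
my_gam <- gam(Y ~ s(X, bs = "cr"), data = dt)
my_lm  <- lm(Y ~ X, data = dt)
AIC(my_lm, my_gam)                # the GAM should have the lower AIC here
plot(my_gam, shade = TRUE)        # partial effect of s(X) with a confidence band
```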
Regression models are concerned with explaining one variable, y, with another, x
This relationship is assumed to be linear
If your data are not linear, or noisy, a smoother might be appropriate
Splines are ideal smoothers: polynomials joined at 'knot' points
GAMs are a framework for regressions using smoothers
mgcv is a great package for GAMs, with various smoothers available
mgcv estimates the required smoothing penalty for you
The gratia and mgcViz packages are good visualisation tools for GAMs
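The estimated smoothing penalty can be inspected on a fitted model; a minimal sketch on simulated data (names illustrative):

```r
library(mgcv)
set.seed(1)
dt <- data.frame(X = runif(200, 0, 10))
dt$Y <- sin(dt$X) * 5 + rnorm(200)
my_gam <- gam(Y ~ s(X, bs = "cr"), data = dt)
my_gam$sp            # smoothing parameter chosen by GCV - no manual tuning needed
summary(my_gam)$edf  # effective degrees of freedom implied by that penalty
```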
https://github.com/chrismainey/fitting_wiggly_data
https://noamross.github.io/gams-in-r-course/
HARRELL, F. E., JR. 2001. Regression Modeling Strategies, New York, Springer-Verlag New York.
HASTIE, T. & TIBSHIRANI, R. 1986. Generalized Additive Models. Statistical Science, 1, 297-310.
HASTIE, T., TIBSHIRANI, R. & FRIEDMAN, J. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, New York, Springer.
WOOD, S. N. 2017. Generalized Additive Models: An Introduction with R, Second Edition, Boca Raton, CRC Press.