Fitting Wiggly Data

Why mgcv is awesome





c.mainey1@nhs.net

mainard.co.uk

chrismainey

chrismainey

Picture of Michael Jackson's dance routine for the song 'Smooth Criminal'
Don't think about it too hard...😉

QR code https://chrismainey.github.io/fitting_wiggly_data/fitting_wiggly_data.html

1

Regression models on non-linear data

  • Regression is a method for predicting a variable, Y, using another, X

Two-dimensional scatterplot with a range of data points that show the two dimensions are correlated

2

Equation of a straight line (1)

y=α+βx+ϵ

The same scatter plot from the last slide now has a line of best fit drawn, highlighting where it crosses the vertical axis, called the intercept or alpha. For each unit of 1 on the horizontal axis, the vertical axis rises by an amount called the coefficient, or beta.

3

Equation of a straight line (2)

y=2+1.5x+ϵ

The same scatter plot from the last slide now has a line of best fit drawn, highlighting where it crosses the vertical axis, called the intercept or alpha. For each unit of 1 on the horizontal axis, the vertical axis rises by an amount called the coefficient, or beta.
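
A minimal R sketch of this model (simulated toy data; the object names are illustrative, not from the slides):

set.seed(42)
x <- runif(100, 0, 10)                  # predictor
y <- 2 + 1.5 * x + rnorm(100, sd = 1)   # intercept 2, slope 1.5, plus noise (epsilon)
coef(lm(y ~ x))                         # estimates should come out close to 2 and 1.5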

4

What about nonlinear data? (1)

The same scatter plot from the first slide with no line of best fit

5

What about nonlinear data? (2)

The same scatter plot from the first slide with a straight line of best fit.

6

What about nonlinear data? (3)

The same scatter plot from the first slide with boxes for 3 categories added.

7

What about nonlinear data? (4)

The same scatter plot from the first slide with a smooth sigmoidal shape fitted to the points

8

What about nonlinear data? (5)

The same scatter plot from the first slide with the straight-line, categorical, and smooth sigmoidal fits all shown on the points.

9

Poly - what?

  • More complicated mathsy definitions than I can explain here; first, let's consider powers / orders:

    • Squared (x² or x·x)

    • Cubed (x³ or x·x·x)



If we use these in regression, we can get something like:

y=a₀+a₁x+a₂x²+...+aₙxⁿ
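
In R, this kind of polynomial fit can be sketched with poly() (toy data and names, not from the slides):

set.seed(1)
x <- runif(100, -2, 2)
y <- x^3 - x + rnorm(100, sd = 0.3)   # a curved toy relationship
poly_fit <- lm(y ~ poly(x, 3))        # fits a0 + a1*x + a2*x^2 + a3*x^3 (orthogonalised terms)
coef(poly_fit)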

Problems with polynomials

  • Dodgy fit with increased complexity

  • Can oscillate wildly, particularly at edges:

Runge phenomenon

10

What if we could do something more suitable?

If we could define a set of functions we are happy with:

  • We could use coefficients to 'support' the fit
  • We could use penalties to restrict how much they flex (see the sketch below)

Example of basis functions for fitting data

Figure taken from Noam Ross' GAMs in R course, CC-BY, https://github.com/noamross/gams-in-r-course
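
A rough illustration of the first idea in R, using the base splines package rather than the figure's basis (unpenalised; illustrative names only):

library(splines)
set.seed(1)
x <- seq(0, 10, length.out = 200)
y <- sin(x) + rnorm(200, sd = 0.3)    # wiggly toy data
basis_fit <- lm(y ~ bs(x, df = 6))    # bs() builds the basis; lm() estimates one coefficient per basis function
coef(basis_fit)                       # no penalty yet - that comes with GAMs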

11

Splines

  • How do you (manually) draw a smooth curve?

Draftsman's spline / 'flat spline'

  • Thin flexible strip that bends to curves

  • Held in place by weights ("ducks"), nails etc.

  • The tension element is important: spline flexes minimally

Spline (PSF)

12

Mathematical splines

  • Smooth, piece-wise polynomials, like a flexible strip for drawing curves.
  • 'Knot points' between each section

Scatter plot from earlier with a wiggly sigmoidal best fit line

13

How smooth?

Can be controlled by the number of knots (k), or by a penalty (γ); a quick mgcv sketch follows below.

Scatter plot from earlier with a wiggly sigmoidal best fit lines for 2, 20 and 50 points

Scatter plot from earlier with a wiggly sigmoidal best fit lines for 20 knots but differing penalties 0.001, 1 and 10
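
In mgcv both controls are available when fitting (a sketch on simulated toy data; dt_toy is a made-up name, not the talk's dataset):

library(mgcv)
set.seed(1)
dt_toy <- data.frame(X = seq(0, 10, length.out = 300))
dt_toy$Y <- sin(dt_toy$X) * 10 + rnorm(300, sd = 2)
fit_k     <- gam(Y ~ s(X, bs = "cr", k = 20), data = dt_toy)            # k sets the basis dimension (maximum wiggliness)
fit_stiff <- gam(Y ~ s(X, bs = "cr", k = 20), sp = 10, data = dt_toy)   # sp fixes the smoothing penalty (a stiffer fit)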

14

Generalized Additive Model

  • Regression models where we fit smoothers (like splines) from our data.
  • Strictly additive, but smoothers can describe complex relationships.
  • In our case:

y=α+f(x)+ϵ

Or more formally, an example GAM might be (Wood, 2017):

g(μᵢ)=Aᵢθ+f₁(x₁ᵢ)+f₂(x₂ᵢ)+f₃(x₃ᵢ,x₄ᵢ)+...
Where:

  • μᵢ ≡ E(Yᵢ), the expectation of Y
  • Yᵢ ∼ EF(μᵢ, ϕ): Yᵢ is a response variable, distributed according to an exponential family distribution with mean μᵢ and scale parameter ϕ.
  • Aᵢ is a row of the model matrix for any strictly parametric model components, with θ the corresponding parameter vector.
  • fⱼ are smooth functions of the covariates, xₖ, each represented using a basis.
15

What does that mean for me?

  • Can build regression models with smoothers, particularly suited to non-linear, or noisy data

  • Hastie (1985) used a knot at every data point; Wood (2017) uses a reduced-rank version


mgcv: mixed gam computation vehicle

  • Prof. Simon Wood's package, pretty much the standard
  • Included in the standard R distribution; used by ggplot2's geom_smooth, etc.
library(mgcv)
my_gam <- gam(Y ~ s(X, bs="cr"), data=dt)
  • s() specifies a smoother (related constructors include te() and ti() for tensor-product smooths)
  • bs="cr" tells it to use a cubic regression spline basis
  • Sensible defaults are chosen for you, but you can alter them, e.g. s(X, bs="cr", k=10)
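
Once fitted, the model behaves like other R model objects; for example, predictions (a sketch, with made-up X values):

predict(my_gam, newdata = data.frame(X = c(1, 5, 10)), se.fit = TRUE)  # replace with values in the range of your data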
16

Model Output:

summary(my_gam)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## Y ~ s(X, bs = "cr")
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 43.9659 0.8305 52.94 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(X) 6.087 7.143 296.3 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.876 Deviance explained = 87.9%
## GCV = 211.94 Scale est. = 206.93 n = 300
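
To see the shape of the estimated smooth rather than just the table, mgcv's built-in plot method can be used (a sketch):

plot(my_gam, shade = TRUE, residuals = TRUE)  # fitted smooth with confidence band and partial residuals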
17

Check your model:

gam.check(my_gam)

##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 4 iterations.
## The RMS GCV score gradient at convergence was 1.107369e-05 .
## The Hessian was positive definite.
## Model rank = 10 / 10
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(X) 9.00 6.09 1.1 0.97
18

Is it any better than a linear model?

my_lm <- lm(Y ~ X, data=dt)
anova(my_lm, my_gam)
## Analysis of Variance Table
##
## Model 1: Y ~ X
## Model 2: Y ~ s(X, bs = "cr")
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 298.00 88154
## 2 292.91 60613 5.0873 27540 26.161 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
AIC(my_lm, my_gam)
## df AIC
## my_lm 3.000000 2562.280
## my_gam 8.087281 2460.085

Yes, yes it is!

20

Summary

  • Regression models are concerned with explaining one variable: y, with another: x

  • This relationship is assumed to be linear

  • If your data are not linear, or noisy, a smoother might be appropriate

  • Splines are ideal smoothers, and are polynomials joined at 'knot' points

  • GAMs are a framework for regressions using smoothers

  • mgcv is a great package for GAMs with various smoothers available

  • mgcv estimates the required smoothing penalty for you

  • The gratia or mgcViz packages are good visualization tools for GAMs (see the sketch below)
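
For example, with gratia (a sketch; assumes the package is installed and my_gam is the model fitted earlier):

library(gratia)
draw(my_gam)      # ggplot2 plot of the estimated smooth(s)
appraise(my_gam)  # ggplot2 versions of the gam.check() diagnostics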

21

References and Further reading:

GitHub code:

https://github.com/chrismainey/fitting_wiggly_data

Simon Wood's comprehensive book:

  • WOOD, S. N. 2017. Generalized Additive Models: An Introduction with R, Second Edition, Florida, USA, CRC Press.

Noam Ross free online GAM course:

https://noamross.github.io/gams-in-r-course/


  • HARRELL, F. E., JR. 2001. Regression Modeling Strategies, New York, Springer-Verlag New York.

  • HASTIE, T. & TIBSHIRANI, R. 1986. Generalized Additive Models. Statistical Science, 1, 297-310.

  • HASTIE, T., TIBSHIRANI, R. & FRIEDMAN, J. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, New York, Springer.

22
