GAMworkshop

.pull-left[

# Modelling non-linear data with Generalized Additive Models (GAMs)
### Using the `mgcv` package

.pull-right[

<p style="text-align:center;font-weight:bold;"><img src="man/figures/download.png" style="height:550px;" alt="Picture of Michael Jackson's dance routine for the song 'Smooth Criminal', with the bad "dad-joke": 'You've been hit by a smoothed residual'">
<br>
Don't think about it too hard...😉  </p>

]

---

# Regression models on non-linear data

+ Regression is a method for predicting a variable, `Y`, using another, `X`

<img src="GAMworkshop_files/figure-html/regression1-1.png" alt="Two-dimensional scatterplot with a range of data points that show the two dimensions are correlated" width="864" style="display: block; margin: auto;" />
---

# Equation of a straight line (1)

`$$y= \alpha + \beta x + \epsilon$$`

---

# Equation of a straight line (2)

`$$y= 2 + 1.5 x + \epsilon$$`

---

# What about nonlinear data? (1)

---

# What about nonlinear data? (2)

<img src="GAMworkshop_files/figure-html/cats-1.png" alt="The same scatter plot from the first slide with a straight line of best fit." width="864" style="display: block; margin: auto;" />
---

# What about nonlinear data? (3)

<img src="GAMworkshop_files/figure-html/cats3-1.png" alt="The same scatter plot from the first slide with boxes for 3 categories added." width="864" style="display: block; margin: auto;" />
---

# What about nonlinear data? (4)

<img src="GAMworkshop_files/figure-html/cats4-1.png" alt="The same scatter plot from the first slide with with a smooth sigmoidal shape fitted to the points" width="864" style="display: block; margin: auto;" />
---

# What about nonlinear data? (5)

<img src="GAMworkshop_files/figure-html/cats5-1.png" alt="The same scatter plot from the first slide with a straight line, categorical and the a smooth sigmoidal shapes fitted to the points." width="864" style="display: block; margin: auto;" />
---

.pull-left[

# Poly - what?

+ More complicated mathsy definitions than I can explain, first lets consider powers / orders:

+ Squared ( `$x^2$` or `$x * x$`)

+ Cubed ( `$x^3$` or `$x * x * x$`)

If we use these in regression, we can get something like:

`$$y = \alpha + \beta_1x + \beta_2x^2 + \beta_3x^3... + \beta_nx_n^z$$`
]

.pull-right[

## Problems with polynomials

+ Dodgy fit with increased complexity

+ Can oscillate wildly, particularly at edges:
  + [Runge's phenomenon](https://en.wikipedia.org/wiki/Runge%27s_phenomenon#:~:text=In%20the%20mathematical%20field%20of,set%20of%20equispaced%20interpolation%20points.)

]

---

# Degrees of freedom (df)

Within a model, how many 'parts' are free to vary?

E.g. If we have 3 numbers and we know the average is 5:

If we have 2 and 7, our final number is not free to vary.
It must be 6:
$$ \frac{2 + 7 + 6}{3} = 5$$
This means our 'model' is constrained to `$n-1$` degrees of freedom

## In regression context:

+ The number of data points in our model ( `$N$` ) limits the df
+ Usually the number of predictors ( `$k$` ) in our model is considered the df (one of these is the intercept)
+ "Residual df" are points left to vary in the model, after considering the df:

`$$N-k-1$$`
Helpful post on [CrossValidated](https://stats.stackexchange.com/questions/340007/confused-about-residual-degree-of-freedom)

---

# Overfitting

> When our model fits both the underlying relationship and the 'noise' percuiliar to our sample data
 
 + You want to fit the relationship, whilst minimising the noise.
 
 + This helps 'generalizability': meaning it will predict well on new data.

If we allowed total freedom in our model, e.g. a knot at every data point. What would happen?

---
class: middle

# Exercise 1: Load and fit non-linear relationship
Here we will visualise the relationship, view it as a linear regression, and attempt a polynomial fit.

---

# What if we could do something more suitable?

If we could define a set of functions we are happy with:
+ We could use coefficients to 'support' the fit
+ Could use penalties to restrict how much they flex

.smaller[
Figure taken from Noam Ross' GAMs in R course, CC-BY, https://github.com/noamross/gams-in-r-course
]

---

# Splines

+ How do you (manually) draw a smooth curve?

.pull-left[

### [Draftsman's spline](https://www.core77.com/posts/55368/When-Splines-Were-Physical-Objects) / ['flat spline'](https://en.wikipedia.org/wiki/Flat_spline)

+ Thin flexible strip that bends to curves

+ Held in place by weights ("ducks"), nails etc.

+ The tension element is important: spline flexes minimally

]

.pull-right[

]

---

# Mathematical splines

+ Smooth, piece-wise polynomials, like a flexible strip for drawing curves.
+ 'Knot points' between each section

---

# How smooth?

Can be controlled by number of knots `$(k)$`, or by a penalty `$\gamma$`.

.pull-left[
<img src="GAMworkshop_files/figure-html/knots2-1.png" alt="Scatter plot from earlier with a wiggly sigmoidal best fit lines for 2, 20 and 50 points" width="432" style="display: block; margin: auto;" />
]

.pull-right[
<img src="GAMworkshop_files/figure-html/penalty-1.png" alt="Scatter plot from earlier with a wiggly sigmoidal best fit lines for 20 knots but differing penalties 0.001, 1 and 10" width="432" style="display: block; margin: auto;" />

]

---

# Generalized Additive Model

+ Regression models where we fit smoothers (like splines) from our data.
+ Strictly additive, but smoothers can describe complex relationships.
+ In our case:

`$$y= \alpha + f(x) + \epsilon$$`
--

<br>
.smaller[
Or more formally, an example GAM might be (Wood, 2017):
]

`$$g(\mu_i) = A_i \theta + f_1(x_1) + f_2(x_{2i}) + f3(x_{3i}, x_{4i}) + ...$$`
<br>
Where:

.smaller[
+ `$\mu_i \equiv \mathbb{E}(Y _i)$`, the expectation of Y]

.smaller[
+ `$Yi \sim EF(\mu _i, \phi _i)$`, `$Yi$` is a response variable, distributed according to exponential family distribution with mean `$\mu _i$` and shape parameter `$\phi$`.]

.smaller[
+ `$A_i$` is a row of the model matrix for any strictly parametric model components with `$\theta$` the corresponding parameter vector.]

.smaller[
+ `$f_i$` are smooth functions of the covariates, `$xk$`, where `$k$` is each function basis.]

---
# What does that mean for me?

+ Can build regression models with smoothers, particularly suited to non-linear, or noisy data

+ _Hastie (1985)_ used knot every point, _Wood (2017)_ uses reduced-rank version

## Issues

+ We need to chose the right _dimension_ (degrees of freedom / knots) for our smoothers
+ We need to chose the right penalty ( `$\lambda$` ) for our smoothers

### Consequence

+ If you penalise a smooth of `$k$` dimensions, it no longer has `$k-1$` degrees of freedom as they are reduced
+ 'Effective degrees of freedom' - the penalized df of the predictors in the model.

__Note:__
<br>
`$df(\lambda) = k$`, when `$\lambda = 0$` 
<br>
`$df(\lambda) \rightarrow 0$`, when `$\lambda \rightarrow \infty$`

---

# mgcv: mixed gam computation vehicle

+ Prof. Simon Wood's package, pretty much the standard
+ Included in standard `R` distribution, used in `ggplot2` `geom_smooth` etc.
+ Has sensible defaults for dimensions
+ Estimates the ideal penalty for smooths by various methods, with REML recommended.

```r
library(mgcv)

my_gam <- gam(Y ~ s(X, bs="cr"), data=dt)
```

+ `s()` controls smoothers (and other options, `t`, `ti`)
+ `bs="cr"` telling it to use cubic regression spline ('basis')
+ Default determined from data, but you can alter this e.g. (`k=10`)
+ Penalty (smoothing parameter) estimation method is set to (`REML`)
---

# Model Output:

```r
summary(my_gam)
```

```
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Y ~ s(X, bs = "cr")
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  43.9659     0.8305   52.94   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##        edf Ref.df     F p-value    
## s(X) 6.087  7.143 296.3  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.876   Deviance explained = 87.9%
## GCV = 211.94  Scale est. = 206.93    n = 300
```

---
# Check your model:

```r
gam.check(my_gam)
```

```
## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 4 iterations.
## The RMS GCV score gradient at convergence was 1.107369e-05 .
## The Hessian was positive definite.
## Model rank =  10 / 10 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##        k'  edf k-index p-value
## s(X) 9.00 6.09     1.1    0.97
```

---
# Check your model:

```r
gam.check(my_gam)
```

---

# Is it any better than linear model?

```r
my_lm <- lm(Y ~ X, data=dt)

anova(my_lm, my_gam)
```

```
## Analysis of Variance Table
## 
## Model 1: Y ~ X
## Model 2: Y ~ s(X, bs = "cr")
##   Res.Df   RSS     Df Sum of Sq      F    Pr(>F)    
## 1 298.00 88154                                      
## 2 292.91 60613 5.0873     27540 26.161 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```r
AIC(my_lm, my_gam)
```

```
##              df      AIC
## my_lm  3.000000 2562.280
## my_gam 8.087281 2460.085
```

## Yes, yes it is!

---
class: middle

# Exercise 2: Simple GAM fit
We will now fit the same relationship with a GAM using the  `mgcv` package.

---
class: middle

# Exercise 3: GAM fitting options
We will now look at varying the fit using things like the degrees of freedom and penalty.
We will also visualise these changes and effects on models.

---
class: middle

# Exercise 4: Multivariable GAMs
We will now generalise to include more than one predictor / smooth function, how they might be combined, and effects on models.  We will also progress on to a generalised linear model, using a distribution family

---
class: middle

# Break

---
class: middle

# Exercise 5: Put it all together!
We will now apply what we've learnt to the Framingham cohort study, predicting the binary variable: 'TenYearCHD' using other columns.  See how you can use smoothers on the continuous variables to get the best fit possible.
___Hint:___ you will need to use AIC or auc to compare models, not R2.

---

# Summary

+ Regression models are concerned with explaining one variable: `y`, with another: `x`

+ This relationship is assumed to be linear

+ If your data are not linear, or noisy, a smoother might be appropriate

+ Splines are ideal smoothers, and are polynomials joined at 'knot' points

+ GAMs are a framework for regressions using smoothers

+ `mgcv` is a great package for GAMs with various smoothers available

+ `mgcv` estimates the required smoothing penalty for you

+ `gratia` or `mgcViz` packages are good visualization tool for GAMs

---
# References and Further reading:

#### GitHub code: 
https://github.com/chrismainey/GAMworkshop

#### Simon Wood's comprehensive book:
+ WOOD, S. N. 2017. Generalized Additive Models: An Introduction with R, Second Edition, Florida, USA, CRC Press.

#### Noam Ross free online GAM course:
https://noamross.github.io/gams-in-r-course/

<br>

+ HARRELL, F. E., JR. 2001. Regression Modeling Strategies, New York, Springer-Verlag New York.

+ HASTIE, T. & TIBSHIRANI, R. 1986. Generalized Additive Models. Statistical Science, 1, 297-310. 291

+ HASTIE, T., TIBSHIRANI, R. & FRIEDMAN, J. 2009. The Elements of Statistical Learning : Data Mining, Inference, and Prediction, New York, NETHERLANDS, Springer.