Random-intercept models

Introduction

This is a surface-level overview of fitting random-intercept models in R and Python and of using them for the prediction task we discussed in our chat on 19/01/2022. Hopefully it’s a useful starter!

I will demonstrate two ways of making predictions from a random-intercept model:

“Conditional”: conditioned on the random effect, i.e. including each cluster’s estimated random intercept. This gives cluster-specific predictions; in SHMI (the Summary Hospital-level Mortality Indicator) these would be trust-specific predictions. You can’t use these for a funnel plot. Each trust’s intercept has already absorbed how far that trust sits from the national average, so the observed/expected ratios all line up at (or very near) 1 on the y-axis and there is no residual variation left to plot (you are measuring the between-trust variation in the same clusters it was estimated from).
“Marginal”: the global average prediction, i.e. without the random effect (setting it to zero). This gives a global prediction; in SHMI this would be the prediction at the national average risk for a patient with a given set of predictors (not trust-specific). You can use these for a funnel plot: you’ve just got a better case-mix model.
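Here is a minimal sketch of the two prediction types on the probability scale, in Python using only the standard library. The trust, its 10 patients and the intercept value `u_hat = 0.35` are hypothetical numbers chosen for illustration, not estimates from medpar. One caveat: setting the random intercept to zero gives the prediction for a trust with an average intercept. Because the inverse logit is non-linear, this is not quite the same as averaging predicted probabilities over all trusts (a "population-averaged" prediction).

```python
import math


def expit(x: float) -> float:
    """Inverse logit: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def conditional_prob(eta_fixed: float, u_trust: float) -> float:
    """Trust-specific prediction: fixed-effects linear predictor plus that trust's intercept."""
    return expit(eta_fixed + u_trust)


def marginal_prob(eta_fixed: float) -> float:
    """National-average prediction: the random intercept is set to zero."""
    return expit(eta_fixed)


def observed_over_expected(died, expected) -> float:
    """Funnel-plot y-axis: observed deaths divided by the sum of predicted probabilities."""
    return sum(died) / sum(expected)


if __name__ == "__main__":
    # Hypothetical trust: 10 patients, each with fixed-effects log-odds of 0.0, 6 of whom died.
    died = [1] * 6 + [0] * 4
    eta = [0.0] * 10
    u_hat = 0.35  # hypothetical estimated random intercept for this trust
    oe_marginal = observed_over_expected(died, [marginal_prob(e) for e in eta])
    oe_conditional = observed_over_expected(died, [conditional_prob(e, u_hat) for e in eta])
    print(round(oe_marginal, 3), round(oe_conditional, 3))
```

With marginal predictions, the hypothetical trust has 6 deaths against 5 expected, an O/E of 1.2, so it would stand out on a funnel plot. With conditional predictions, the trust's own positive intercept raises its expected count and pulls its O/E back towards 1.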

Although I advocate the marginal prediction, an entirely different approach would be to estimate the random intercepts (how much each trust differs from the national average), bootstrap a confidence interval for each, and present them as a caterpillar plot. That’s another argument, though.
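The following sketches the last step of that approach. It assumes bootstrap replicates of each trust's random intercept already exist; in lme4, `bootMer` can generate them by refitting the model to data simulated from the fit, but that refitting is not shown here. The code builds percentile intervals and orders trusts by their estimate, which gives the row order of a caterpillar plot. The trust names, estimates and simulated replicates in the usage section are hypothetical.

```python
import math
import random


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of an already-sorted list (0 <= q <= 1)."""
    pos = (len(sorted_vals) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def caterpillar_rows(estimates, replicates, level=0.95):
    """Return (trust, estimate, lower, upper) tuples ordered by estimate.

    estimates:  {trust: estimated random intercept}
    replicates: {trust: list of that intercept re-estimated on bootstrap samples}
    """
    alpha = (1 - level) / 2
    rows = []
    for trust, est in estimates.items():
        reps = sorted(replicates[trust])
        rows.append((trust, est, percentile(reps, alpha), percentile(reps, 1 - alpha)))
    return sorted(rows, key=lambda r: r[1])


def flag(row):
    """'above' or 'below' if the interval excludes 0 (the national average), else 'within'."""
    _, _, lower, upper = row
    if lower > 0:
        return "above"
    if upper < 0:
        return "below"
    return "within"


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical estimates, with replicates simulated around them purely for illustration.
    estimates = {"Trust A": 0.30, "Trust B": -0.05, "Trust C": -0.40}
    replicates = {t: [rng.gauss(e, 0.12) for _ in range(500)] for t, e in estimates.items()}
    for row in caterpillar_rows(estimates, replicates):
        print(row[0], round(row[1], 2), round(row[2], 2), round(row[3], 2), flag(row))
```

Percentile intervals are used here because they are the simplest bootstrap interval to read off a caterpillar plot; other interval types would slot into the same structure.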

Data

I’m using the medpar dataset from Hilbe’s COUNT package, which is a cut of the 1991 Medicare files for the state of Arizona.

library(COUNT)

## Loading required package: msme

## Loading required package: MASS

## Loading required package: lattice

## Loading required package: sandwich

library(lme4)

## Loading required package: Matrix

library(ModelMetrics)

## 
## Attaching package: 'ModelMetrics'

## The following object is masked from 'package:base':
## 
##     kappa

library(ggplot2)
library(FunnelPlotR)

data("medpar")

In R

This uses the lme4 package, which takes a frequentist approach to multilevel modelling. Its results can generally be interpreted in a Bayesian fashion as well, and many mixed-effects model packages are explicitly Bayesian.
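For reference, the random-intercept logistic model has the form below, with $i$ indexing patients and $j$ indexing clusters (trusts in SHMI). In lme4 it would be fitted with something like `glmer(died ~ age80 + los + factor(type) + (1 | cluster), data = medpar, family = binomial)`, where `cluster` is a placeholder for the grouping column rather than a name taken from this document. In lme4, `predict(fit, type = "response")` gives the conditional predictions and adding `re.form = NA` gives the marginal ones, with the random effects set to zero.

$$
\operatorname{logit} \Pr(\text{died}_{ij} = 1) = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j, \qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
$$

$$
\hat{p}^{\,\text{cond}}_{ij} = \operatorname{logit}^{-1}\!\left(\mathbf{x}_{ij}^{\top}\hat{\boldsymbol{\beta}} + \hat{u}_j\right), \qquad
\hat{p}^{\,\text{marg}}_{ij} = \operatorname{logit}^{-1}\!\left(\mathbf{x}_{ij}^{\top}\hat{\boldsymbol{\beta}}\right)
$$

Here $\sigma_u^2$ is the between-cluster variance. The conditional prediction keeps the estimated intercept $\hat{u}_j$, and the marginal one drops it.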

First, a single-level logistic regression (a glm with no random effect):

mod1 <- glm(died ~ age80 + los + factor(type), data=medpar, family="binomial")

summary(mod1)

## 
## Call:
## glm(formula = died ~ age80 + los + factor(type), family = "binomial", 
##     data = medpar)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5312  -0.8830  -0.8032   1.2938   2.2568  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -0.590949   0.097351  -6.070 1.28e-09 ***
## age80          0.656493   0.129180   5.082 3.73e-07 ***
## los           -0.037483   0.007871  -4.762 1.92e-06 ***
## factor(type)2  0.418704   0.144611   2.895  0.00379 ** 
## factor(type)3  0.961028   0.230489   4.170 3.05e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1922.9  on 1494  degrees of freedom
## Residual deviance: 1857.8  on 1490  degrees of freedom
## AIC: 1867.8
## 
## Number of Fisher Scoring iterations: 4
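To make the coefficient table concrete, this small Python sketch turns the printed estimates back into a prediction by hand. The patient (aged 80+, a 5-day stay, admission type 2) is hypothetical. `factor(type)` uses type 1 as the reference level, so types 2 and 3 each add their own coefficient to the log-odds. The last function checks the printed AIC. For a 0/1 response the saturated log-likelihood is zero, so the residual deviance equals -2 × log-likelihood and AIC = deviance + 2 × (number of coefficients, here 5).

```python
import math

# Coefficients copied from summary(mod1) above (log-odds scale).
COEF = {
    "(Intercept)": -0.590949,
    "age80": 0.656493,
    "los": -0.037483,
    "factor(type)2": 0.418704,
    "factor(type)3": 0.961028,
}


def linear_predictor(age80: int, los: float, type_: int) -> float:
    """Log-odds of death; type 1 is the reference level, so it has no term of its own."""
    eta = COEF["(Intercept)"] + COEF["age80"] * age80 + COEF["los"] * los
    if type_ == 2:
        eta += COEF["factor(type)2"]
    elif type_ == 3:
        eta += COEF["factor(type)3"]
    return eta


def predict_prob(age80: int, los: float, type_: int) -> float:
    """Predicted probability of death: inverse logit of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(age80, los, type_)))


def odds_ratios() -> dict:
    """exp(coefficient): multiplicative change in the odds of death per unit change."""
    return {name: math.exp(b) for name, b in COEF.items() if name != "(Intercept)"}


def aic_from_deviance(residual_deviance: float, n_coefficients: int) -> float:
    """AIC for a binary-response glm, where deviance = -2 * log-likelihood."""
    return residual_deviance + 2 * n_coefficients


if __name__ == "__main__":
    print(round(predict_prob(age80=1, los=5, type_=2), 3))
    print(round(odds_ratios()["age80"], 2))
    print(round(aic_from_deviance(1857.8, 5), 1))
```

For this patient the linear predictor is about 0.297, a predicted probability of death of roughly 0.57. The odds ratio exp(0.656) ≈ 1.93 means that being 80+ nearly doubles the odds of death, holding the other terms fixed.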

auc(mod1)

## [1] 0.6372224
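ModelMetrics handles this calculation. To show what the number means, here is a sketch using the plain pairwise definition (not ModelMetrics' own implementation). The AUC is the probability that a randomly chosen patient who died was given a higher predicted risk than a randomly chosen patient who survived, with ties counting as half. Read that way, 0.637 means the single-level model ranks a death above a survivor about 64% of the time. The toy vectors in the usage line are hypothetical.

```python
from itertools import product


def auc_rank(actual, predicted) -> float:
    """Pairwise AUC: share of (case, non-case) pairs where the case scores higher.

    actual:    0/1 outcomes (1 = died)
    predicted: predicted probabilities or any risk score, same length as actual
    """
    pos = [p for a, p in zip(actual, predicted) if a == 1]
    neg = [p for a, p in zip(actual, predicted) if a == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5  # ties count half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(auc_rank([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic in the number of patients, which is fine at medpar's size (1,495 rows); rank-based formulas give the same value more efficiently on larger data.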