## Funnel plots

Funnel plots are a common tool for comparing organisations or units using proportions or standardised rates. A common use is monitoring mortality at hospitals. This is an introductory post on the subject, which gives a little information about funnel plots and how they are constructed. It is deliberately light on theory and focuses on use; some of the theory is referenced for interested readers.

This post also uses a funnel plot function, for indirectly standardised ratios, that I built as part of my PhD work. The function is based on ggplot2 (Wickham 2009), and is available at https://github.com/chrismainey/FunnelPlotR, although it’s a work in progress.

There are different kinds of funnel plot, but this post focuses on the type used to compare standardised mortality and other similarly constructed indicators.

## Why do we use them?

### Rationale

How do you go about comparing organisations? We could simply look at indicator data and rank them, but that could be unfair if the conditions are different at each organisation. For example, every hospital differs in size, the services it offers, and the patients it sees. We might expect a hospital seeing a higher proportion of elderly patients to have a higher mortality rate. Is it fair to compare it to an organisation serving a younger population who may be ‘healthier’ in general? Naively comparing organisations by ranking in league tables has been shown to be a bad idea (Goldstein and Spiegelhalter 1996; Lilford et al. 2004).

This scenario is not a million miles away from the techniques used in meta-analysis of clinical trials, where we may have trials of different sizes, with different estimates of effect, and differing variances. Some of the techniques applied to meta-analysis have been adapted for healthcare monitoring, including funnel plots and methods to adjust for overdispersion (Spiegelhalter 2005a, 2005b; Spiegelhalter et al. 2012).

### Construction

If we want to compare a standardised ratio or similar indicator, we can make a plot with the indicator on the Y-axis, and a measure of the unit size on the X-axis. This is commonly the sum of the predicted values for standardised ratios (e.g. the predicted number of cases), or the number of patients/discharges etc. Our centre line, the average value, can be surrounded by ‘control limits,’ a concept from Statistical Process Control. These limits are statistical boundaries that separate natural (‘common-cause’) variation from systematic differences (‘special-cause variation’) (Mohammed et al. 2001). The comparison is commonly made at organisational level, but could be made at any level of aggregation.

These limits resemble a funnel because of the effects of size. The expected variation is larger when we are looking at fewer cases. For example, imagine an experiment where we toss an unbiased coin to estimate the probability of ‘heads.’ If we toss that coin twice and both are ‘heads,’ our data is telling us that all coin tosses end up as ‘heads.’ This is not true, and we are drawing a conclusion that we know would change if we repeated the experiment more times. The margin of error around this estimate is high. If we performed the same experiment with 10, 100 or 1000 tosses, we would expect the proportion to get closer to 50:50, heads/tails, and the margin of error to be proportionally smaller. This is also true of indicators based on counts, like those shown on funnel plots. We expect less variation between points as organisations get larger.
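
The coin example can be made concrete with a small sketch. It uses base R's binomial quantile function to give an approximate central 95% range for the observed proportion of heads at each number of tosses. The object and column names (`n_tosses`, `coin`, `width`) are made up for illustration and are not part of the post's data or function.

```r
# Number of tosses in each hypothetical experiment
n_tosses <- c(2, 10, 100, 1000)

# Central 95% range for the observed proportion of heads from a fair coin,
# taken from the binomial quantile function and divided by the number of tosses
coin <- data.frame(
  n_tosses = n_tosses,
  lower = qbinom(0.025, size = n_tosses, prob = 0.5) / n_tosses,
  upper = qbinom(0.975, size = n_tosses, prob = 0.5) / n_tosses
)

# Width of the range: this is the 'funnel' narrowing as size increases
coin$width <- coin$upper - coin$lower

print(coin)
```

With two tosses the range covers every possible outcome (0 to 1), so the data tell us almost nothing. The range then narrows around 0.5 as the number of tosses grows, which is the same shape as the funnel limits.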

### Example:

    # Add a 95% Poisson limit, by using the quantile function to get the limit value for each 'expected'.
    lkup <- data.frame(id = seq(1, max(dt$expected), 1))
    lkup$Upper <- (qpois(0.975, lambda = lkup$id) - 0.025) / lkup$id
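
As a rough, self-contained illustration of the same idea, the sketch below computes both a lower and an upper 95% Poisson limit for a handful of made-up expected counts. The `expected` values and the `limits` data frame are assumptions for illustration, not the post's `dt` data, and the sketch uses the plain quantile divided by the expected count, without the small 0.025 subtraction used in the line above.

```r
# Hypothetical expected counts (assumed values, not the post's data)
expected <- c(5, 20, 50, 100, 500)

# Poisson quantiles at 2.5% and 97.5%, expressed as a ratio to the expected count
limits <- data.frame(
  expected = expected,
  lower = qpois(0.025, lambda = expected) / expected,
  upper = qpois(0.975, lambda = expected) / expected
)

# Distance between the limits shrinks as the expected count grows
limits$width <- limits$upper - limits$lower

print(limits)
```

The limits sit either side of a ratio of 1 and draw closer together as the expected count increases, which gives the plot its funnel shape.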

### Build plot

Now we can build a funnel plot object with standard Poisson limits, and outliers labelled. The function returns a list of the plotted data, the plotted control limit range, and the ggplot object, hence object[3] to call it.

### Overdispersion

That looks like too many outliers! There is more variation in our data than we would expect, and this is referred to as overdispersion.

So let's check for it.
The following ratio should be 1 if our data conform to the Poisson distribution assumption (conditional mean = variance). If it is greater than 1, we have overdispersion:

    sum(mod$weights * mod$residuals^2) / mod$df.residual
    #> [1] 6.240519
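
For readers who want to try this check without the original data, here is a minimal sketch on simulated counts. Everything in it is assumed for illustration: the simulated predictor `x`, the negative binomial outcome `y` (chosen because its variance exceeds its mean) and the model `mod`, which is not the post's model. It also shows that, for a Poisson GLM, the ratio above matches the sum of squared Pearson residuals divided by the residual degrees of freedom.

```r
set.seed(42)

# Simulate overdispersed counts: negative binomial variance is mu + mu^2 / size
n <- 200
x <- rnorm(n)
mu <- exp(1 + 0.3 * x)
y <- rnbinom(n, mu = mu, size = 1)

# Fit a Poisson model, which assumes variance equal to the mean
mod <- glm(y ~ x, family = poisson)

# Dispersion ratio as written in the post (working weights and working residuals)
disp_working <- sum(mod$weights * mod$residuals^2) / mod$df.residual

# Same quantity from Pearson residuals
disp_pearson <- sum(residuals(mod, type = "pearson")^2) / mod$df.residual

print(c(working = disp_working, pearson = disp_pearson))
```

Both versions give the same number, and it should come out well above 1 here because the simulated counts were deliberately generated with extra variance.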

This suggests the variance is 6.24 times the conditional mean, so the data are definitely overdispersed. This is a huge topic, but applying overdispersed limits, using either the SHMI or Spiegelhalter methods, adjusts for this by inflating the limits.
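
To give a feel for what inflating the limits means, the sketch below applies a simplified multiplicative adjustment: limits based on a normal approximation are widened by the square root of the dispersion ratio. This is only an illustration under stated assumptions. It is not the SHMI specification, and not necessarily what the funnel plot function does internally; the `expected` values are made up, and the standard error of `1 / sqrt(expected)` assumes a standardised ratio with a Poisson numerator.

```r
phi <- 6.24                          # dispersion ratio reported in the post
expected <- c(5, 20, 50, 100, 500)   # hypothetical expected counts
z <- qnorm(0.975)                    # two-sided 95% normal quantile

# Approximate standard error of an observed/expected ratio under Poisson variation
se <- 1 / sqrt(expected)

od_limits <- data.frame(
  expected = expected,
  lower_poisson = pmax(0, 1 - z * se),
  upper_poisson = 1 + z * se,
  # Multiplicative adjustment: scale the standard error by sqrt(phi)
  lower_od = pmax(0, 1 - z * sqrt(phi) * se),
  upper_od = 1 + z * sqrt(phi) * se
)

print(od_limits)
```

With a dispersion ratio of 6.24, the distance of the upper limit from 1 is roughly 2.5 times larger than under the unadjusted limits, so fewer points fall outside the funnel.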

Given these adjustments, we now only have nine organisations showing special-cause variation. To interpret this plot properly, we would first investigate these outlying organisations before making any changes to the system/indicator. We should check for possible data quality issues, such as errors, missing model predictors, or environmental factors (e.g. one organisation changing computer systems and data standards during the monitoring period). Once these have been examined, we might suspect issues with care at the hospitals in question, which can then be investigated by local audit and casenote review.

These methods can be used for any similar indicators, e.g. standardised mortality ratios, readmissions etc.

## Summary

Funnel plots are a useful way to visualise indicators such as mortality, readmission and length of stay data at hospitals, presenting both the indicator value and a measure of the size/variance at organisations. They allow limits to be drawn between what we might expect by chance, and what we might consider to be a signal for investigation. Organisations outside the funnel limits should be examined, first for data quality issues and then for issues with process and clinical care. Overdispersion means that these limits are often too strict, but they can be inflated to adjust for this.

## References

Clinical Indicators Team, NHS Digital. 2018. “Summary Hospital-Level Mortality Indicator (SHMI) - Indicator Specification.” NHS Digital.

Goldstein, Harvey, and David J. Spiegelhalter. 1996. “League Tables and Their Limitations: Statistical Issues in Comparisons of Institutional Performance.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 159 (3): 385–409. https://doi.org/10/chf9kj.

Hilbe, Joseph M. 2014. Modeling Count Data. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9781139236065.

Lilford, R., M. A. Mohammed, D. Spiegelhalter, and R. Thomson. 2004. “Use and Misuse of Process and Outcome Data in Managing Performance of Acute Medical Care: Avoiding Institutional Stigma.” The Lancet 363 (9415): 1147–54. https://doi.org/10.1016/s0140-6736(04)15901-1.

Mohammed, Mohammed A, KK Cheng, Andrew Rouse, and Tom Marshall. 2001. “Bristol, Shipman, and Clinical Governance: Shewhart’s Forgotten Lessons.” The Lancet 357 (9254): 463–67. https://doi.org/10/cqjskf.

Spiegelhalter, David J. 2005a. “Funnel Plots for Comparing Institutional Performance.” Statistics in Medicine 24 (8): 1185–1202. https://doi.org/10.1002/sim.1970.

———. 2005b. “Handling over-Dispersion of Performance Indicators.” Quality and Safety in Health Care 14 (5): 347–51. https://doi.org/10.1136/qshc.2005.013755.

Spiegelhalter, David J., Christopher Sherlaw-Johnson, Martin Bardsley, Ian Blunt, Christopher Wood, and Olivia Grigg. 2012. “Statistical Methods for Healthcare Regulation: Rating, Screening and Surveillance.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 175 (1): 1–47. https://doi.org/10.1111/j.1467-985X.2011.01010.x.

Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. New York: Springer-Verlag.