Theoretical Motivation for Considering Collider Bias

Collider Bias and COVID-19

Arrows indicate effects of exposure \((A)\) and outcome \((Y)\) on selection into the sample \((S)\). Dashed lines indicate a correlation induced by conditioning on the sample.


This app highlights some of the functionality of the R Package AscRtain developed by Gibran Hemani and Tom Palmer.

This R Shiny app was created by Gareth Griffith as a pedagogical supplement to the MedRxiv preprint 'Collider bias undermines our understanding of COVID-19 disease risk and severity', written with colleagues at the MRC-IEU, 2020.

Consider the scenario in which we want to test whether a given exposure \((A)\) influences an outcome \((Y)\). Consider also that these two variables each independently cause a third variable \((S)\). Conditioning on \(S\) will induce an association between \(A\) and \(Y\), even if none exists in the population. This is known as Ascertainment Bias or Collider Bias. Whilst it is widely acknowledged that sampling bias may affect the representativeness of study findings, it is less well understood that conditioning on a collider may substantially bias the estimated association between an exposure and an outcome.
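
The mechanism can be illustrated with a small simulation. The sketch below is plain Python and is not part of AscRtain; the function names and all numeric values are illustrative assumptions. It draws \(A\) and \(Y\) independently, so the true OR is 1, then selects individuals with a probability that increases with both \(A\) and \(Y\), and compares the OR in the full population with the OR among those selected.

```python
import random


def odds_ratio(pairs):
    """Odds ratio between binary A and Y from a list of (a, y) pairs."""
    n11 = sum(1 for a, y in pairs if a == 1 and y == 1)
    n10 = sum(1 for a, y in pairs if a == 1 and y == 0)
    n01 = sum(1 for a, y in pairs if a == 0 and y == 1)
    n00 = sum(1 for a, y in pairs if a == 0 and y == 0)
    return (n11 * n00) / (n10 * n01)


def simulate(n=200_000, p_a=0.3, p_y=0.2, b0=0.05, b_a=0.10, b_y=0.20, seed=1):
    """Return (population OR, selected-sample OR) for independent A and Y.

    All parameter values are illustrative assumptions, not estimates.
    """
    rng = random.Random(seed)
    population, sample = [], []
    for _ in range(n):
        a = 1 if rng.random() < p_a else 0
        y = 1 if rng.random() < p_y else 0  # drawn independently of a
        population.append((a, y))
        # Selection probability rises with both exposure and outcome,
        # which makes S a collider on the path between A and Y.
        p_select = b0 + b_a * a + b_y * y
        if rng.random() < p_select:
            sample.append((a, y))
    return odds_ratio(population), odds_ratio(sample)


population_or, sample_or = simulate()
print(f"OR in full population: {population_or:.2f}")
print(f"OR in selected sample: {sample_or:.2f}")
```

With these assumed values the population OR should sit close to 1, while the OR in the selected sample is pulled well below 1 (about 0.47 in expectation), even though \(A\) has no effect on \(Y\).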

Why is this particularly problematic in the analysis of COVID-19 data?
COVID-19 study participants are likely to be a strongly non-random subset of the population. If we consider \(S\) to be selection into a COVID-19 sample, then we condition on it whenever we analyse study participants alone. More worryingly, it is likely that exposure and outcome both predict entry into COVID-19 samples, meaning we are conditioning on a collider and will produce biased estimates.

Consider two major sources of COVID-19 data, COVID-19 cases and voluntary self-report:

We know COVID-19 testing is non-random in the population. Factors associated with this non-randomness, such as having pre-existing conditions or being a key worker, may plausibly predict both whether you receive a test and your risk of COVID-19. Similarly, for self-reporting, individual health anxiety may plausibly predict both whether you opt into data collection and your risk of COVID-19.

If we estimate population associations from these ascertained samples, our results and causal conclusions may be biased.

In practice this presents a problem, as we cannot know the true effect of \(A\) and \(Y\) on sample participation. However, given known population frequencies for exposure and outcome, we can estimate the possible selection effects that would give rise to the observed association under a true null. Depending on the range of values returned, this allows us to make more informed inferences about bias in our estimated associations.

Estimating over a plausible parameter space

This page demonstrates some of the functionality of the AscRtain R Package. The parameter_space() function simulates values over the possible parameter space for selection into a study which could give rise to an observed odds ratio (OR) between exposure and outcome. The app plots the possible selection effects on exposure (\(\beta_A\)) and outcome (\(\beta_Y\)) which could give rise to a user-defined observed OR when the true OR is 1, that is, when there is no association in the population.

Population parameters are defined as follows:

\(P(S=1)\) gives the proportion of the population that is in the sample.
\(P(A=1)\) gives the proportion of the population for whom the exposure A is true.
\(P(Y=1)\) gives the proportion of the population for whom the outcome Y is true.
\(P(A \cap Y)\) gives the proportion of the population for whom A and Y are both true.
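
As a rough check on how these quantities fit together, the sketch below builds the 2x2 table of \(A\) and \(Y\) from \(P(A=1)\), \(P(Y=1)\) and \(P(A \cap Y)\) and computes the population OR. It is plain Python rather than AscRtain code, and the function names and example values are hypothetical. Under a true null, \(A\) and \(Y\) are independent, so \(P(A \cap Y) = P(A=1)P(Y=1)\) and the population OR is 1.

```python
def two_by_two(p_a, p_y, p_ay):
    """Cell proportions of the A-by-Y table implied by the population parameters."""
    # P(A and Y) cannot exceed either margin, nor fall below p_a + p_y - 1.
    lower = max(0.0, p_a + p_y - 1.0)
    upper = min(p_a, p_y)
    if not lower <= p_ay <= upper:
        raise ValueError(f"P(A and Y) must lie in [{lower}, {upper}]")
    return {
        "A1Y1": p_ay,
        "A1Y0": p_a - p_ay,
        "A0Y1": p_y - p_ay,
        "A0Y0": 1.0 - p_a - p_y + p_ay,
    }


def table_odds_ratio(cells):
    """Odds ratio between A and Y in the whole population."""
    return (cells["A1Y1"] * cells["A0Y0"]) / (cells["A1Y0"] * cells["A0Y1"])


# Illustrative values only: independence means P(A and Y) = P(A) * P(Y).
p_a, p_y = 0.3, 0.2
cells = two_by_two(p_a, p_y, p_a * p_y)
print(cells)
print(table_odds_ratio(cells))
```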

Selection effects are defined as follows:

\(\beta_0\) is the baseline probability of being selected into the sample.
\(\beta_A\) is the effect on selection into the sample given A=1 is true.
\(\beta_Y\) is the effect on selection into the sample given Y=1 is true.
\(\beta_{AY}\) is the effect on selection into the sample given A=1 and Y=1 are both true.
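
To make the search concrete, here is a minimal sketch in plain Python. It is not the AscRtain implementation, and every function name in it is hypothetical. It assumes that the selection effects combine additively on the probability scale, \(P(S=1 \mid A, Y) = \beta_0 + \beta_A A + \beta_Y Y + \beta_{AY} A Y\), which is one reading of the definitions above. Under that assumption and a true OR of 1, the OR observed in the sample is \((\beta_0 + \beta_A + \beta_Y + \beta_{AY})\beta_0 / ((\beta_0 + \beta_A)(\beta_0 + \beta_Y))\), and the sampled fraction is \(P(S=1) = \beta_0 + \beta_A P(A=1) + \beta_Y P(Y=1) + \beta_{AY} P(A \cap Y)\). The sketch scans a grid of selection effects and keeps those that reproduce \(P(S=1)\) within a tolerance and imply an OR at least as extreme as the observed one; that retention rule is also an assumption of this sketch.

```python
from itertools import product


def implied_or(b0, b_a, b_y, b_ay):
    """Sample OR implied by the selection effects when the true OR is 1.

    Returns None if any selection probability falls outside (0, 1].
    """
    p00 = b0
    p10 = b0 + b_a
    p01 = b0 + b_y
    p11 = b0 + b_a + b_y + b_ay
    if not all(0.0 < p <= 1.0 for p in (p00, p10, p01, p11)):
        return None
    return (p11 * p00) / (p10 * p01)


def sample_fraction(p_a, p_y, p_ay, b0, b_a, b_y, b_ay):
    """P(S=1) implied by the population parameters and selection effects."""
    return b0 + b_a * p_a + b_y * p_y + b_ay * p_ay


def grid(lo, hi, steps):
    """Evenly spaced values from lo to hi inclusive."""
    if lo == hi:
        return [lo]
    return [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]


def search(observed_or, p_s, p_a, p_y, p_ay, ranges, steps=21, ps_tol=0.0025):
    """Return (b0, b_a, b_y, b_ay, implied OR) tuples consistent with the inputs.

    ranges holds (low, high) pairs for b0, b_a, b_y and b_ay, in that order.
    """
    if observed_or == 1:
        return []  # no association to explain away
    hits = []
    axes = [grid(lo, hi, steps) for lo, hi in ranges]
    for b0, b_a, b_y, b_ay in product(*axes):
        odds = implied_or(b0, b_a, b_y, b_ay)
        if odds is None:
            continue
        if abs(sample_fraction(p_a, p_y, p_ay, b0, b_a, b_y, b_ay) - p_s) > ps_tol:
            continue
        if observed_or > 1:
            extreme = odds >= observed_or
        else:
            extreme = odds <= observed_or
        if extreme:
            hits.append((b0, b_a, b_y, b_ay, odds))
    return hits


# Illustrative inputs only: observed OR of 0.5 with 12% of the population sampled.
hits = search(
    observed_or=0.5, p_s=0.12, p_a=0.3, p_y=0.2, p_ay=0.06,
    ranges=[(0.0, 0.1), (-0.2, 0.2), (-0.2, 0.2), (0.0, 0.0)],
)
print(len(hits), "parameter combinations retained")
```

Each retained tuple is a set of selection effects that could, on its own, produce an association of the observed size when none exists in the population. A wide or plausible-looking set of retained values suggests that collider bias is a credible explanation for the observed OR.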

[Interactive input panels: Observed Relationship, Population Parameters, Selection Effects]

Note: this plot will only render if the observed OR implies a difference (OR != 1).

Estimated Parameter Combinations

[Plot: parameter combinations plausibly giving rise to \(P(S=1)\)]

[Plot: parameter combinations plausibly producing the observed OR]

Useful Resources

Key References

Griffith, Gareth, Tim M. Morris, Matt Tudball, Annie Herbert, Giulia Mancano, Lindsey Pike, Gemma C. Sharp, Tom M. Palmer, George Davey Smith, Kate Tilling, Luisa Zuccolo, Neil M. Davies, and Gibran Hemani. 2020. Collider Bias Undermines Our Understanding of COVID-19 Disease Risk and Severity. MedRxiv Preprint.

Munafo, Marcus R., Kate Tilling, Amy E. Taylor, David M. Evans, and George Davey Smith. 2018. Collider Scope: When Selection Bias Can Substantially Influence Observed Associations. International Journal of Epidemiology 47 (1): 226-35.

Luque-Fernandez, Miguel Angel, Michael Schomaker, Daniel Redondo-Sanchez, Maria Jose Sanchez Perez, Anand Vaidya, and Mireille E. Schnitzer. 2019. Educational Note: Paradoxical Collider Effect in the Analysis of Non-Communicable Disease Epidemiological Data: A Reproducible Illustration and Web Application. International Journal of Epidemiology 48 (2): 640-53.

Smith, Louisa H., and Tyler J. VanderWeele. 2019. Bounding Bias Due to Selection. Epidemiology 30 (4): 509-16.

Elwert, Felix, and Christopher Winship. 2014. Endogenous Selection Bias: The Problem of Conditioning on a Collider Variable. Annual Review of Sociology 40 (July): 31-53.

Cole, Stephen R., Robert W. Platt, Enrique F. Schisterman, Haitao Chu, Daniel Westreich, David Richardson, and Charles Poole. 2010. Illustrating Bias due to Conditioning on a Collider. International Journal of Epidemiology 39 (2): 417-20.

Groenwold, Rolf H. H., Tom M. Palmer, and Kate Tilling. 2020. Conditioning on a Mediator to Adjust for Unmeasured Confounding. OSF Preprint.

Pedagogical Resources

The following apps give an informative introduction to collider bias in observational data, and allow the user to explore possible relationships.

Sensitivity Analysis for Selection Bias website. Smith LH, and VanderWeele TJ. 2019.
CollideR app. Luque-Fernandez et al. 2019.
Bias app. Groenwold, Palmer, and Tilling. 2019.

These R Packages allow a user to define a DAG and simulate data from it, which can inform the size of bias for a specified model.

lavaan R Package. Rosseel. 2012.
dagitty R Package. Textor et al. 2016.
simMixedDAG R Package.