### Theoretical Motivation for Considering Collider Bias

#### Collider Bias and COVID-19

*Arrows indicate the effects of exposure \((A)\) and outcome \((Y)\) on selection into the sample. Dashed lines indicate the correlation induced by conditioning on the sample.*

#### Development

This app highlights some of the functionality of the R package `AscRtain`, developed by Gibran Hemani and Tom Palmer. The RShiny app was created by Gareth Griffith as a pedagogical supplement to the medRxiv preprint 'Collider bias undermines our understanding of COVID-19 disease risk and severity', written with colleagues at the MRC-IEU, 2020.

**Ascertainment Bias** or **Collider Bias.** Whilst it is widely acknowledged that sampling bias may affect the representativeness of study findings, it is less well understood that conditioning on a collider may substantially bias the estimated association between an exposure and an outcome.

*Why is this particularly problematic in the analysis of COVID-19 data?*

Because participation in COVID-19 studies is likely to be strongly non-random. If we consider \(S\) to be **selection into a COVID-19 sample**, then we condition on it simply by restricting analysis to study participants. More worryingly, it is likely that the exposure and the outcome both predict entry into COVID-19 samples, meaning we are conditioning on a collider and will produce biased estimates.
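To make this concrete, here is a minimal simulation (an illustrative Python sketch, independent of `AscRtain`; all parameter values are arbitrary assumptions). The exposure and outcome are independent in the population, but both increase the chance of selection, so the odds ratio within the selected sample is biased away from 1:

```python
import random

random.seed(1)
N = 200_000

# population: exposure A and outcome Y are independent (true OR = 1)
pop = [(int(random.random() < 0.2), int(random.random() < 0.1)) for _ in range(N)]

# selection: 5% baseline chance, +20 points if A=1, +20 points if Y=1
sample = [(a, y) for a, y in pop if random.random() < 0.05 + 0.2 * a + 0.2 * y]

def odds_ratio(rows):
    """2x2 odds ratio from a list of (A, Y) pairs."""
    n = {(a, y): 0 for a in (0, 1) for y in (0, 1)}
    for r in rows:
        n[r] += 1
    return (n[(1, 1)] * n[(0, 0)]) / (n[(1, 0)] * n[(0, 1)])

print(f"population OR: {odds_ratio(pop):.2f}")    # close to 1
print(f"sample OR:     {odds_ratio(sample):.2f}")  # well below 1
```

Note that with these values the bias is downward: positive selection effects of both A and Y induce a spurious negative association within the sample.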

Take two major sources of COVID-19 data: **COVID-19 cases** and **voluntary self-report**.

We know *COVID-19 testing* is non-random in the population. Factors associated with this non-randomness, such as having pre-existing conditions or being a key worker, may plausibly predict both whether you receive a test *and* your risk of COVID-19. Similarly, for *self-reported* data, individual health anxiety may plausibly predict both whether you opt into data collection *and* your risk of COVID-19.

If we estimate population associations based on these ascertained values, our results and causal conclusions may be biased.

Functionally this presents a problem, as we cannot know the true effects of \(A\) and \(Y\) on sample participation. However, given known population frequencies for the exposure and outcome, we can estimate the possible selection effects that would give rise to the observed association under a true null. Depending on the range of values returned, this allows us to make more informed inferences about bias in our estimated associations.

## Estimating over a plausible parameter space

The `parameter_space()` function in the `AscRtain` R package simulates values over the possible parameter space for selection into a study which could give rise to an observed OR between exposure and outcome. It plots the possible selection effects on exposure (\(\beta_A\)) and outcome (\(\beta_Y\)) which could give rise to a user-defined observed odds ratio (OR) under a true OR of 1.
*Population parameters* are defined as follows:

\(P(S=1)\) gives the proportion of the population that is in the sample.

\(P(A=1)\) gives the proportion of the population for whom the exposure A is true.

\(P(Y=1)\) gives the proportion of the population for whom the outcome Y is true.

\(P(A \cap Y)\) gives the proportion of the population for whom A and Y are both true.

*Selection effects* are defined as follows:

\(\beta_0\) is the baseline probability of being selected into the sample.

\(\beta_A\) is the effect on selection into the sample given A=1 is true.

\(\beta_Y\) is the effect on selection into the sample given Y=1 is true.

\(\beta_{AY}\) is the effect on selection into the sample given A=1 and Y=1 are both true.
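One natural reading of these definitions is an additive selection model, \(P(S=1 \mid A, Y) = \beta_0 + \beta_A A + \beta_Y Y + \beta_{AY} A Y\). Under that assumed model, the observed OR and \(P(S=1)\) implied by a set of selection effects under a true null can be computed directly (an illustrative Python sketch, not the `AscRtain` implementation; the example values are arbitrary assumptions):

```python
def implied_or_and_ps(b0, bA, bY, bAY, pA, pY):
    """Observed OR and P(S=1) implied by additive selection effects,
    assuming A and Y are independent in the population (true OR = 1)."""
    # selection probability for each (A, Y) cell
    s = {(a, y): b0 + bA * a + bY * y + bAY * a * y
         for a in (0, 1) for y in (0, 1)}
    # population cell probabilities under independence
    p = {(a, y): (pA if a else 1 - pA) * (pY if y else 1 - pY)
         for a in (0, 1) for y in (0, 1)}
    ps = sum(p[c] * s[c] for c in p)       # P(S=1)
    n = {c: p[c] * s[c] / ps for c in p}   # cell shares within the selected sample
    or_obs = (n[(1, 1)] * n[(0, 0)]) / (n[(1, 0)] * n[(0, 1)])
    return or_obs, ps

# no selection effects: no bias
or0, ps0 = implied_or_and_ps(0.05, 0.0, 0.0, 0.0, 0.2, 0.1)
print(round(or0, 6), round(ps0, 6))  # 1.0 0.05

# positive effects of both A and Y on selection bias the OR downward
or1, ps1 = implied_or_and_ps(0.05, 0.1, 0.1, 0.0, 0.2, 0.1)
print(round(or1, 3))  # 0.556
```

Note that under a true null the implied OR reduces to the cross-ratio of the four cell selection probabilities, \(\frac{s_{11}\, s_{00}}{s_{10}\, s_{01}}\), so it does not depend on the population frequencies; those only affect \(P(S=1)\).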

##### Observed Relationship

##### Population Parameters

##### Selection Effects

*Note: this plot will only render if the observed OR implies an association (\(OR \neq 1\)).*

*[Plot: estimated parameter combinations, showing combinations plausibly giving rise to \(P(S=1)\) and combinations plausibly producing the observed OR.]*
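The scan behind this kind of plot can be sketched as a brute-force grid search (again an illustrative Python sketch, not the `parameter_space()` implementation; the target OR, baseline, grid range, and tolerance are all arbitrary assumptions):

```python
import itertools

def implied_or(b0, bA, bY, bAY):
    """OR implied by additive selection effects under a true null;
    it is the cross-ratio of the four cell selection probabilities."""
    s = {(a, y): b0 + bA * a + bY * y + bAY * a * y
         for a in (0, 1) for y in (0, 1)}
    if not all(0 <= v <= 1 for v in s.values()):
        return None  # invalid selection probabilities
    return (s[(1, 1)] * s[(0, 0)]) / (s[(1, 0)] * s[(0, 1)])

TARGET_OR, TOL = 2.0, 0.05               # observed OR to explain away, tolerance
grid = [i / 100 for i in range(-4, 31)]  # candidate effects from -0.04 to 0.30

# (beta_A, beta_Y) pairs that reproduce the observed OR under a true null
hits = [(bA, bY)
        for bA, bY in itertools.product(grid, grid)
        if (r := implied_or(0.05, bA, bY, 0.0)) is not None
        and abs(r - TARGET_OR) < TOL]

print(len(hits), "parameter combinations reproduce the observed OR under a true null")
```

Plotting `hits` as points in the \((\beta_A, \beta_Y)\) plane gives the region of selection effects compatible with the observed OR; if no plausible combination falls in that region, selection alone is unlikely to explain the association.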

## Useful Resources

#### Key References

Griffith, Gareth, Tim M. Morris, Matt Tudball, Annie Herbert, Giulia Mancano, Lindsey Pike, Gemma C. Sharp, Tom M. Palmer, George Davey Smith, Kate Tilling, Luisa Zuccolo, Neil M. Davies, and Gibran Hemani. 2020. Collider Bias Undermines Our Understanding of COVID-19 Disease Risk and Severity. medRxiv preprint.

Munafo, Marcus R., Kate Tilling, Amy E. Taylor, David M. Evans, and George Davey Smith. 2018. Collider Scope: When Selection Bias Can Substantially Influence Observed Associations. International Journal of Epidemiology 47 (1): 226-35.

Luque-Fernandez, Miguel Angel, Michael Schomaker, Daniel Redondo-Sanchez, Maria Jose Sanchez Perez, Anand Vaidya, and Mireille E. Schnitzer. 2019. Educational Note: Paradoxical Collider Effect in the Analysis of Non-Communicable Disease Epidemiological Data: A Reproducible Illustration and Web Application. International Journal of Epidemiology 48 (2): 640-53.

Smith, Louisa H., and Tyler J. VanderWeele. 2019. Bounding Bias Due to Selection. Epidemiology 30 (4): 509-16.

Elwert, Felix, and Christopher Winship. 2014. Endogenous Selection Bias: The Problem of Conditioning on a Collider Variable. Annual Review of Sociology 40 (July): 31-53.

Cole, Stephen R., Robert W. Platt, Enrique F. Schisterman, Haitao Chu, Daniel Westreich, David Richardson, and Charles Poole. 2010. Illustrating Bias due to Conditioning on a Collider. International Journal of Epidemiology 39 (2): 417-20.

Groenwold, Rolf H. H., Tom M. Palmer, and Kate Tilling. 2020. Conditioning on a Mediator to Adjust for Unmeasured Confounding. OSF preprint.

#### Pedagogical Resources

The following apps give an informative introduction to collider bias in observational data, and allow the user to explore possible relationships.

Sensitivity Analysis for Selection Bias website. Smith LH, and VanderWeele TJ. 2019.

CollideR app. Luque-Fernandez et al. 2019.

Bias app. Groenwold, Palmer, and Tilling. 2019.

These R packages allow a user to define a DAG and simulate data from it, which can inform the size of bias for a specified model.

`lavaan` R package. Rosseel. 2012.

`dagitty` R package. Textor et al. 2016.

`simMixedDAG` R package.