Assumptions made when preparing drug exposure data for analysis have an impact on results: An unreported step in pharmacoepidemiology studies

Stephen R Pye, Thérèse Sheppard, Rebecca M Joseph, Mark Lunt, Nadyne Girard, Jennifer S Haas, David W Bates, David L Buckeridge, Tjeerd P van Staa, Robyn Tamblyn, William G Dixon

Research output: Contribution to journal › Article › Academic › peer-review


PURPOSE: Real-world data for observational research commonly require formatting and cleaning prior to analysis. Data preparation steps are rarely reported adequately and are likely to vary between research groups. Variation in methodology could potentially affect study outcomes. This study aimed to develop a framework to define and document drug data preparation and to examine the impact of different assumptions on results.

METHODS: An algorithm for processing prescription data was developed and tested using data from the Clinical Practice Research Datalink (CPRD). The impact of varying assumptions was examined by estimating the association between 2 exemplar medications (oral hypoglycaemic drugs and glucocorticoids) and cardiovascular events after preparing multiple datasets derived from the same source prescription data. Each dataset was analysed using Cox proportional hazards modelling.
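Preparing multiple datasets from the same source data amounts to choosing one option at each decision point and repeating the preparation for every combination. The sketch below illustrates that idea only. The node names and options are hypothetical placeholders, not the decision nodes defined in the study, and the code does not reproduce the authors' algorithm.

```python
from itertools import product

# Hypothetical decision nodes and options (illustrative names only;
# these are NOT the nodes or assumptions used in the study).
DECISION_NODES = {
    "missing_quantity": ["drop_record", "impute_median"],
    "end_date_source": [
        "quantity_over_daily_dose",
        "recorded_duration",
        "next_prescription_date",
    ],
    "overlapping_prescriptions": ["ignore_overlap", "shift_start_forward"],
}


def enumerate_pathways(nodes):
    """Return every pathway, i.e. one chosen option per decision node."""
    names = list(nodes)
    option_lists = [nodes[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*option_lists)]


pathways = enumerate_pathways(DECISION_NODES)
print(len(pathways))  # 2 * 3 * 2 = 12 pathways for this toy configuration
print(pathways[0])
```

Each pathway dictionary could then be passed to a data preparation routine, so that every analysed dataset carries an explicit record of the assumptions behind it.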

RESULTS: The algorithm included 10 decision nodes and 54 possible unique assumptions. Over 11 000 possible pathways through the algorithm were identified. In both exemplar studies, similar hazard ratios and standard errors were found for the majority of pathways; however, certain assumptions had a greater influence on results. For example, in the hypoglycaemic analysis, choosing a different variable to define prescription end date altered the hazard ratios (95% confidence intervals) from 1.77 (1.56-2.00) to 2.83 (1.59-5.04).
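The two intervals quoted above differ in width as well as in point estimate. As a rough illustration, the standard error of the log hazard ratio can be back-calculated from each reported interval. This assumes the intervals are Wald-type and symmetric on the log scale, which the abstract does not state; the function name is hypothetical.

```python
import math

# Standard normal quantile for a two-sided 95% interval.
Z_95 = 1.959964


def log_hr_standard_error(lower, upper, z=Z_95):
    """Approximate SE of log(HR) from a CI assumed symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Confidence limits as reported in the abstract.
se_first = log_hr_standard_error(1.56, 2.00)   # HR 1.77
se_second = log_hr_standard_error(1.59, 5.04)  # HR 2.83

print(round(se_first, 3), round(se_second, 3))
```

Under that assumption the standard errors come out at roughly 0.06 and 0.29, so the alternative end-date definition produced a much less precise estimate in addition to a larger hazard ratio.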

CONCLUSIONS: The framework offers a transparent and efficient way to perform and report drug data preparation steps. Assumptions made during data preparation can impact the results of analyses. Improving transparency regarding drug data preparation would increase the repeatability, reproducibility, and comparability of published results.

Original language: English
Pages (from-to): 781-788
Number of pages: 8
Journal: Pharmacoepidemiology and Drug Safety
Issue number: 7
Publication status: Published - Jul 2018


  • data preparation
  • pharmacoepidemiology
  • reproducibility
  • transparency

