Measuring effectiveness of drugs in observational databanks: promises and perils
© BioMed Central Ltd 2004
Received: 11 December 2003
Accepted: 20 January 2004
Published: 5 February 2004
Observational databanks have inherent strengths and shortcomings. As in randomized controlled trials, poor design of these databanks can either exaggerate or reduce estimates of drug effectiveness and can limit generalizability. This commentary highlights selected aspects of study design, data collection and statistical analysis that can help overcome many of these inadequacies. An international metaRegister and a formal mechanism for standardizing and sharing drug data could help improve the utility of databanks. Medical journals have a vital role in enforcing a quality checklist that improves reporting.
The decision to terminate the Women's Health Initiative (WHI) study, a randomized controlled trial (RCT) of hormone replacement therapy, and the public anxiety caused by the subsequent media publicity have put the hierarchy of evidence in epidemiology in the spotlight. Clinical medicine, including rheumatology, has also sometimes witnessed similar contradictions between the results of RCTs and observational studies. For example, RCTs indicated an efficacy for auranofin greatly exceeding that observed in observational studies or in clinical practice [1–3]. A meta-analysis of RCTs in 1990 concluded that the efficacy of injectable gold salts, penicillamine and sulfasalazine did not differ from that of methotrexate in patients with rheumatoid arthritis. By contrast, and more in line with clinical experience, observational research reports indicated that courses of methotrexate were continued for much longer than those of other agents, suggesting a better experience with this drug. Currently penicillamine and auranofin are almost never used for treating rheumatoid arthritis. Thus, some prominent clinical trials published in well-respected journals reached conclusions that were not validated in clinical practice.
The tools of observational epidemiology become critical 'when the perfectionist demands of clinical trials crash against the shoals of real-world conditions'. There can never be an RCT for every single clinical question. Many important observations over the past two decades in rheumatology would not have been possible without observational research. Recognition of outcomes such as work disability, functional disability, and increased mortality rates in rheumatoid arthritis required long-term observational studies. More recently, the success of 'inverted pyramid' strategies for patients with rheumatoid arthritis has been documented. The problem of gastrointestinal bleeding, ulcers, and obstruction associated with non-steroidal anti-inflammatory drugs was not apparent from RCTs but rather from long-term observational databases. Furthermore, the wide differences in toxicity between the non-steroidal anti-inflammatory drugs themselves were not demonstrated by the multiple RCTs.
Some limitations of randomized controlled trials
Patient selection limited by inclusion and exclusion criteria
Short time frame, as long-term clinical trials are ethically or logistically not possible
Differential drop-out patterns between arms of the trial
Statistically significant results might not necessarily be clinically significant, and vice versa
Surrogate markers such as joint tenderness might be suboptimal indicators of prevention of severe long-term outcomes such as radiographic destruction and work disability
Chance (bad luck) can lead to unbalanced groups
Inflexible dosage schedules
'Dose creep' from trial to clinic, rendering the trial obsolete
Inability to identify rare adverse events
Hawthorne effect: patients alter their behavior when they know that they are taking part in a study
Design bias: randomized controlled trials might be designed to maximize the probability of a particular outcome, namely the superiority of the new drug
The Food and Drug Administration of the USA has introduced a requirement for post-marketing surveillance of newer drugs, including biological agents; this surveillance is now being pursued by the pharmaceutical industry, which has set up several surveillance databanks. In addition to monitoring for safety, these databanks collect information that has potential business applications. Such information includes drug dosage and drug switching patterns of the manufacturer's drugs as well as those of their competitors. It is not known to what extent these data are put to use for drug marketing. In addition, many of these databanks might not adhere to recommended standards for longitudinal studies [8, 9].
One of the biggest criticisms of observational databanks results from potential bias in assignment of treatment by a physician. 'Confounding by indication' means that certain treatments are preferentially given to sicker patients and certain treatments preferentially to healthier patients. Thus, it is not uncommon for aspirin to be associated with increased risk for acute myocardial infarction in observational studies, because it is prescribed to those with a higher risk for coronary events. Many studies use statistical methods such as propensity scores that purportedly adjust for such bias. In this method of adjustment, the probability (propensity) of each patient's receiving a treatment is calculated on the basis of the collected information such as age, gender, and education. This propensity score can then be used for 'adjusting' for the effect of confounders by matching, by stratification, or by regression models. However, propensity scores might not adjust for unobserved covariates, especially if such covariates are not correlated with observed covariates. Furthermore, once data are collected, there is no fully satisfactory means to determine whether the adjustment is proportionate to the magnitude of the underlying confounding effect.
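The propensity-score adjustment described above can be sketched on simulated data. The following is a minimal illustration only, not a method taken from any particular databank: the cohort, the covariates (standardized severity and age), the coefficients, and all function names are hypothetical assumptions. Sicker patients are made more likely to receive treatment, so the crude comparison understates a benefit that is fixed at 1.0 by construction; stratifying on the estimated propensity score brings the estimate closer to that value.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n=2000, true_effect=1.0, seed=0):
    """Hypothetical cohort in which sicker, older patients are treated more often."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        severity = rng.gauss(0, 1)  # standardized disease severity (assumed)
        age = rng.gauss(0, 1)       # standardized age (assumed)
        treated = 1 if rng.random() < sigmoid(1.0 * severity + 0.5 * age) else 0
        outcome = true_effect * treated - 1.5 * severity - 0.5 * age + rng.gauss(0, 1)
        rows.append((severity, age, treated, outcome))
    return rows


def fit_propensity(rows, lr=0.5, epochs=300):
    """Logistic regression of treatment on covariates, by batch gradient descent."""
    w = [0.0, 0.0, 0.0]  # intercept, severity, age
    n = len(rows)
    for _ in range(epochs):
        g = [0.0, 0.0, 0.0]
        for sev, age, t, _y in rows:
            err = sigmoid(w[0] + w[1] * sev + w[2] * age) - t
            g[0] += err
            g[1] += err * sev
            g[2] += err * age
        w = [wi - lr * gi / n for wi, gi in zip(w, g)]
    return w


def mean(xs):
    return sum(xs) / len(xs)


def naive_effect(rows):
    """Crude treated-minus-control difference, ignoring confounding."""
    treated = [y for _, _, t, y in rows if t == 1]
    control = [y for _, _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def stratified_effect(rows, w, n_strata=5):
    """Size-weighted average of within-stratum differences, strata by propensity."""
    scored = sorted(rows, key=lambda r: sigmoid(w[0] + w[1] * r[0] + w[2] * r[1]))
    size = len(scored) // n_strata
    total, used = 0.0, 0
    for k in range(n_strata):
        end = (k + 1) * size if k < n_strata - 1 else len(scored)
        stratum = scored[k * size:end]
        treated = [r[3] for r in stratum if r[2] == 1]
        control = [r[3] for r in stratum if r[2] == 0]
        if treated and control:  # skip strata lacking either group
            total += len(stratum) * (mean(treated) - mean(control))
            used += len(stratum)
    return total / used


if __name__ == "__main__":
    cohort = simulate()
    weights = fit_propensity(cohort)
    print("naive estimate:     ", round(naive_effect(cohort), 2))
    print("stratified estimate:", round(stratified_effect(cohort, weights), 2))
```

The sketch adjusts only for covariates that were recorded. If severity were left out of `fit_propensity`, the stratified estimate would remain biased, which is the limitation concerning unobserved covariates noted above.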
The second set of potential limitations results from patient self-selection. Very few databank studies report the number and characteristics of patients who were invited to be a part of the study but who eventually declined, whereas a lack of similar information in a report of an RCT might be considered unacceptable. Selection might also occur if patients or physicians receive financial incentives to complete questionnaires or enroll in studies (such as those studies sponsored by the pharmaceutical industry). Another major issue is attrition or subject drop-out. Non-random drop-outs from studies are inevitable, and selective attrition of subjects can result in biased (often exaggerated) estimates of drug effectiveness. Very few databanks have formally reported the issue of attrition among their subject population.
The third set of limitations involves measurement of outcomes. Although questionnaire-based self-reports of outcomes might be considered to be as informative as physician-based measures, the practicalities of measurement, analysis, and interpretation raise several issues. Longitudinal observational studies typically measure outcomes at specified intervals of 3, 6, or 12 months. Because the start and end of a drug course do not necessarily correspond to the measurement dates, difficulties can arise in correlating outcomes with drug courses. Thus, patient outcomes from drug courses shorter than the interval between measurements tend to be selectively lost. Because early termination of drug courses might indicate failure due to toxicity or inefficacy, the loss of information from these drug courses has the potential to bias the effectiveness estimates upwards. In addition, the absence of a 'washout period' in observational studies makes it difficult to disentangle the effects of current therapies from the residual effects of past therapies, particularly when the clinical half-life is varied and long.
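The upward bias from selectively lost short drug courses can be illustrated with a small simulation. This is a sketch under stated assumptions rather than an analysis of real data: the exponential course durations, the mean durations of 3 months for failing courses and 24 months for successful ones, and the function names are all hypothetical. A course counts as observed only if a scheduled assessment falls within it.

```python
import math
import random


def simulate_courses(n=10000, p_success=0.5, interval=6.0, seed=1):
    """Return (success, observed) pairs for n hypothetical drug courses."""
    rng = random.Random(seed)
    courses = []
    for _ in range(n):
        success = rng.random() < p_success
        # Assumption: failing courses (toxicity or inefficacy) stop early.
        mean_months = 24.0 if success else 3.0
        duration = rng.expovariate(1.0 / mean_months)
        # Courses start at an arbitrary point between two assessments.
        start = rng.uniform(0.0, interval)
        next_visit = math.ceil(start / interval) * interval
        # The course is captured only if an assessment occurs before it ends.
        observed = next_visit <= start + duration
        courses.append((success, observed))
    return courses


def success_rate(courses, observed_only):
    """Proportion of successful courses, overall or among observed courses."""
    kept = [s for s, o in courses if o or not observed_only]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    for months in (1.0, 3.0, 6.0, 12.0):
        cs = simulate_courses(interval=months)
        print(months, "months between assessments:",
              "true", round(success_rate(cs, False), 3),
              "observed", round(success_rate(cs, True), 3))
```

Under these assumptions the success rate among observed courses exceeds the true rate, and the gap widens as the interval between assessments lengthens, because short failing courses are the ones most likely to fall between two measurement dates.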
Observational studies need to be protocol-driven, with prospective data collection including the Health Assessment Questionnaire (HAQ) or its variants, short form 36 (SF-36), or a similar instrument at regular intervals [8, 9]. Where drop-outs occur, careful documentation of the details (change in address, refusal, worsening health, and so on) of such losses is required. Rigor in data collection in observational databanks can and should be equivalent to that of RCTs.
We believe the criticism of unobserved bias has been overused. It should not be invoked unless a specific, plausible unmeasured confounder is specified. Such potential confounders need to meet both of the two criteria of confounding, namely (1) association with outcome and (2) no association with the observed variables used for statistical adjustment. We agree with Moses that it is important for the treating physician to record why the patient is being given the therapy selected. This information should be a powerful adjustment variable; 'arranging to collect it will call for imaginative thinking, experimentation, and patience, but it is an idea deserving much effort'.
Several steps could be taken within the existing framework for clinical research that can go a long way in improving the use of databanks. Many of the problems with observational studies can be minimized with careful planning in advance of the study. Ideally the subjects in longitudinal databanks should be truly representative of the population. Short of that, a databank should include all consecutive patients observed at the databank center.
We propose an international online registry for observational databanks similar to the metaRegister of Controlled Trials (mRCT; http://www.controlled-trials.com/mrct/, accessed 10 January 2004). All the databanks in such a register should meet certain minimum methodological standards such as those proposed by the Outcome Measures in Rheumatology (OMERACT). This register could collate the data collection protocols and list of publications from each member databank and serve as a convenient reference for publications. This register would also help the users to be certain that they are aware of all the observational evidence relevant to a particular question, avoid duplication of effort, and encourage collaboration.
Patients who participate in databanks do so primarily on the basis of altruism. Patients trust their physicians to use their information for the greatest good of all others with the same disease. Although researchers who obtain funding and collect data deserve to have credit in terms of primacy and publications, data more than, say, 5 years old could very well be shared. Currently such informal data sharing exists through academic networking but the potential is probably not fully used. Research organizations such as the National Institutes of Health and the Centers for Disease Control have placed large amounts of data online, ready to be downloaded. There is little reason why similar sharing of data from rheumatic disease databanks for non-commercial purposes could not be phased in over time.
Medical journals have a key role in enforcing quality standards on reporting observational studies. Unfortunately, journals do not explicitly insist on guidelines such as those from OMERACT. Providing checklists of reporting requirements similar to the CONSORT (Consolidated Standards of Reporting Trials) checklist for RCTs would streamline the reporting of drug effectiveness data from observational studies.
Patient databanks are here to stay. Our plea here is for methodologically sound observational studies to raise the bar in the performance of clinical research.
CONSORT = Consolidated Standards of Reporting Trials
HAQ = Health Assessment Questionnaire
OMERACT = Outcome Measures in Rheumatology
RCT = randomized controlled trial
SF-36 = short form 36.
This work was supported by grant AR43584 from the National Institutes of Health to the Arthritis, Rheumatism and Aging Medical Information Systems (ARAMIS).