Deeks, J.J., Dinnes, J., D’Amico, R. et al. (5 more authors) (2003) Evaluating non-randomised intervention studies. Health Technology Assessment, 7 (27). pp. 1-179. ISSN 1366-5278
Background In the absence of randomised controlled trials (RCTs), healthcare practitioners and policy-makers rely on non-randomised studies to provide evidence of the effectiveness of healthcare interventions. However, there is controversy over the validity of non-randomised evidence, related to the existence and magnitude of selection bias.
Objectives To consider methods and related evidence for evaluating bias in non-randomised intervention studies.
Methods 1. Three reviews were conducted to consider:
empirical evidence of bias associated with non-randomised studies
the content of quality assessment tools for non-randomised studies
the use of quality assessment in systematic reviews of non-randomised studies.
These reviews were conducted systematically, identifying relevant literature through comprehensive searches across electronic databases, handsearches and contact with experts.
2. New empirical investigations were conducted, generating non-randomised studies from two large, multicentre RCTs by selectively resampling trial participants according to allocated treatment, centre and period. These were used to examine:
systematic bias introduced by the use of historical and non-randomised concurrent controls
whether results of non-randomised studies are more variable than results of RCTs
the ability of case-mix adjustment methods to correct for selection bias introduced by non-random allocation.
The resampling design overcame particular problems of meta-confounding and variability of direction and magnitude of bias that hinder the interpretation of previous reviews.
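The resampling idea can be illustrated with a minimal Python sketch. It is not the authors' code: the data are synthetic, and the centre labels, baseline risks, relative risk of 0.8 and helper names (`simulate_trial`, `risk_difference`) are all assumptions made for illustration. A randomised comparison takes both arms from the same centre; a non-randomised concurrent comparison takes treated participants from one centre and controls from another. Resampling by period to mimic historical controls would follow the same pattern.

```python
import random


def simulate_trial(n_per_centre, centres, seed=0):
    """Build a synthetic multicentre RCT (hypothetical data, not the trials in the report).

    `centres` maps a centre label to its assumed baseline event risk.
    Each participant is allocated to an arm at random within the centre.
    """
    rng = random.Random(seed)
    participants = []
    for centre, baseline_risk in centres.items():
        for _ in range(n_per_centre):
            arm = rng.choice(["treated", "control"])
            # Assumed treatment effect: relative risk of 0.8 in every centre.
            risk = baseline_risk * (0.8 if arm == "treated" else 1.0)
            participants.append(
                {"centre": centre, "arm": arm, "event": rng.random() < risk}
            )
    return participants


def event_rate(rows):
    """Proportion of participants in `rows` who had the event."""
    return sum(r["event"] for r in rows) / len(rows)


def risk_difference(participants, treated_centre, control_centre):
    """Risk difference using treated patients from one centre and controls from another."""
    treated = [p for p in participants
               if p["arm"] == "treated" and p["centre"] == treated_centre]
    control = [p for p in participants
               if p["arm"] == "control" and p["centre"] == control_centre]
    return event_rate(treated) - event_rate(control)


# Two centres with different case-mix (assumed baseline risks).
trial = simulate_trial(5000, {"A": 0.2, "B": 0.4})

# Randomised comparison: both arms come from centre A.
rd_randomised = risk_difference(trial, "A", "A")

# Non-randomised concurrent comparison: treated from A, controls from B.
rd_concurrent = risk_difference(trial, "A", "B")

print(f"randomised RD: {rd_randomised:.3f}")
print(f"concurrent-control RD: {rd_concurrent:.3f}")
```

Because the true effect is known from the randomised comparison, any gap between the two risk differences in this sketch can be attributed to the case-mix difference between centres rather than to the intervention.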
Results
Empirical comparisons of randomised and non-randomised evidence
Eight studies compared results of randomised and non-randomised studies across multiple interventions using meta-epidemiological techniques. The studies reached conflicting conclusions, explicable by differences in:
whether data were sourced from primary studies or systematic reviews
consideration of meta-confounding
inclusion of studies of varying quality
criterion for classifying discrepancies in results.
The only deducible conclusions were (a) results of randomised and non-randomised studies sometimes, but not always, differ and (b) both similarities and differences may often be explicable by other confounding factors.
Quality assessment tools for evaluating non-randomised studies
We identified 194 tools that could be or had been used to assess non-randomised studies. Around half were scales and half were checklists; most were published within systematic reviews, and most were poorly developed, with scant attention paid to principles of scale development.
Sixty tools covered at least five of six pre-specified internal validity domains (creation of groups, blinding, soundness of information, follow-up, analysis of comparability, analysis of outcome), although the degree of coverage varied. Fourteen tools covered three of four core items of particular importance for non-randomised studies (How did allocation occur? Was the study designed to generate comparable groups? Were prognostic factors identified? Was case-mix adjustment used?). Six tools were thought suitable for use in systematic reviews.
Use of quality assessment in systematic reviews of non-randomised studies
Of 511 systematic reviews that included non-randomised studies, only 169 (33%) assessed study quality. Many used quality assessment tools designed for RCTs or developed by the authors themselves, and did not include key quality criteria relevant to non-randomised studies. Sixty-nine reviews investigated the impact of quality on study results in a quantitative manner.
Empirical estimates of bias associated with non-random allocation
The bias introduced by non-random allocation was noted to have two components. First, the bias could lead to consistent over- or underestimations of treatment effects. This occurred for historical controls, the direction of bias depending on time trends in the case-mix of participants recruited to the study. Second, the bias increased variation in results for both historical and concurrent controls, owing to haphazard differences in case-mix between groups. The biases were large enough to lead studies falsely to conclude significant findings of benefit or harm.
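The two components described above can be sketched as a small simulation. Every number here is an assumption chosen for illustration (no true treatment effect, a fixed shift standing in for a time trend in case-mix, and an extra noise term standing in for haphazard case-mix differences); none comes from the report, and `summarise_bias` and `false_positive_rate` are hypothetical helper names.

```python
import random
import statistics

rng = random.Random(1)

TRUE_EFFECT = 0.0   # assumed: the intervention has no real effect
SAMPLING_SD = 0.05  # assumed sampling error, the same for every design
TREND_SHIFT = 0.10  # assumed systematic shift from a time trend in case-mix
CASEMIX_SD = 0.08   # assumed haphazard case-mix difference between groups
N_STUDIES = 2000


def summarise_bias(estimates, reference_effect):
    """Return (systematic bias, spread) of study estimates around a reference effect."""
    systematic = statistics.mean(estimates) - reference_effect
    spread = statistics.stdev(estimates)
    return systematic, spread


def false_positive_rate(estimates):
    """Share of studies whose estimate exceeds 1.96 naive standard errors.

    The naive standard error reflects sampling error only, as it would in a
    study that ignores case-mix differences between groups.
    """
    threshold = 1.96 * SAMPLING_SD
    return sum(abs(e) > threshold for e in estimates) / len(estimates)


# Randomised studies: sampling error only.
randomised = [TRUE_EFFECT + rng.gauss(0, SAMPLING_SD) for _ in range(N_STUDIES)]

# Historically controlled studies: systematic shift plus extra haphazard variation.
historical = [
    TRUE_EFFECT + TREND_SHIFT + rng.gauss(0, CASEMIX_SD) + rng.gauss(0, SAMPLING_SD)
    for _ in range(N_STUDIES)
]

rct_bias, rct_spread = summarise_bias(randomised, TRUE_EFFECT)
hist_bias, hist_spread = summarise_bias(historical, TRUE_EFFECT)
rct_false_pos = false_positive_rate(randomised)
hist_false_pos = false_positive_rate(historical)

print(f"randomised: bias={rct_bias:.3f} spread={rct_spread:.3f} "
      f"false positives={rct_false_pos:.2%}")
print(f"historical: bias={hist_bias:.3f} spread={hist_spread:.3f} "
      f"false positives={hist_false_pos:.2%}")
```

Under these assumptions the historically controlled estimates are both shifted and more variable than the randomised ones, so a naive significance test flags a spurious benefit or harm far more often.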
Empirical evaluation of case-mix adjustment methods
Four strategies for case-mix adjustment were evaluated: none adequately adjusted for bias in historically and concurrently controlled studies. Logistic regression on average increased bias. Propensity score methods performed better, but were not satisfactory in most situations. Detailed investigation revealed that adequate adjustment can only be achieved in the unrealistic situation when selection depends on a single factor. Omission of important confounding factors can explain underadjustment. Correlated misclassifications and measurement error in confounding variables may explain the observed increase in bias with logistic regression, as may differences between conditional and unconditional odds ratio estimates of treatment effects.
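The distinction between conditional and unconditional odds ratios can be shown with a small exact calculation. This is a constructed example, not data from the report: it assumes two equally sized strata of a prognostic factor, perfectly balanced between treatment groups (so there is no confounding), with control-group risks of 1/5 and 4/5 and a conditional odds ratio of 1/2 in each stratum.

```python
from fractions import Fraction


def odds(risk):
    """Convert a risk (probability) to odds."""
    return risk / (1 - risk)


def risk_from_odds(o):
    """Convert odds back to a risk (probability)."""
    return o / (1 + o)


# Assumed inputs for illustration only.
conditional_or = Fraction(1, 2)                  # same odds ratio within each stratum
control_risks = [Fraction(1, 5), Fraction(4, 5)] # control risk in strata 1 and 2

# Treated risk in each stratum implied by the conditional odds ratio.
treated_risks = [risk_from_odds(odds(r) * conditional_or) for r in control_risks]

# Strata are equally sized and balanced across arms, so marginal risks are simple means.
marginal_control = sum(control_risks) / len(control_risks)
marginal_treated = sum(treated_risks) / len(treated_risks)

# Unconditional (marginal) odds ratio, ignoring the prognostic factor.
marginal_or = odds(marginal_treated) / odds(marginal_control)

print(conditional_or)  # prints 1/2
print(marginal_or)     # prints 7/11
```

Even with no confounding at all, the odds ratio adjusted for the prognostic factor (1/2) is further from 1 than the unadjusted odds ratio (7/11). An adjusted logistic regression estimate and an unadjusted trial estimate can therefore differ for reasons unrelated to selection bias.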
Conclusions Results of non-randomised studies sometimes, but not always, differ from results of randomised studies of the same intervention. Non-randomised studies may still give seriously misleading results when treated and control groups appear similar in key prognostic factors. Standard methods of case-mix adjustment do not guarantee removal of bias. Residual confounding may be high even when good prognostic data are available, and in some situations adjusted results may appear more biased than unadjusted results.
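Residual confounding after adjustment can be illustrated with exact arithmetic on a constructed population. The set-up is an assumption for illustration only: two independent binary prognostic factors (`x1`, `x2`), each present in half the population, both raising the outcome risk and the chance of receiving treatment, with no true treatment effect. The function `adjusted_rd` is a hypothetical helper that stratifies on the chosen factors and averages the stratum-specific risk differences.

```python
from fractions import Fraction as F
from itertools import product

# Four equally sized cells defined by two binary prognostic factors (assumed values).
cells = []
for x1, x2 in product((0, 1), repeat=2):
    cells.append({
        "x1": x1,
        "x2": x2,
        "weight": F(1, 4),
        # Selection into treatment depends on BOTH factors.
        "p_treated": F(2, 10) + F(3, 10) * x1 + F(3, 10) * x2,
        # Outcome risk depends on both factors but NOT on treatment (true effect = 0).
        "risk": F(1, 10) + F(2, 10) * x1 + F(2, 10) * x2,
    })


def arm_risk(group, treated):
    """Expected event risk in one arm, pooled over the cells in `group`."""
    def share(cell):
        p = cell["p_treated"] if treated else 1 - cell["p_treated"]
        return cell["weight"] * p
    total = sum(share(c) for c in group)
    return sum(share(c) * c["risk"] for c in group) / total


def adjusted_rd(cells, keys):
    """Risk difference stratified on `keys`, weighted by stratum size."""
    strata = {}
    for c in cells:
        strata.setdefault(tuple(c[k] for k in keys), []).append(c)
    rd = F(0)
    for group in strata.values():
        stratum_weight = sum(c["weight"] for c in group)
        rd += stratum_weight * (arm_risk(group, True) - arm_risk(group, False))
    return rd


crude = adjusted_rd(cells, [])             # no adjustment
adj_x1 = adjusted_rd(cells, ["x1"])        # adjust for one factor, omit the other
adj_both = adjusted_rd(cells, ["x1", "x2"])  # adjust for both factors

print(crude, adj_x1, adj_both)  # prints 3/25 6/91 0
```

In this construction, adjusting for one factor shrinks the spurious risk difference from 3/25 to 6/91 but does not remove it; only adjustment for every factor that drives selection recovers the true effect of zero.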
Although many quality assessment tools exist and have been used for appraising non-randomised studies, most omit key quality domains. Six tools were considered potentially suitable for use in systematic reviews, but each requires revision to cover all relevant quality domains.
Healthcare policies based upon non-randomised studies or systematic reviews of non-randomised studies may need re-evaluation if the uncertainty in the true evidence base was not fully appreciated when policies were made.
The inability of case-mix adjustment methods to compensate for selection bias and our inability to identify non-randomised studies which are free of selection bias indicate that non-randomised studies should only be undertaken when RCTs are infeasible or unethical.
Institution: The University of York
Academic Units: The University of York > Centre for Reviews and Dissemination (York)
Depositing User: York RAE Import
Date Deposited: 14 Aug 2009 14:43
Last Modified: 18 May 2010 18:19
Publisher: National Coordinating Centre for Health Technology Assessment