Diagnostic tests are used by clinicians to identify the presence or absence of a condition in a patient for the purpose of developing an appropriate treatment plan (White et al. 2011). They can include imaging and biochemical technologies, pathological and psychological investigation, and signs and symptoms observed during history taking and clinical evaluations (Deeks 2001). New diagnostic tests are continuously developed, driven by demands for improvements in speed, cost, ease of performance, patient safety and accuracy (White et al. 2011). Consequently, there are often several tests available for the diagnosis of a particular condition. Clinicians and other healthcare practitioners therefore need access to high-level evidence on the accuracy of the diagnostic tests they use or are considering using. The end goal of diagnostic testing is improved outcomes in areas that are important to the patient. However, systematic reviews that investigate whether diagnostic tests improve outcomes are reviews of effectiveness, and should be carried out using the methodology from the chapter on effectiveness. Primary studies that investigate the accuracy of diagnostic tests are termed diagnostic test accuracy (DTA) studies, and the systematic review of these studies is the focus of this chapter.
Diagnostic test accuracy studies compare a diagnostic test of interest (the ‘index test’) to an existing diagnostic test (the ‘reference test’), which is known to be the best test currently available for accurately identifying the presence or absence of the condition of interest. The outcomes of the two tests are then compared with one another in order to evaluate the accuracy of the index test. There are two main types of DTA study. The first is the diagnostic case-control design, also sometimes called the ‘two gate design’. In this study design, people with the condition (cases) come from one population (e.g. a health care centre for people known to have the condition), while people without the condition (controls) come from another. Although this design gives an indication of the maximum accuracy of the test, the results will generally exaggerate the test’s accuracy in practice (Leeflang et al. 2013).
The second study design is cross-sectional, and involves all patients suspected of having the condition of interest undergoing the index test and the reference test. Those who test positive for the condition by the reference test can be considered to be the cases, whereas those who test negative are the controls. This study design is held to reflect actual practice better and is more likely to provide a valid estimate of diagnostic accuracy (Leeflang et al. 2013).
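The comparison of index and reference test results described above is conventionally summarised as a 2×2 table of true positives, false positives, false negatives and true negatives. The following Python sketch illustrates that cross-tabulation under simple assumptions: each participant has one dichotomous result per test, the reference test is treated as correct, and the function name `cross_tabulate` and the eight participants' results are hypothetical, invented purely for illustration.

```python
from collections import Counter


def cross_tabulate(index_results, reference_results):
    """Count TP, FP, FN and TN from paired dichotomous results.

    Both arguments are sequences of booleans (True = test positive),
    ordered so that position i refers to the same participant.
    The reference test result is assumed to be the true disease status.
    """
    if len(index_results) != len(reference_results):
        raise ValueError("each participant needs both an index and a reference result")
    counts = Counter(zip(index_results, reference_results))
    return {
        "TP": counts[(True, True)],    # index positive, reference positive
        "FP": counts[(True, False)],   # index positive, reference negative
        "FN": counts[(False, True)],   # index negative, reference positive
        "TN": counts[(False, False)],  # index negative, reference negative
    }


# Hypothetical results for eight participants (illustrative data only).
index_results = [True, True, False, True, False, False, True, False]
reference_results = [True, True, True, False, False, False, True, False]

table = cross_tabulate(index_results, reference_results)
print(table)  # {'TP': 3, 'FP': 1, 'FN': 1, 'TN': 3}
```

In a cross-sectional design every participant contributes to this single table, whereas in a case-control design the reference-positive and reference-negative participants are drawn from separate populations.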
Systematic reviews of diagnostic test accuracy provide a summary of test performance based on all available evidence, evaluate the quality of published studies, and account for variation in findings between studies (Deeks 2001; Leeflang et al. 2013). Estimates of test accuracy frequently vary between studies, often due to differences in how test positivity is defined, study design, patient characteristics and positioning of the test in the diagnostic pathway (Leeflang et al. 2013). Furthermore, DTA studies have unique design characteristics which require different criteria for critical appraisal compared to other sources of quantitative evidence, and they report a pair of related summary statistics (‘sensitivity and specificity’, as discussed below) rather than a single statistic such as an odds ratio. Consequently, systematic reviews of DTA studies require different statistical methods for meta-analytical pooling, and different approaches for narrative synthesis (Leeflang et al. 2014).
Diagnostic accuracy is predominantly represented by two measures, sensitivity and specificity; however, other measures, including predictive values, odds ratios, likelihood ratios, and summary receiver operating characteristic (ROC) curves, are sometimes used (Leeflang et al. 2014). Sensitivity refers to the probability of a person with the condition of interest having a positive result (also known as the true positive proportion [TPP]), while specificity is the probability of a person without the condition of interest having a negative result (also known as the true negative proportion [TNP]) (Leeflang et al. 2014). It should be noted that these definitions refer to the clinical situation, and other definitions of sensitivity and specificity exist that are used in different contexts (Sackett and Haynes 2002). Sensitivity and specificity have been identified as essential measures of diagnostic accuracy (Leeflang et al. 2013; Leeflang et al. 2014; Habbema et al. 2009; Leeflang et al. 2013b).
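As an illustration of these definitions, the sketch below computes sensitivity and specificity from the four cells of a 2×2 table, taking the reference test result as the true disease status. The function names and the counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not drawn from any study.

```python
def sensitivity(tp, fn):
    """True positive proportion: TP / (TP + FN)."""
    if tp + fn == 0:
        raise ValueError("no participants with the condition")
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative proportion: TN / (TN + FP)."""
    if tn + fp == 0:
        raise ValueError("no participants without the condition")
    return tn / (tn + fp)


# Hypothetical counts for 300 participants (illustrative data only):
# 100 have the condition (90 index-positive, 10 index-negative) and
# 200 do not (160 index-negative, 40 index-positive).
tp, fn, tn, fp = 90, 10, 160, 40

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
# sensitivity = 0.90, specificity = 0.80
```

Note that sensitivity uses only participants with the condition and specificity only those without it, which is why the two measures are reported as a pair rather than combined into a single figure.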