
Assessing the quality of diagnostic studies being considered for inclusion is a vital part of the systematic review process. Methodological quality relates to the risk of bias resulting from the design and conduct of the study. The quality of a diagnostic study is determined by its design, the methods by which the study sample is recruited, the conduct of the tests involved, blinding in the interpretation of tests, and the completeness of the study report. Critical appraisal examines the methodology of a study against pre-defined criteria, with the aim of identifying individual sources of risk of bias, and is used to evaluate the extent to which the results of a study can be considered valid and believable after rigorous assessment (Reitsma et al. 2009).

Table 9.3 is modified and expanded from "Synthesizing evidence of diagnostic accuracy" (White et al. 2011; Reitsma et al. 2009) and highlights the major types of bias that can occur in diagnostic accuracy studies as a result of flawed or incomplete reporting. Initiatives such as the Standards for Reporting of Diagnostic Accuracy (STARD) (Bossuyt et al. 2003; Meyer et al. 2003) have sought to improve reporting and methodological quality and to help primary researchers address and avoid sources of bias.

Table 9.3: Types of bias in studies of diagnostic test accuracy

| Type of bias | When does it occur? | Impact on accuracy | Preventative measures |
| --- | --- | --- | --- |
| **Patients/Subjects** | | | |
| Spectrum bias | When included patients do not represent the intended spectrum of severity for the target condition or alternative conditions | Depends on which end of the disease spectrum the included patients represent | Ensure that the included patients represent a broad sample of those for whom the test is intended in clinical practice |
| Selection bias | When eligible patients are not enrolled consecutively or randomly | Usually leads to overestimation of accuracy | Consider all eligible patients and enroll either consecutively or randomly |
| **Index test** | | | |
| Information bias | When the index test results are interpreted with knowledge of the reference test results, or with more (or less) information than in practice | Usually leads to overestimation of accuracy, unless less clinical information is provided than in practice, which may result in an underestimation of accuracy | Interpret index test results without knowledge of the reference test results, and with the same clinical information as would be available in practice |
| **Reference test** | | | |
| Misclassification bias | When the reference test does not correctly classify patients with the target condition | Depends on whether both the reference and index tests make the same mistakes | Ensure that the reference test correctly classifies patients with the target condition |
| Partial verification bias | When a non-random set of patients does not undergo the reference test | Usually leads to overestimation of sensitivity; the effect on specificity varies | Ensure that all patients undergo both the reference and index tests |
| Differential verification bias | When a non-random set of patients is verified with a second or third reference test, especially when this selection depends on the index test result | Usually leads to overestimation of accuracy | Ensure that all patients are verified with the same reference test |
| Incorporation bias | When the index test is incorporated in a (composite) reference test | Usually leads to overestimation of accuracy | Ensure that the reference and index tests are performed separately |
| Disease/condition progression bias | When the patient's condition changes between administration of the index and reference tests | Under- or overestimation of accuracy, depending on the change in the patient's condition | Perform the reference and index tests with minimal delay, ideally at the same time where practical |
| Information bias | When the reference test data are interpreted with knowledge of the index test results | Usually leads to overestimation of accuracy | Interpret the reference and index test data independently |
| **Data analysis** | | | |
| Excluded data | When uninterpretable or intermediate test results and withdrawals are not included in the analysis | Usually leads to overestimation of accuracy | Ensure that all patients who entered the study are accounted for and that all uninterpretable or intermediate test results are explained |
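To see why partial verification bias typically inflates sensitivity, consider a toy calculation. The numbers below are hypothetical and chosen only for illustration: if all index-positive patients receive the reference test but only a small, non-random fraction of index-negative patients do, the unverified false negatives disappear from the analysis and sensitivity is overestimated.

```python
# Hypothetical full-cohort 2x2 table (if every patient were verified
# by the reference test). These counts are illustrative only.
TP, FN = 80, 20      # diseased patients: index-positive / index-negative
FP, TN = 90, 810     # non-diseased patients: index-positive / index-negative

full_sens = TP / (TP + FN)   # true sensitivity: 80/100 = 0.80
full_spec = TN / (TN + FP)   # true specificity: 810/900 = 0.90

# Partial verification: all index-positive patients are verified, but
# only 10% of index-negative patients undergo the reference test.
verified_FN = int(FN * 0.10)   # 2 of the 20 missed cases are verified
verified_TN = int(TN * 0.10)   # 81 of the 810 true negatives are verified

obs_sens = TP / (TP + verified_FN)            # 80/82, inflated
obs_spec = verified_TN / (verified_TN + FP)   # 81/171, here deflated

print(f"True sensitivity/specificity:     {full_sens:.2f} / {full_spec:.2f}")
print(f"Observed sensitivity/specificity: {obs_sens:.3f} / {obs_spec:.3f}")
```

In this sketch the observed sensitivity rises from 0.80 to roughly 0.98, while specificity falls; as Table 9.3 notes, the direction of the specificity error depends on which patients go unverified.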

The most widely used tool for assessing the quality of diagnostic accuracy studies is QUADAS-2, released in 2011 following revision of the original QUADAS (Quality Assessment of Diagnostic Accuracy Studies) tool (Whiting et al. 2011). JBI encourages the use of QUADAS-2, and this chapter includes a checklist that incorporates the "signaling questions" from QUADAS-2 (Appendix I). It should be noted that QUADAS-2 includes questions regarding the level of concern reviewers have about the applicability of the study under consideration to the research question. For JBI DTA systematic reviews, a primary research study should not proceed to critical appraisal if there is concern that the study does not match the inclusion criteria and research question. As such, this element of QUADAS-2 is not addressed in the checklist below (Domains 1, 2, 3, 4).

Domain 1: Patient selection

In this section, the risk of selection bias is assessed by examining how patients were selected for the study.

  • Was a consecutive or random sample of patients enrolled?

  • Was a case-control design avoided?

  • Did the study avoid inappropriate exclusions?

Domain 2: Index tests

This section considers whether the conduct and interpretation of the index test under investigation could have introduced bias.

  • Were the index test results interpreted without knowledge of the results of the reference standard?

  • If a threshold was used, was it pre-specified?

Domain 3: Reference standard/test

The focus of this section is to determine whether, and to what extent, the way in which the reference test was conducted and interpreted could have introduced bias into the study.

  • Is the reference standard likely to correctly classify the target condition?

  • Were the reference standard results interpreted without knowledge of the results of the index test?

Domain 4: Flow and timing

The aim of this section is to determine the risk of bias attributable to the order in which the index and reference tests were conducted in the study. If there is a long time delay between conduct of the two tests, the status of the patient may change and therefore impact the results of the later test. In addition, if the later test is conducted with knowledge of the results of the previous test, interpretation of the results may be impacted.

  • Was there an appropriate interval between the index test and reference standard?

  • Did all patients receive the same reference standard?

  • Were all patients included in the analysis?

The primary and secondary reviewer should discuss each item of appraisal for each study design included in their review. In particular, discussions should focus on what is considered acceptable for the review in terms of the specific study characteristics. The reviewers should be clear on what constitutes acceptable levels of information to allocate a positive appraisal compared with a negative, or a response of “unclear”.

This discussion should take place before the appraisal is conducted independently. The weight placed on specific critical appraisal questions will vary between reviews, and it is up to the reviewers to set the criteria that will determine inclusion or exclusion of a study. Many reviewers select a set of questions that must be answered "Yes", failing which the study will be excluded. It is important that these criteria be applied consistently across studies. Formerly, systematic review protocols published in JBI Evidence Synthesis appended the appraisal tool to be used; instead, Campbell et al. (2015), which describes the appraisal process and tool, should be cited in the relevant section of the protocol methods.

