Term | Definition |
--- | --- |
Area under the curve (AUC) | In a receiver operating characteristic (ROC) curve analysis, an index of the performance of a diagnostic or screening measure in relation to diagnostic accuracy, summarized in a single value that typically ranges from 0.50 (no better than random classification) to 1.0 (perfect classification) (Polit & Yang, 2016); a measure of criterion validity or responsiveness. |
Ceiling effect | The effect of having scores restricted at the upper end of a score continuum, which limits discrimination at the upper end of the measurement, constrains true variability and restricts the amount of upward change possible (Polit & Yang, 2016); a measure of content validity. |
Clinimetrics | The study of instruments where items may be major or minor; or present or absent (Gewitz et al., 2015). |
Comparative fit index (CFI) | A statistic used to evaluate the goodness of fit of a proposed model to the data (e.g. in a confirmatory factor analysis or item response theory analysis) involving the comparison of the proposed model with a null model; a value greater than 0.95 is often considered indicative of good fit (Polit & Yang, 2016); a measure of construct validity. |
Construct validity | The degree to which evidence about a measure’s scores in relation to other scores supports the inference that a construct has been appropriately represented; the degree to which a measure captures the focal construct (Polit & Yang, 2016). |
Content validity | The degree to which a multi-item instrument has an appropriate set of relevant items reflecting the full content of the construct domain being measured (Polit & Yang, 2016); incorporates face validity. |
Content validity index (CVI) | An index summarizing the degree to which a panel of experts agrees on an instrument’s content validity (i.e. the relevance, comprehensiveness and balance of items comprising a scale) (Polit & Yang, 2016). There are both item-level and scale-level CVIs. |
Criterion validity | The extent to which scores on a measure are an adequate reflection of (or predictor of) a criterion (i.e. ‘gold standard’ measure) (Polit & Yang, 2016). |
Cronbach’s alpha coefficients (Coefficient alpha) | An index of internal consistency that indicates the degree to which the items on a multi-item scale are measuring the same underlying construct (Polit & Yang, 2016); a measure of reliability. |
Cross-cultural validity | The degree to which the items on a translated or culturally adapted scale perform adequately and equivalently, individually and in the aggregate, in relation to their performance on the original instrument; an aspect of construct validity (Polit & Yang, 2016). |
Differential item functioning (DIF) | The extent to which an item functions differently for one group or culture than for another despite the groups being equivalent with respect to the underlying latent trait (Polit & Yang, 2016); a measure of cross-cultural validity. |
Face validity | The extent to which an instrument looks as though it is a measure of the target construct (Polit & Yang, 2016). An aspect of content validity. |
Factor analysis | A statistical procedure for disentangling complex interrelationships among items and identifying the items that ‘go together’ as a unified dimension; a measure of construct validity (Polit & Yang, 2016). |
Floor effect | The effect of having scores restricted at the lower end of a score continuum, which limits the ability of the measure to discriminate at the lower end of the measurement, constrains true variability and limits the amount of downward change possible (Polit & Yang, 2016); a measure of content validity. |
Goodness of fit index (GFI) | A statistic used to evaluate the goodness of fit of a proposed model to the data (e.g. in confirmatory factor analysis); a value greater than 0.90 is often considered an adequate fit (Polit & Yang, 2016); a measure of construct validity. |
Internal consistency | The degree to which the subparts of a composite scale (i.e. the items) are interrelated and are all measuring the same attribute or dimension; a measure of reliability (Polit & Yang, 2016). |
Inter-rater reliability | The variation between two or more raters who measure the same group of subjects. |
Intra-class correlation coefficients (ICC) | Estimates the proportion of total variance in a set of scores that is attributable to true differences among the people or objects being measured (e.g. the test-retest reliability); a measure of reliability (Polit & Yang, 2016). |
Intra-rater reliability | The variation of data measured by a single rater across two or more occasions. |
Kappa | A statistical index of chance-corrected agreement or consistency between two nominal (or ordinal) measurements; often used to assess interrater or intra-rater reliability (Polit & Yang, 2016). |
Limits of agreement (LOA) | An estimate of the range of differences in two sets of scores that could be considered random measurement error, typically with 95% confidence; graphically portrayed on Bland-Altman plots (Polit & Yang, 2016); a measure of reliability. |
Measurement error | The systematic and random error of a person’s score on a measure, reflecting factors other than the construct being measured and resulting in an observed score that is different from a hypothetical true score; a measurement property within the reliability domain (Polit & Yang, 2016). |
Measurement properties | The psychometric or clinimetric characteristics of an instrument (e.g. reliability, validity and responsiveness). |
Non-normed fit index (NNFI) | Also known as the Tucker-Lewis index (TLI); see below. |
Psychometrics | The study of instruments that consist of items of equal weighting. |
Reliability | The degree to which a measurement is free from measurement error; the extent to which scores for people who have not changed are the same for repeated measurements; statistically, the proportion of total variance in a set of scores that is attributable to true differences among those being measured (Polit & Yang, 2016). |
Responsiveness | The ability of a measure to detect change over time in a construct that has changed, commensurate with the amount of change that has occurred (Polit & Yang, 2016). |
Root mean square error of approximation (RMSEA) | An index used to evaluate how well a hypothesized model fits the data (e.g. in confirmatory factor analysis or item response theory modelling); an RMSEA of less than 0.06 is considered an indicator of adequate fit (Polit & Yang, 2016); a measure of construct validity. |
Sensitivity | The ability of a screening or diagnostic instrument to correctly identify a ‘case’ (i.e. to correctly diagnose a condition) (Polit & Yang, 2016); a measure of criterion validity or responsiveness. |
Smallest detectable change (SDC) | An index that estimates the threshold for a ‘real’ change in scores (i.e. a change that, with 95% confidence, is beyond measurement error); the SDC is a change score that falls outside the limits of agreement on a Bland-Altman plot (Polit & Yang, 2016); a measure of reliability. |
Specificity | The ability of a screening or diagnostic instrument to correctly identify non-cases for a condition (Polit & Yang, 2016); a measure of criterion validity or responsiveness. |
Standard error of measurement (SEM) | An index that quantifies the amount of ‘typical’ error on a measure and indicates the precision of individual scores (Polit & Yang, 2016); a measure of reliability. |
Standardized root mean square residual (SRMR) | An index used to evaluate how well a hypothesized model fits the data (e.g. in a confirmatory factor analysis); an SRMR of less than 0.08 is considered an indicator of adequate fit (Polit & Yang, 2016); a measure of construct validity. |
Structural validity | The extent to which an instrument captures the hypothesized dimensionality of the broad construct; an aspect of construct validity (Polit & Yang, 2016). |
Test-retest reliability | The variation in repeated measurements taken with the same instrument on the same subject under the same conditions. |
Tucker-Lewis index (TLI) | Also known as the non-normed fit index (NNFI). A statistic used to evaluate the goodness of fit of a proposed model to the data (e.g. in confirmatory factor analysis) involving the comparison of the proposed model with a null model; a value greater than 0.95 is often considered indicative of a good fit (Polit & Yang, 2016); a measure of construct validity. |
Validity | In a measurement context, the degree to which an instrument is measuring the construct it purports to measure (Polit & Yang, 2016). |
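Several of the indices above can be computed directly from their definitions. As a minimal sketch of Cronbach’s alpha, using a small invented dataset (a hypothetical 3-item scale answered by five respondents; the scores are illustrative, not from any study):

```python
# Hypothetical scores: rows = respondents, columns = items on a 3-item scale.
scores = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
]

def variance(values):
    """Sample variance (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def cronbach_alpha(data):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(data[0])                          # number of items
    items = list(zip(*data))                  # transpose: one tuple per item
    item_vars = sum(variance(list(col)) for col in items)
    total_var = variance([sum(row) for row in data])
    return (k / (k - 1)) * (1 - item_vars / total_var)

print(round(cronbach_alpha(scores), 3))  # → 0.918
```

A value this high would suggest the three items are measuring the same underlying construct, consistent with the internal-consistency interpretation given in the glossary.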
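The chance-corrected agreement described under kappa can likewise be illustrated. The sketch below, assuming two hypothetical raters classifying the same ten subjects as ‘case’ or ‘non-case’ (the ratings are invented), implements Cohen’s kappa:

```python
# Invented ratings by two hypothetical raters of the same 10 subjects.
rater_a = ['case', 'case', 'non', 'non', 'case', 'non', 'non', 'case', 'non', 'non']
rater_b = ['case', 'non', 'non', 'non', 'case', 'non', 'case', 'case', 'non', 'non']

def cohens_kappa(a, b):
    """kappa = (p_o - p_e) / (1 - p_e): p_o is observed agreement,
    p_e is agreement expected by chance from the marginal proportions."""
    n = len(a)
    categories = set(a) | set(b)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    p_e = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

print(round(cohens_kappa(rater_a, rater_b), 3))  # → 0.583
```

Here the raw agreement is 80%, but once chance agreement is removed the kappa of about 0.58 indicates only moderate inter-rater reliability.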
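Sensitivity and specificity follow directly from a 2×2 classification table. As an illustration with hypothetical counts (not drawn from any real screening study):

```python
# Hypothetical 2x2 table: screening instrument vs. 'gold standard' diagnosis.
tp, fn = 45, 5    # gold-standard cases screened positive / negative
tn, fp = 80, 20   # gold-standard non-cases screened negative / positive

sensitivity = tp / (tp + fn)   # proportion of true cases correctly identified
specificity = tn / (tn + fp)   # proportion of non-cases correctly identified
print(sensitivity, specificity)  # → 0.9 0.8
```

In ROC curve analysis, sensitivity and (1 − specificity) pairs computed across every possible cut-off score are what is summarized by the AUC defined above.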
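The SEM and SDC entries are related by simple formulas. A minimal sketch, assuming a hypothetical scale with a standard deviation of 10 points and a test-retest reliability (ICC) of 0.91 (both values invented for illustration):

```python
import math

sd, icc = 10.0, 0.91   # hypothetical scale SD and test-retest reliability

# SEM quantifies 'typical' error: SEM = SD * sqrt(1 - reliability).
sem = sd * math.sqrt(1 - icc)

# SDC at 95% confidence: the smallest change beyond measurement error
# for repeated measurements, SDC = 1.96 * sqrt(2) * SEM.
sdc = 1.96 * math.sqrt(2) * sem

print(round(sem, 2), round(sdc, 2))  # → 3.0 8.32
```

On this hypothetical scale, an individual’s score would need to change by more than about 8 points before the change could be considered ‘real’ rather than measurement error.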
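The limits of agreement used on Bland-Altman plots can also be computed from their definition. A sketch using invented test-retest scores for five hypothetical subjects:

```python
import math

# Invented test and retest scores for five hypothetical subjects.
test = [10, 12, 9, 11, 13]
retest = [11, 12, 10, 10, 14]

diffs = [a - b for a, b in zip(test, retest)]
n = len(diffs)
mean_d = sum(diffs) / n
sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))

# 95% limits of agreement: mean difference +/- 1.96 * SD of the differences.
loa = (mean_d - 1.96 * sd_d, mean_d + 1.96 * sd_d)
print(round(loa[0], 2), round(loa[1], 2))
```

Differences falling inside these limits would be treated as random measurement error; the SDC corresponds to a change score lying outside them.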