A meta-analysis is a statistical process that calculates effect sizes for individual studies, converts them to a common metric, and then combines them to obtain an average effect size (Field, 2001). This statistical combination increases the power of the overall estimate beyond that of the small individual studies, because the effective sample size is larger. Meta-analysis also enables reviewers to explore the differences between individual studies. Meta-analysis should only be undertaken when the studies are sufficiently similar to combine; in the absence of this homogeneity, the conclusions drawn from the meta-analysis may be invalid. The findings may also depend on the selection and quality of the studies included and the availability of relevant data.
Where meta-analysis is used, the statistical methods and the software used should be described. Before a meta-analysis is undertaken, the relevant data need to be extracted. If the data are heterogeneous and are therefore presented as a narrative summary, the sources of heterogeneity (e.g. clinical, methodological or statistical) should be discussed, as should the basis on which it was determined inappropriate to combine the data statistically (such as differences in populations or study designs).
There are established methods for conducting meta-analyses of randomized controlled trials and some observational study designs. However, no clear guidance exists on synthesizing frequency data from incidence and prevalence estimates. This section provides this guidance.
Effect size
The effect size statistically describes the relationship between two variables and is represented by a square on a forest plot. In traditional effectiveness reviews, this could be the impact of a new therapy on mortality rates or the effect of a new teaching method on exam scores. The effect size may be a single number, as for a prevalence study, or a ratio, such as a risk ratio. The effect size has been described as the “currency of the systematic review”, as the aim of the meta-analysis is to summarize the effect size of each included study to obtain a summary effect (Borenstein, Hedges, Higgins, & Rothstein, 2009). The summary effect is shown as a diamond on a forest plot. When effect sizes are statistically combined, the methods used make certain assumptions.
Statistical combination of data
In meta-analysis, the results of similar, individual studies are combined to determine the overall effect; to do so, the effect size and weight of each study are calculated. The effect size indicates the direction and magnitude of the results of a particular study (i.e. do the results favor the treatment or the control, and if so, by how much), while the weight indicates how much information a study contributes to the overall analysis when all studies are combined.
It has been suggested that there are three important criteria for choosing a summary statistic for meta-analysis: (i) consistency of effect across studies, (ii) mathematical properties, and (iii) ease of interpretation (Deeks & Altman, 2001).
Consistency of effect is important because the aim of meta-analysis is to bring together the results of several studies into a single result.
The main mathematical property required of a summary statistic is the availability of a reliable variance estimate. Consensus has not yet been reached on the other two mathematical properties: whether the statistic depends on which of the two outcome states (e.g. mortality/survival) is coded as the event, and the fact that the odds ratio is the only statistic that is unbounded.
Ease of interpretation matters because the pooled summary statistic must ultimately be understood and applied by clinicians and other end users.
There are essentially three popular approaches to conducting meta-analysis for all types of data: the Hedges and Olkin technique, the Rosenthal and Rubin technique, and the Hunter and Schmidt technique. Hedges and Olkin developed both fixed- and random-effects models for pooling data, Rosenthal and Rubin developed a fixed-effects model only, and Hunter and Schmidt developed a random-effects model.
Statistical assumptions in meta-analysis
Meta-analysis can be based on one of two statistical assumptions: fixed or random effects. It is important to distinguish between fixed- and random-effects models when conducting a meta-analysis, as conflating them can lead to false assumptions about the statistical significance of the pooled estimate.
The main difference between fixed and random effects models is in the calculation of standard errors associated with the combined effect size. Fixed effects models use only within-study variability in their error term because all other ‘unknowns’ in the model are assumed not to affect the effect size. In contrast, in random effects models it is necessary to account for the errors associated with sampling from populations that themselves have been sampled from a superpopulation. As such the error term contains two components: within-study variability and variability arising from differences between studies (Field, 2001).
The fixed effects model assumes that there is one true effect for the population underlying the studies in the analysis, that all the differences in the data are due to sampling error or chance within each study, and that there is no heterogeneity between the studies. A fixed effects model is statistically stringent and should be used only when there is little heterogeneity, as determined by Chi-square or I2 tests. This model therefore assumes that the overall sample consists of samples that all belong to the same underlying population (Kock, 2009). The between-study variability is zero in this model, as it assumes that the population effect size is identical for all studies. In an analysis based on a fixed effects model, inference is “conditional”: on statistical grounds, it applies or generalizes only to the studies actually done (Petitti, 2000). The fixed effects model therefore assumes that there is little interest or value in generalizing the results to other studies (Fleiss, 1993; Munn, Tufanaru, & Aromataris, 2014).
A random effects model allows more flexibility, assuming that factors other than sampling error or chance, both within and between studies, may influence the data. As a result, in an analysis based on a random effects model, inference relies on the assumption that the studies used in the analysis are a random sample from some hypothetical population of studies (Munn, Tufanaru, et al., 2014; Petitti, 2000). For example, the effect size may be influenced in studies where the participants are more educated, older or healthier, or where a more intense intervention is used. The effect size is assumed to follow a normal distribution and consequently has a mean and variance. The random effects model considers both between-study and within-study variability, and it enables generalization beyond the populations included in the studies.
There is no consensus about whether fixed or random effects models should be used in meta-analysis. In many cases, when heterogeneity is absent, the two methods will give similar overall results. When heterogeneity is present, the random effects estimate provides a more conservative estimate of the overall effect size and is less likely to detect significant differences. For this reason, random effects models are sometimes employed when heterogeneity is present; however, the random effects model does not analyze the heterogeneity away and should not be considered a substitute for a thorough investigation of the reasons for heterogeneity. Additionally, random effects models give relatively more weight to the results of smaller studies; this may not be desirable, because smaller studies are typically more prone to bias and are often of lower quality than larger studies.
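To make the difference between the two models concrete, below is a minimal Python sketch of inverse-variance pooling under both assumptions, using the DerSimonian-Laird estimate of the between-study variance (tau-squared). The input effect sizes and variances are hypothetical; real analyses should use dedicated meta-analysis software.

```python
import numpy as np

def pool(effects, variances, model="random"):
    """Inverse-variance pooling; DerSimonian-Laird tau^2 for the random model."""
    effects = np.asarray(effects, dtype=float)
    variances = np.asarray(variances, dtype=float)
    w = 1.0 / variances                        # fixed-effect (within-study) weights
    fixed = np.sum(w * effects) / np.sum(w)
    if model == "fixed":
        return fixed, 1.0 / np.sum(w)          # pooled estimate and its variance
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = np.sum(w * (effects - fixed) ** 2)
    df = len(effects) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)              # between-study variance, floored at 0
    w_star = 1.0 / (variances + tau2)          # random-effects weights
    pooled = np.sum(w_star * effects) / np.sum(w_star)
    return pooled, 1.0 / np.sum(w_star)

# Example with noticeably heterogeneous (hypothetical) effect sizes: the two
# models agree on the estimate here, but the random-effects standard error is
# much larger because it also carries the between-study variance.
for m in ("fixed", "random"):
    est, v = pool([0.1, 0.5, 0.9], [0.01, 0.02, 0.01], model=m)
    print(f"{m}: estimate = {est:.3f}, SE = {v ** 0.5:.3f}")
```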
There are a number of meta-analytical techniques available – the selection of a particular technique is governed by three things: the study type, the nature of the data extracted and the assumptions underlying the meta-analysis.
Meta-analysis of prevalence and incidence data - Proportions
Prevalence and incidence data are often reported as proportions. When pooling proportions for meta-analysis, a transformation of the data is required. There are two main ways to transform the data: the Freeman-Tukey (arcsine square root) transformation and the logit transformation; either can be used to calculate the weighted summary proportion under the fixed and random effects models. The resultant meta-analysis will give a pooled proportion with 95% CI for both the fixed effects model and the random effects model and, in addition, will list the proportions (expressed as percentages), with their 95% CIs, found in the individual studies included in the meta-analysis. The results are then presented graphically in a forest plot. For all meta-analyses, prevalence estimates are transformed to logits to improve their statistical properties and are then back-transformed to prevalences for reporting.
Converting proportions (p) to logits (Sutton, Abrams, Jonas, Sheldon, & Song, 2000):
logit = log(odds) = log(p / (1 − p))
Using the number of cases with the event (N_event) and without the event (N_noevent), the variance of the logit is given by:
Var(logit) = 1/N_event + 1/N_noevent
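As an illustrative check of these formulas, the short Python snippet below converts a single study's proportion to the logit scale, computes its variance and a 95% CI, and back-transforms to the prevalence scale, using the Eshed 2007 counts from the table further below.

```python
import math

# Counts taken from the Eshed 2007 row of the table below: 59 events in 4821 scans
n_event, n_total = 59, 4821
n_noevent = n_total - n_event
p = n_event / n_total

logit = math.log(p / (1 - p))               # log-odds of the prevalence
var_logit = 1 / n_event + 1 / n_noevent     # Var(logit) as defined above
se = math.sqrt(var_logit)

# 95% CI on the logit scale, back-transformed to the prevalence scale
lo, hi = logit - 1.96 * se, logit + 1.96 * se
back = lambda x: math.exp(x) / (1 + math.exp(x))
print(f"prevalence {p:.4%}, 95% CI {back(lo):.4%} to {back(hi):.4%}")
```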
There are different models for performing the meta-analysis, as mentioned above. Typically the reviewer is offered a choice between the Mantel-Haenszel model and the DerSimonian and Laird model. We recommend that the meta-analysis of the prevalence reported in the studies be pooled using a random-effects model and presented with 95% confidence intervals (95% CI). Random-effects models can be used when there is sufficient information to estimate standard errors. Bear in mind, however, that the random-effects model gives a conservative estimate with a wider confidence interval. The random-effects model allows for between-study variation by assuming that the individual study prevalence estimates follow a normal distribution. The fixed-effects model can be chosen, but the reviewer should be aware of its underlying assumptions, particularly that there is one true effect, which may not hold for prevalence and incidence data.
Heterogeneity of the results is assessed with the I-squared and Tau-squared statistics and Cochran's Q (Chi-squared) test, with p > 0.05 indicating no significant heterogeneity. These tests evaluate whether the differences in prevalence estimates across studies are greater than expected by chance. To identify the sources of heterogeneity across studies, subgroup analysis or meta-regression can be used to assess the contribution of each variable (e.g. year of study, geographic location, characteristics of countries, study population) to the overall heterogeneity. Variables significantly associated with the heterogeneity (p < 0.05) can be included in a multivariate hierarchical model. A p value of < 0.05 is considered statistically significant in all analyses.
Below is an example of a table of studies that were combined in a meta-analysis. These studies reported on overall termination rates for scans in the general MRI population.
Study | Events | Sample | Prevalence (%)
Dantendorfer 1997 | 2 | 297 | 0.67
Dewey 2007 (all) | 1004 | 55734 | 1.80
Eshed 2007 | 59 | 4821 | 1.22
Lang et al (2010)* | 336 | 34521 | 0.97
Nawaz 2009* | 58 | 2630 | 2.21
Sarji 1998 | 18 | 3324 | 0.54
Wiebe 2004 | 14 | 1790 | 0.78
Figure: Meta-analysis of scan termination due to claustrophobia in general scan types
The figure above represents a meta-analysis of proportion data from the seven studies using a random effects model. There was significant heterogeneity among the studies, with Cochran's Q test reaching statistical significance and an I2 value of 96.1%. The pooled proportion was 1.18% (95% CI 0.79 – 1.65). Due to the significant heterogeneity, however, this value should be interpreted with caution.
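For readers who want to reproduce this kind of analysis, the sketch below pools the seven studies on the logit scale with a DerSimonian-Laird random effects model. Because the published figure may have used a different transformation or estimator, the output will be close to, but not necessarily identical to, the 1.18% and I2 of 96.1% reported above.

```python
import numpy as np

# Event counts and sample sizes from the table above
events = np.array([2, 1004, 59, 336, 58, 18, 14])
n = np.array([297, 55734, 4821, 34521, 2630, 3324, 1790])

theta = np.log(events / (n - events))      # logit of each proportion
var = 1 / events + 1 / (n - events)        # Var(logit)

w = 1 / var                                # fixed-effect weights
mu_fixed = np.sum(w * theta) / np.sum(w)
q = np.sum(w * (theta - mu_fixed) ** 2)    # Cochran's Q
df = len(theta) - 1
i2 = max(0.0, (q - df) / q) * 100          # I2 as a percentage

c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (q - df) / c)              # DerSimonian-Laird tau^2
w_star = 1 / (var + tau2)                  # random-effects weights
mu = np.sum(w_star * theta) / np.sum(w_star)
se = np.sqrt(1 / np.sum(w_star))

back = lambda x: np.exp(x) / (1 + np.exp(x))   # logit -> proportion
print(f"I2 = {i2:.1f}%")
print(f"pooled prevalence = {back(mu) * 100:.2f}% "
      f"(95% CI {back(mu - 1.96 * se) * 100:.2f} to {back(mu + 1.96 * se) * 100:.2f})")
```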
There are limitations with conducting meta-analysis of frequency data, including (Saha, Chant, & McGrath, 2008):
Heterogeneity of data: if the data from the included studies are heterogeneous, then the standard errors or confidence intervals for a pooled effect estimate will not adequately reflect the variability of the underlying data.
Inadequate reporting of frequency estimates: the standard error (SE) of each estimate is required to weight the estimate when pooling the data. Standard errors can still be calculated if the numerator, denominator and duration of the study are available; however, these calculated SEs will not take into account various adjustments.
How to interpret effect sizes?
Once authors calculate effect sizes, they need to answer the question: What does the effect size mean?
An effect size is simply a number, and its meaning and importance must be explained by the researcher. An effect size of any magnitude can mean different things depending on the research that produced it and the results of similar past studies. It is therefore the researcher's responsibility to discuss the importance of their findings, which requires comparing the current effects to those obtained in previous work in the same research area. Confidence intervals (CIs) are an important way to evaluate the precision of a study's findings, as they provide a range of likely values around the obtained effect size.
Heterogeneity
When used in relation to meta-analysis, the term ‘heterogeneity’ refers to the amount of variation in characteristics of included studies. For example, if three studies are to be included in a meta-analysis, does each of the included studies have similar demographics, and assess the same intervention? While some variation between studies will always occur due to chance alone, heterogeneity is said to occur if there are significant differences between studies, and under these circumstances meta-analysis is not valid and should not be undertaken.
There are three types of heterogeneity: clinical, methodological, and statistical (Higgins & Thompson, 2002). Differences in the characteristics of study populations and measurements represent clinical heterogeneity. Differences in study designs and methodological quality (risk of bias) represent methodological heterogeneity. Statistical heterogeneity is the variation of effect sizes between studies; it may arise because of clinical heterogeneity, methodological heterogeneity, or simply by chance.
There is often heterogeneity amongst studies addressing prevalence and incidence, for a number of reasons. Firstly, clinical heterogeneity may be present due to the measures used to determine the presence of a variable (Webb et al., 2005). For example, different scales exist to measure depression, and depending on the scale used, a person may be classified as suffering from depression on one scale but not on another. Additionally, prevalence and incidence studies often look at specific populations at a specific point in time, and the scope of the study may be limited by state or national borders. Another consideration is whether those considered at risk of, or eligible for, the disease have been included (Webb et al., 2005). For example, when looking at the prevalence or incidence of breast cancer, have the studies reported the proportion out of the whole population, all females, only adult females, and so on? These different populations may contribute to clinical heterogeneity.
Methodological heterogeneity is also important to consider. Prevalence and incidence data can arise from various study designs with differing levels of methodological quality, and this too can result in differences amongst studies.
But how does one tell whether or not differences are significant?
Firstly, the studies should be assessed carefully to determine whether clinical or methodological heterogeneity is present. If a meta-analysis is conducted, a visual inspection of its output (e.g. the forest plot) is the first stage of assessing statistical heterogeneity. If the results are scattered across the forest plot and the confidence intervals show little or no overlap, this is a good indicator of heterogeneity.
A formal statistical test of the similarity of studies is provided by the test of homogeneity. The test calculates a probability (P value) from a Chi-square statistic calculated using estimates of the individual study weights, effect sizes and the overall effect size. Note, however, that this test suffers from a lack of power and will often fail to detect a significant difference when a difference actually exists, especially when there are relatively few studies in the meta-analysis. Because of this low power, some review authors use a significance level of P < 0.1, rather than the conventional 0.05, to protect against falsely concluding that there is no heterogeneity present. When combining the results of a series of observational studies, this is often the default significance level because of the greater heterogeneity inherent in such study designs.
The I2 statistic is the percentage of the total variation observed across studies that is due to heterogeneity rather than chance; it can be calculated from Cochran's Q as I2 = 100% × (Q − df) / Q. A value of 0% indicates no observed heterogeneity, and larger values indicate increasing heterogeneity.
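The snippet below illustrates both statistics on a set of hypothetical effect sizes and within-study variances: it computes Cochran's Q, the P value of the homogeneity test, and I2.

```python
import numpy as np
from scipy.stats import chi2

theta = np.array([0.20, 0.50, 0.35, 0.10])     # hypothetical effect sizes
var = np.array([0.010, 0.040, 0.020, 0.015])   # their within-study variances

w = 1 / var
mu = np.sum(w * theta) / np.sum(w)             # fixed-effect pooled estimate
q = np.sum(w * (theta - mu) ** 2)              # Cochran's Q statistic
df = len(theta) - 1
p = chi2.sf(q, df)                             # P value of the homogeneity test
i2 = max(0.0, (q - df) / q) * 100              # I2, floored at 0%
print(f"Q = {q:.2f}, P = {p:.3f}, I2 = {i2:.1f}%")
```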
If there is statistically significant heterogeneity, a narrative synthesis or graphical representation is recommended.
Subgroup analysis (Analysis of subgroups or subsets):
Subgroup analysis is a means of investigating heterogeneous results and can be used to estimate the influence of various subsets, including age group, gender, type of population and the sampling strategy used to gather data (e.g. letter, phone, face-to-face). However, subgroups should be specified a priori and should be few. Subgroup analyses may also be conducted by study design or by patient group, as in the sketch below.
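A minimal sketch of the mechanics, assuming hypothetical study data tagged with a study-design label: each subgroup is pooled separately (here with a simple fixed-effect pool), and the subgroup estimates can then be compared.

```python
import numpy as np

# Hypothetical studies tagged with a subgroup label (e.g. study design)
theta = np.array([-4.99, -4.00, -4.39, -4.63, -3.79, -5.21])   # effect sizes
var = np.array([0.50, 0.10, 0.02, 0.03, 0.02, 0.06])           # their variances
design = np.array(["cohort", "cohort", "survey", "survey", "survey", "cohort"])

for group in np.unique(design):
    mask = design == group
    w = 1 / var[mask]                                  # inverse-variance weights
    pooled = np.sum(w * theta[mask]) / np.sum(w)       # fixed-effect pool per subgroup
    print(f"{group}: pooled effect = {pooled:.2f} (k = {mask.sum()})")
```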
Meta-regression
Meta-regression investigates whether particular covariates explain any of the heterogeneity of treatment effects between studies. A meta-regression is a linear or logistic regression and can use a fixed-effects or random-effects model. The unit of analysis is the study, and the predictors in the regression are study-level covariates.
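As an illustration only, the sketch below runs a fixed-effects meta-regression as an inverse-variance weighted least-squares fit of study effect sizes on a single study-level covariate (publication year). All numbers are hypothetical, and a real analysis would typically fit a random-effects meta-regression in dedicated software.

```python
import numpy as np

# Hypothetical study-level data: effect sizes, their variances, and a covariate
theta = np.array([-4.99, -4.00, -4.39, -4.63, -3.79])   # e.g. logit prevalences
var = np.array([0.50, 0.10, 0.02, 0.03, 0.02])          # within-study variances
year = np.array([1997.0, 2007.0, 2007.0, 2010.0, 2009.0])

# Design matrix: intercept plus the (centered) covariate
X = np.column_stack([np.ones_like(theta), year - year.mean()])
W = np.diag(1 / var)                                    # inverse-variance weights

# Weighted least-squares solution: (X'WX)^-1 X'W theta
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ theta)
print(f"intercept = {beta[0]:.2f}, slope per year = {beta[1]:.3f}")
```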
Publication bias
The research that appears in the published literature may be systematically unrepresentative of the population of completed studies. The ‘file drawer problem’, or publication bias, is a term coined by Rosenthal to describe the number of statistically non-significant studies (p > 0.05) that remain unpublished (Rosenthal & Rubin, 1982). A funnel plot is used to detect publication bias: a scatter plot of the effect estimate (x-axis) against the inverse of its variance (y-axis). If there is no bias, the plot will resemble a symmetrical inverted funnel; if there is bias, the funnel will be asymmetric or skewed in shape.
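To show what such a plot looks like in practice, the sketch below draws a funnel plot from hypothetical effect estimates and variances, with a dashed line at the inverse-variance pooled estimate.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical effect estimates and their variances
effects = np.array([0.10, 0.25, 0.18, 0.40, 0.22, 0.05, 0.30])
variances = np.array([0.002, 0.040, 0.010, 0.090, 0.020, 0.005, 0.060])

precision = 1 / variances                   # y-axis: inverse of the variance
pooled = np.average(effects, weights=precision)

plt.scatter(effects, precision)
plt.axvline(pooled, linestyle="--")         # fixed-effect pooled estimate
plt.xlabel("Effect estimate")
plt.ylabel("1 / variance (precision)")
plt.title("Funnel plot: asymmetry suggests possible publication bias")
plt.show()
```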