When 105% is too much: Reflections on boundaries and statistical methods in diagnostic test accuracy meta-analytical modeling: A letter to the editor

Arredondo Montero, Javier

doi:doi.org/10.18502/ijrm.v23i12.20719

Volume 23, Issue 12 (10-1404), pages 1051-1053

Arredondo Montero J. When 105% is too much: Reflections on boundaries and statistical methods in diagnostic test accuracy meta-analytical modeling: A letter to the editor. International Journal of Reproductive BioMedicine (IJRM) 2025; 23 (12): 1051-1053
URL: http://ijrm.ir/article-1-3634-fa.html

Abstract:

Reading a diagnostic test accuracy (DTA) meta-analysis often means carefully reconstructing its reasoning, coherence, and analytical validity, and some results raise an immediate red flag. A reported sensitivity of 97.6%, with a 95% CI extending to 105.5%, is one of them. Far from trivial, this reflects the use of statistical methods inappropriate for bounded data. Upon reviewing the meta-analysis by Keshari et al. (1), I identified several methodological issues that compromise the validity of their findings and merit correction.
Confidence intervals for sensitivity and specificity must fall within the 0-100% range, as these are proportions bounded between 0 and 1. Figure 2 in the meta-analysis (1) reports a sensitivity of 97.6% (95% CI: 89.65-105.55%) for Dugoff et al., an impossible result that suggests an unbounded model fitted without a transformation for proportion data (e.g., logit, log, or Freeman-Tukey double arcsine). The pooled sensitivity (90.88%, 95% CI: 80.92-100.85%) likewise exceeds 100%, indicating a formally invalid estimate. Instead of addressing the problem, the authors truncate the upper bound in the abstract, reporting “90.9 (95% CI: 80.9-100%)”, which presents an adjusted rather than a model-derived value. Such modification diminishes transparency and misrepresents analytic uncertainty.
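The boundary problem can be illustrated with a minimal Python sketch that compares an untransformed (Wald) interval with a logit-transformed one for a single proportion. The counts used (41 true positives among 42 affected cases) are hypothetical, chosen only to give a sensitivity near 97.6%; they are not taken from Dugoff et al. or from Keshari et al., and the function names are illustrative.

```python
import math

Z95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def wald_ci(events, total, z=Z95):
    """Untransformed normal-approximation interval; not constrained to [0, 1]."""
    p = events / total
    se = math.sqrt(p * (1 - p) / total)
    return p - z * se, p + z * se


def logit_ci(events, total, z=Z95):
    """Interval built on the logit scale and back-transformed to a proportion."""
    non_events = total - events
    if events == 0 or non_events == 0:
        # A zero cell makes the logit undefined; a continuity correction
        # would be needed, which this sketch deliberately leaves out.
        raise ValueError("zero cell: apply a continuity correction first")
    logit = math.log(events / non_events)
    se = math.sqrt(1 / events + 1 / non_events)

    def expit(x):
        return 1 / (1 + math.exp(-x))

    return expit(logit - z * se), expit(logit + z * se)


# Hypothetical counts (assumption, not data from any cited study).
true_positives, affected = 41, 42
wald_low, wald_high = wald_ci(true_positives, affected)
logit_low, logit_high = logit_ci(true_positives, affected)

print(f"Wald 95% CI:  {wald_low:.3f} to {wald_high:.3f}")
print(f"Logit 95% CI: {logit_low:.3f} to {logit_high:.3f}")
```

With these assumed counts the Wald upper limit exceeds 1, whereas the back-transformed logit limits remain inside (0, 1) by construction. This only illustrates the boundary issue for a single study; it is not a substitute for a properly specified pooling model.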
The methods further conflate concepts by stating that heterogeneity was “evaluated using Cochran’s Q test and the DerSimonian-Laird method”. Cochran’s Q is a test for between-study variability, whereas DerSimonian-Laird is an estimator of the between-study variance used in random-effects pooling, not a test of heterogeneity. In addition, although DerSimonian-Laird is cited in the methods (despite its limitations compared to restricted maximum likelihood) (2), several forest plots (e.g., Figures 2 and 3) indicate that restricted maximum likelihood estimation was used. This inconsistency between the reported and applied models reduces reproducibility.
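The difference between the two quantities can be made concrete with a short sketch: Cochran's Q is a test statistic (referred to a chi-square distribution with k - 1 degrees of freedom), while the DerSimonian-Laird formula converts that statistic into a moment estimate of the between-study variance τ². The effect sizes and within-study variances below are hypothetical, and the function names are illustrative rather than taken from any package.

```python
def cochran_q(effects, variances):
    """Return Cochran's Q and its degrees of freedom (k - 1)."""
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return q, len(effects) - 1


def dersimonian_laird_tau2(effects, variances):
    """Moment estimate of the between-study variance, truncated at zero."""
    weights = [1.0 / v for v in variances]
    q, df = cochran_q(effects, variances)
    scale = sum(weights) - sum(w * w for w in weights) / sum(weights)
    return max(0.0, (q - df) / scale)


# Hypothetical study-level effects and variances (assumption for illustration).
effects = [0.2, 0.5, 1.4, 2.8]
variances = [0.04, 0.05, 0.06, 0.09]

q, df = cochran_q(effects, variances)
tau2 = dersimonian_laird_tau2(effects, variances)
print(f"Q = {q:.2f} on {df} df, DerSimonian-Laird tau^2 = {tau2:.2f}")
```

Q answers whether the observed dispersion exceeds what sampling error alone would produce; τ² quantifies how large the excess is and feeds the random-effects weights. Restricted maximum likelihood would estimate the same τ² by a different route, which is why the method actually used needs to be stated.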
The meta-analysis also performs univariate pooling of sensitivity without plotting specificity or employing hierarchical or bivariate models, which account for correlation and threshold effects (3, 4). This approach limits interpretability. Moreover, restricting the analysis to only 4 studies from a larger review of over 70 may introduce selection bias and diminish generalizability.
Clinical heterogeneity further undermines the pooled results. In Figure 2, the authors combine the sensitivity for all aneuploidies from Schlaikjær Hartwig et al. (5) with that for trisomy 21 from Dugoff et al. (6), yielding the invalid 95% CI noted earlier, even though the original trial reported a valid 97% (95% CI: 83.8-99.7%). Pooling such distinct endpoints without stratification or sensitivity analyses violates the principle of clinical coherence. Sensitivity and detection rate (diagnostic yield) are also used interchangeably, although they are different measures: sensitivity is the proportion of true positives among affected individuals, whereas detection rate is the proportion of positive tests in the screened population. This conceptual distinction is critical for interpretability.
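A small sketch of a 2×2 table shows how far apart the two measures can be. It follows the definitions given in this letter (sensitivity as true positives among affected individuals, detection rate as positive tests among everyone screened); the counts are hypothetical and the function names are illustrative.

```python
def sensitivity(tp, fn):
    """True positives divided by all affected individuals (TP + FN)."""
    return tp / (tp + fn)


def detection_rate(tp, fp, fn, tn):
    """Positive tests (TP + FP) divided by the whole screened population."""
    return (tp + fp) / (tp + fp + fn + tn)


# Hypothetical 2x2 table for 1000 screened pregnancies (assumption).
tp, fp, fn, tn = 30, 5, 1, 964

sens = sensitivity(tp, fn)
rate = detection_rate(tp, fp, fn, tn)
print(f"Sensitivity:    {sens:.1%}")
print(f"Detection rate: {rate:.1%}")
```

The denominators differ (affected individuals versus all screened individuals), so the two figures are not interchangeable, and only a reconstructed 2×2 table makes clear which one a primary study actually reported.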
A further error appears in the abstract: “MicroRNA levels were significantly increased (standardized mean difference 1.22, 95%: CI: -0.90 to 3.34)”. Because the CI includes 0, the difference is not statistically significant; indeed, Figure 3 shows p = 0.26. Reporting it as significant misrepresents the evidence. The wide CI (-0.90 to 3.34) also reflects extreme imprecision, with heterogeneity indices (I² = 97.85%, Q = 38.6, p < 0.001, τ² = 4.45) confirming severe inconsistency that invalidates any pooled inference.
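The link between the interval and the p-value can be checked by back-calculating the standard error from the reported limits. The sketch below assumes the interval is a symmetric normal-approximation (Wald) interval; if the original analysis used a different method (for example a t-based or Knapp-Hartung adjustment), the recovered p-value would differ somewhat. The function name is illustrative.

```python
import math

Z95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def p_value_from_ci(estimate, ci_low, ci_high, z=Z95):
    """Recover SE, z statistic, and two-sided p-value from a symmetric 95% CI."""
    se = (ci_high - ci_low) / (2 * z)
    z_stat = estimate / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z_stat) / math.sqrt(2))
    return se, z_stat, p_value


# Values quoted in the letter: SMD 1.22, 95% CI -0.90 to 3.34.
se, z_stat, p_value = p_value_from_ci(1.22, -0.90, 3.34)
print(f"SE = {se:.3f}, z = {z_stat:.3f}, p = {p_value:.2f}")
```

Under that assumption the z statistic falls well below 1.96 and the p-value rounds to 0.26, in line with the value quoted from Figure 3 and incompatible with a claim of statistical significance.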
Moreover, in Figure 3, all control groups appear with identical mean values (1.00), which the original data do not support; for instance, Lamadrid-Romero et al. reported no such uniformity. If standardization or imputation was applied, this should have been explicitly stated, as standardized mean differences are sensitive to such transformations.
Several sensitivity estimates, including those from Schlaikjær Hartwig et al. and Dugoff et al. (5, 6), were directly extracted without reconstructing 2×2 tables. Although convenient, this practice departs from recommended DTA standards that require independent reconstruction to ensure consistent definitions and denominators. Omitting this step risks propagating biases and precludes assessment of threshold effects.
The study process also lacks essential transparency: it was not registered in PROSPERO, and inclusion/exclusion criteria are only broadly described. Such omissions conflict with accepted standards for systematic reviews and reduce reproducibility. The use of the Joanna Briggs Institute checklist, instead of QUADAS-2, the standard tool for DTA quality assessment, further weakens the methodological rigor and diverges from PRISMA-DTA guidelines.
Finally, the use of funnel plots to assess publication bias is inappropriate when fewer than 10 studies are included (3). With only 5 studies analyzed, such plots are underpowered and unreliable.
In summary, the meta-analysis contains several methodological errors that materially affect its conclusions. Reporting sensitivity values exceeding 100% and modifying confidence intervals post hoc indicates the need to revisit the underlying statistical models rather than adjust the presentation. Diagnostic meta-analysis requires bounded data transformations, hierarchical modeling, and transparent reporting to ensure valid inference. These observations are intended not as criticism but as constructive clarification, to support more rigorous and reproducible application of meta-analytic methods in diagnostic research.


Type of study: Letter to Editor

References

1. Keshari JR, Prakash P, Sinha SR, Prakash P, Rani K, Aziz T, et al. Diagnostic potential of cell-free fetal nucleic acids in predicting pregnancy complications: A systematic review and meta-analysis on trisomy, pre-eclampsia, and gestational diabetes. Int J Reprod BioMed 2025; 23: 111-130. [DOI:10.18502/ijrm.v23i2.18476]

2. Veroniki AA, Jackson D, Viechtbauer W, Bender R, Bowden J, Knapp G, et al. Methods to estimate the between-study variance and its uncertainty in meta-analysis. Res Synth Methods 2016; 7: 55-79. [DOI:10.1002/jrsm.1164]

3. Deeks JJ, Bossuyt PM, Leeflang MM, Takwoingi Y. Cochrane handbook for systematic reviews of diagnostic test accuracy. Version 2.0. Cochrane; 2023. Available at: https://training.cochrane.org/handbook-diagnostic-test-accuracy/current. [DOI:10.1002/9781119756194]

4. Arredondo Montero J. Diagnostic test accuracy meta-analysis: A practical guide to hierarchical models. J Surg Res 2025; 315: 768-781. [DOI:10.1016/j.jss.2025.09.072]

5. Schlaikjær Hartwig T, Ambye L, Gruhn JR, Petersen JF, Wrønding T, Amato L, et al. Cell-free fetal DNA for genetic evaluation in Copenhagen Pregnancy Loss Study (COPL): A prospective cohort study. Lancet 2023; 401: 762-771. [DOI:10.1016/S0140-6736(22)02610-1]

6. Dugoff L, Koelper NC, Chasen ST, Russo ML, Roman AS, Limaye MA, et al. Cell-free DNA screening for trisomy 21 in twin pregnancy: A large multicenter cohort study. Am J Obstet Gynecol 2023; 229: 435. [DOI:10.1016/j.ajog.2023.04.002]

This article may be redistributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License.
