Performance of Two Deep Learning–based AI Models for Breast Cancer Detection and Localization on Screening Mammograms from BreastScreen Norway
Permanent link
https://hdl.handle.net/10037/37913
Date
2025-02-05
Type
Journal article
Peer reviewed
Author
Martiniussen, Marit Almenning; Larsen, Marthe; Hovda, Tone; Kristiansen, Merete U.; Dahl, Fredrik Andreas; Eikvil, Line; Brautaset, Olav; Bjørnerud, Atle; Kristensen, Vessela N.; Bergan, Marie Burns; Hofvind, Solveig Sand-Hanssen
Abstract
Purpose - To evaluate cancer detection and marker placement accuracy of two artificial intelligence (AI) models developed for interpretation of screening mammograms.
Materials and Methods - This retrospective study included data from 129 434 screening examinations (all female patients; mean age, 59.2 years ± 5.8 [SD]) performed between January 2008 and December 2018 in BreastScreen Norway. Model A was commercially available and model B was an in-house model. The area under the receiver operating characteristic curve (AUC) with 95% CI was calculated for each model. Examinations with AI scores in the highest 3.2% and 11.1% were defined as positive at threshold 1 and threshold 2, respectively. A radiologic review assessed the location of AI markings and classified interval cancers as true or false negative.
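The two evaluation steps described above (an AUC computed from AI scores, and a positivity rule that flags a fixed top fraction of examinations) can be illustrated with a minimal Python sketch. This is not the study's code: the function names `auc_from_scores` and `top_fraction_positive` are hypothetical, the AUC is the plain pairwise (Mann-Whitney) estimate without the 95% CI, and the handling of tied scores at the cutoff is an assumption.

```python
from typing import List, Sequence


def auc_from_scores(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise (Mann-Whitney) AUC estimate.

    Returns the fraction of (cancer, non-cancer) pairs in which the cancer
    case has the higher score; tied pairs count as half. This is O(n*m) and
    is meant as an illustration, not for 129 434 examinations.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def top_fraction_positive(scores: Sequence[float], fraction: float) -> List[bool]:
    """Flag the given fraction of examinations with the highest scores.

    Assumption: ties at the cutoff are broken by input order, which the
    source does not specify.
    """
    n_positive = round(len(scores) * fraction)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = [False] * len(scores)
    for i in order[:n_positive]:
        flagged[i] = True
    return flagged


if __name__ == "__main__":
    # Toy data only; the fractions 0.032 and 0.111 mirror thresholds 1 and 2.
    toy_scores = [i / 1000 for i in range(1000)]
    for name, fraction in (("threshold 1", 0.032), ("threshold 2", 0.111)):
        flagged = top_fraction_positive(toy_scores, fraction)
        print(name, "flags", sum(flagged), "of", len(flagged), "examinations")
```

Defining the threshold as a fraction of examinations, rather than as a fixed score, keeps the share of positive examinations the same for both models, which makes their detection rates directly comparable.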
Results - The AUC was 0.93 (95% CI: 0.92, 0.94) for both model A and model B when screen-detected and interval cancers were included. Model A identified 82.5% (611 of 741) of the screen-detected cancers at threshold 1 and 92.4% (685 of 741) at threshold 2. Model B identified 81.8% (606 of 741) at threshold 1 and 93.7% (694 of 741) at threshold 2. The AI markings were correctly localized for all screen-detected cancers identified by both models, and for 82% (56 of 68) of the interval cancers for model A and 79% (54 of 68) for model B. At the review, 21.6% (45 of 208) of the interval cancers had been identified at the preceding screening by either or both models and were correctly localized; these were classified as false negative (n = 17) or as showing minimal signs of malignancy (n = 28).
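The percentages in the Results follow directly from the reported counts. As a simple arithmetic check (not part of the study), the short Python sketch below recomputes each proportion from the numerator and denominator given in the abstract; the helper name `pct` and the table layout are illustrative assumptions.

```python
def pct(numerator: int, denominator: int, digits: int = 1) -> float:
    """Percentage of numerator over denominator, rounded to `digits` decimals."""
    return round(100 * numerator / denominator, digits)


# (description, numerator, denominator, decimals used in the abstract)
reported_counts = [
    ("Model A, screen-detected, threshold 1", 611, 741, 1),
    ("Model A, screen-detected, threshold 2", 685, 741, 1),
    ("Model B, screen-detected, threshold 1", 606, 741, 1),
    ("Model B, screen-detected, threshold 2", 694, 741, 1),
    ("Model A, interval cancers correctly localized", 56, 68, 0),
    ("Model B, interval cancers correctly localized", 54, 68, 0),
    ("Interval cancers identified at preceding screening", 45, 208, 1),
]

if __name__ == "__main__":
    for description, num, den, digits in reported_counts:
        print(f"{description}: {pct(num, den, digits)}% ({num} of {den})")
    # The two review categories sum to the 45 identified interval cancers.
    print("false negative + minimal signs:", 17 + 28)
```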
Conclusion - Both AI models showed promising performance for cancer detection on screening mammograms. The AI markings corresponded well to the true cancer locations.