Possible strategies for use of artificial intelligence in screen-reading of mammograms, based on retrospective data from 122,969 screening examinations
AuthorLarsen, Marthe; Aglen, Camilla Flåt; Hoff, Solveig Roth; Lund-Hanssen, Håkon; Hofvind, Solveig Sand-Hanssen
Methods A total of 122,969 digital screening examinations performed between 2009 and 2018 in BreastScreen Norway were retrospectively processed by an AI system, which scored the examinations from 1 to 10; 1 indicated low suspicion of malignancy and 10 high suspicion. Results were merged with information about screening outcome and used to explore consensus, recall, and cancer detection for 11 different scenarios of combining AI and radiologists.
Results Recall was 3.2%, screen-detected cancer 0.61% and interval cancer 0.17% after independent double reading and served as reference values. In a scenario where examinations with AI scores 1–5 were considered negative and 6–10 resulted in standard independent double reading, the estimated recall was 2.6% and screen-detected cancer 0.60%. When scores 1–9 were considered negative and score 10 double read, recall was 1.2% and screen-detected cancer 0.53%. In these two scenarios, potential rates of screen-detected cancer could be up to 0.63% and 0.56%, if the interval cancers selected for consensus were detected at screening. In the former scenario, screen-reading volume would be reduced by 50%, while the latter would reduce the volume by 90%.
Conclusion Several theoretical scenarios with AI and radiologists have the potential to reduce the volume in screen-reading without affecting cancer detection substantially. Possible influence on recall and interval cancers must be evaluated in prospective studies.