Whole-body MRI in children aged 6–18 years. Reliability of identifying and grading high signal intensity changes within bone marrow
Permanent link
https://hdl.handle.net/10037/25954
Date
2022-04-21
Type
Journal article
Peer reviewed
Author
Zadig, Pia Karin Karlsen; von Brandis, Elisabeth; d’Angelo, Paola; Tanturri de Horatio, Laura; Müller, Lil-Sofie Ording; Rosendahl, Karen; Avenarius, Derk Frederik Matthaus
Abstract
Objective - To examine intra- and interobserver reliability of a scoring system for assessment of high signal areas within the bone marrow, as visualized on T2-weighted, fat-saturated images.
Materials and methods - Ninety-six whole-body MRIs (1.5 T) in 78 healthy volunteers (mean age: 11.5 years) and 18 children with chronic nonbacterial osteomyelitis (mean age: 12.4 years) were included. Using coronal water-only Dixon T2-weighted images, two pairs of radiologists scored the left lower extremity and pelvis in a blinded fashion for high signal intensity areas, grading intensity (0–2 scale), extension (0–4 scale), shape and contour.
Results - For the pelvis, grading of bone marrow signal showed moderate to good intra- and interobserver agreement, with kappa values of 0.51–0.94 and 0.41–0.87, respectively. Corresponding figures for the femur were 0.61–0.68 within and 0.32–0.61 between observers, and for the tibia 0.60–0.72 and 0.51–0.73. Agreement on extension was moderate to good both within and between observers for the pelvis (κ = 0.52–0.85 and 0.35–0.80), the femur (κ = 0.52–0.67 and 0.51–0.60) and the tibia (κ = 0.59–0.69 and 0.47–0.63), except for the femur metaphysis/diaphysis, where interobserver kappa values were 0.29–0.30. Scoring of shape was moderate to good within observers but generally poorer between observers, with kappa values of 0.40–0.73 and 0.18–0.69, respectively. For contour, the corresponding figures were 0.35–0.62 and 0.09–0.54.
Conclusion - MRI grading of the intensity and extension of high signal intensity areas within the bone marrow of the pelvis and lower limb performs well and can thus be used interchangeably by different observers, whereas assessment of shape and contour is reliable for the same observer but less reliable between observers. This should be considered when performing clinical trials.
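For readers unfamiliar with the kappa statistic reported above, the following is a minimal sketch of how an unweighted Cohen's kappa could be computed for two readers scoring the same regions. The abstract does not state which kappa variant the authors used (unweighted or weighted), so the unweighted form here is an assumption. The function name `cohen_kappa` and the example scores are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same score by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, derived from each rater's marginal score frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical category throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Made-up intensity scores (0-2 scale) for ten regions; NOT data from the study.
reader_1 = [0, 0, 1, 2, 1, 0, 2, 1, 0, 0]
reader_2 = [0, 0, 1, 2, 0, 0, 2, 1, 1, 0]
kappa = cohen_kappa(reader_1, reader_2)
print(round(kappa, 2))
```

In this toy example the readers agree on 8 of 10 regions (observed agreement 0.80) and chance agreement is 0.38, giving a kappa of about 0.68. For ordinal scales such as the 0–4 extension score, a weighted kappa that gives partial credit for near-misses is a common alternative.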