Application of Machine Learning to Osteoporosis and Osteopenia Screening Using Hand Radiographs
From the *Division of Plastic and Reconstructive Surgery, Department of Surgery, Stanford University School of Medicine, Stanford, CA; †Department of Orthopedics, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan, Republic of China; ‡Department of Orthopaedic Surgery, Stanford University School of Medicine, Stanford, CA; and §Robert A. Chase Hand and Upper Limb Center, Department of Orthopaedic Surgery, Stanford University Medical Center, Redwood City, CA.
Received for publication February 6, 2024; accepted in revised form September 10, 2024.
Corresponding author: Jeffrey Yao, MD, Robert A. Chase Hand and Upper Limb Center, Department of Orthopaedic Surgery, Stanford University Medical Center, 450 Broadway Street, MC 6342, Redwood City, CA 94063; e-mail: jyao@stanford.edu.
0363-5023/24/---0001$36.00/0 https://doi.org/10.1016/j.jhsa.2024.09.008
OSTEOPOROSIS AND OSTEOPENIA are common conditions with considerable morbidity. The lifetime incidence of any osteoporotic fragility fracture is estimated to be 40% to 50% in women and 13% to 22% in men.1 An estimated 9 million osteoporotic fractures occur worldwide annually.2,3 Fragility fractures may decrease function and contribute to nearly six million disability-adjusted life years lost annually.2 Moreover, patients with hip fractures experience a five-fold to eight-fold increase in all-cause mortality post-fracture.4 The rates of intervention for osteoporosis remain low, despite evidence that screening and treatment decrease future risk of fragility fractures. Improved screening to identify and appropriately treat individuals with poor bone health may help decrease the morbidity and mortality associated with fragility fractures.
Data collection
Institutional review board approval was obtained. An institutional database was queried to identify all patients between 1998 and 2019 who underwent both a DXA scan and a hand radiograph within 12 months of each other. The reports for these DXA scans were obtained, and the T-scores were extracted. High-resolution images of the corresponding posteroanterior hand radiographs were exported from our institution's Picture Archiving and Communication System. Each hand radiograph was labeled with its DXA T-score and category (osteoporosis, osteopenia, or normal). Categories followed the standard World Health Organization (WHO) definitions based on T-scores21 as follows: normal, T >= -1.0; osteopenia, -2.5 < T < -1.0; and osteoporosis, T <= -2.5.
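The labeling rule above can be sketched in a few lines of Python. This is a minimal illustration of the WHO cutoffs as stated in the text, not the authors' labeling code; the function names `who_category` and `is_low_bmd` are hypothetical.

```python
def who_category(t_score: float) -> str:
    """Map a DXA T-score to the WHO category described in the text."""
    if t_score <= -2.5:
        return "osteoporosis"
    if t_score < -1.0:
        # Covers the open interval -2.5 < T < -1.0.
        return "osteopenia"
    return "normal"  # T >= -1.0


def is_low_bmd(t_score: float) -> bool:
    """Binary screening label: osteopenia or osteoporosis versus normal."""
    return who_category(t_score) != "normal"
```

Note that both boundary values are handled explicitly: a T-score of exactly -2.5 is osteoporosis, and a T-score of exactly -1.0 is normal.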

FIGURE 1: Diagram of model architecture.
All image preprocessing, model execution, and performance evaluation were performed using Python. The model was built on the ResNet-50 architecture, which comprises 49 convolutional layers and one fully connected layer, organized into 16 residual blocks. ResNet architectures use residual (skip) connections to mitigate the vanishing gradient problem, in which gradients shrink as they are backpropagated through many layers. The base model was pretrained on ImageNet, a large data set containing millions of images. The neural network was implemented in the PyTorch 2.0 framework and trained for 35 epochs.
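The layer counts quoted for ResNet-50 can be checked with simple arithmetic. The sketch below assumes the standard ResNet-50 layout (bottleneck blocks arranged in four stages of 3, 4, 6, and 3 blocks, each block holding three convolutions, preceded by one stem convolution). By the usual convention, the projection convolutions on the shortcut paths are not counted. The constant and function names are illustrative only.

```python
# Standard ResNet-50 layout (assumed here, not taken from the source):
BLOCKS_PER_STAGE = [3, 4, 6, 3]   # bottleneck blocks in each of the four stages
CONVS_PER_BOTTLENECK = 3          # 1x1, 3x3, 1x1 convolutions per block
STEM_CONVS = 1                    # initial 7x7 convolution
FC_LAYERS = 1                     # final fully connected classifier


def resnet50_layer_counts() -> dict:
    """Count residual blocks and weighted layers by the usual convention."""
    blocks = sum(BLOCKS_PER_STAGE)
    convs = STEM_CONVS + blocks * CONVS_PER_BOTTLENECK
    return {
        "residual_blocks": blocks,
        "conv_layers": convs,
        "fc_layers": FC_LAYERS,
        "weighted_layers": convs + FC_LAYERS,
    }


counts = resnet50_layer_counts()
print(counts)
```

Under these assumptions the tally gives 16 residual blocks, 49 convolutional layers, and one fully connected layer, which is where the "50" in the name comes from.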
The data set comprised 1,424 images: 687 in the normal category, 607 in the osteopenia category, and 130 in the osteoporosis category. When predicting low bone density (osteopenia or osteoporosis) versus normal bone density at the standard threshold of 0.5, sensitivity was 88.5%, specificity was 65.4%, overall accuracy was 80.8%, and the area under the curve was 0.891. When optimizing for both sensitivity and specificity, at a threshold of 0.655, the model achieved a sensitivity of 84.6% at a specificity of 84.6%.
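The text does not state which criterion was used to choose the alternative threshold. One common approach, shown here purely as an assumption, is to sweep candidate thresholds and pick the one that maximizes Youden's J (sensitivity + specificity - 1). The scores and labels below are invented toy values, not study data, and the function names are hypothetical.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold predicts low BMD (label 1)."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def best_threshold_youden(scores, labels):
    """Return the observed score that maximizes Youden's J (assumed criterion)."""
    candidates = sorted(set(scores))
    return max(
        candidates,
        key=lambda t: sum(sensitivity_specificity(scores, labels, t)) - 1.0,
    )


# Invented toy data for illustration only (1 = low BMD, 0 = normal).
toy_scores = [0.10, 0.20, 0.40, 0.70, 0.55, 0.60, 0.80, 0.90]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]

threshold = best_threshold_youden(toy_scores, toy_labels)
print(threshold, sensitivity_specificity(toy_scores, toy_labels, threshold))
```

Raising the threshold above 0.5 trades sensitivity for specificity, which matches the direction of the change reported between the 0.5 and 0.655 operating points.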
Discussion
In this study, a neural network was developed and validated to screen for osteoporosis and osteopenia in routine hand radiographs. Specifically, a CNN was trained to identify low BMD in hand radiographs, using DXA hip T-scores as the reference standard, and was found to have a sensitivity of 88.5%, a specificity of 65.4%, and an overall accuracy of 80.8%. The high sensitivity, and therefore low false negative rate, reflects the potential utility of the algorithm as a screening tool to identify patients with low BMD.

When predicting low BMD (osteopenia or osteoporosis) versus normal BMD, sensitivity was 88.5%, specificity was 65.4%, and precision was 83.6% at the standard classification threshold of 0.5 (Fig. 2A). Overall accuracy was 80.8%. The F1-score was 0.86, and the AUC was 0.891 (Fig. 2B). When optimizing for both sensitivity and specificity, at a threshold of 0.655, the model achieved a sensitivity of 84.6% at a specificity of 84.6%.
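The relationship among these metrics can be made concrete with a small helper that derives them from confusion-matrix counts. The counts used below (46 true positives, 6 false negatives, 17 true negatives, 9 false positives) are an assumption: they were chosen only because they reproduce the reported percentages, and they are not taken from the study. The function name `screening_metrics` is likewise hypothetical.

```python
def screening_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Derive standard binary-classification metrics from confusion counts."""
    sensitivity = tp / (tp + fn)          # recall for the low-BMD class
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)            # positive predictive value
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Illustrative counts only (assumed, not reported in the text).
metrics = screening_metrics(tp=46, fn=6, tn=17, fp=9)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, the helper returns values that round to the figures quoted above, which shows that the reported sensitivity, specificity, precision, accuracy, and F1-score are mutually consistent.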


CONFLICTS OF INTEREST
No benefits in any form have been received or will be received related directly to this article.
ACKNOWLEDGMENTS
This work was supported by the 2022 American Association for Hand Surgery Annual Research Grant. The authors thank Akousist for their assistance on this study.
REFERENCES