Prediction Model of Spinal Osteoporosis Using Lumbar Spine X-Ray from Transfer Learning Deep Convolutional Neural Networks

Article information

Nerve. 2024;10(2):98-106
Publication date (electronic) : 2024 October 17
doi : https://doi.org/10.21129/nerve.2024.00598
Department of Neurosurgery, Inje University Haeundae Paik Hospital, Inje University College of Medicine, Busan, Republic of Korea
Corresponding author: Joon Bum Woo Department of Neurosurgery, Inje University Haeundae Paik Hospital, Inje University College of Medicine, 875, Haeun-daero, Haeundae-gu, Busan 48108, Republic of Korea Tel: +82-51-797-0840 Fax: +82-51-797-0841 E-mail: nanna0225@naver.com
Received 2024 July 16; Revised 2024 September 3; Accepted 2024 September 20.

Abstract

Objective

Osteoporosis is highly prevalent among older adults and women. The condition causes bone mineral density and microarchitecture to deteriorate, significantly increasing the risk of fractures. Osteoporosis also commonly leads to complications such as screw loosening and non-union after spinal surgery. Deep learning algorithms now achieve error rates comparable to those of humans. Therefore, this study explored whether transfer learning with deep learning algorithms can predict, diagnose, and screen for osteoporosis using the sagittal spine X-rays commonly obtained from patients with spinal conditions.

Methods

We retrospectively evaluated 2,300 consecutive patients who underwent dual energy X-ray absorptiometry (DXA) and lumbar sagittal plain X-ray exams between 2013 and 2021. The exclusion criteria were: (1) a gap of more than 1 year between the DXA and X-ray exams; (2) vertebrae that had undergone vertebroplasty; (3) lack of spine anterior-posterior DXA; and (4) images that were unassessable. Ultimately, 254 patients (images) were included in the study. Transfer learning was applied using convolutional neural network (CNN) techniques, specifically visual geometry group (VGG) 16, VGG 19, ResNet50, and Xception.

Results

The most accurate CNN model in the training group was ResNet50, with an accuracy of 0.95. In the test dataset, ResNet50 also showed the best performance, with an accuracy of 0.82, precision of 0.80, recall of 0.86, and F1-score of 0.83. Additionally, its area under the curve (0.76) was higher than that of the other CNN models. The confusion matrix for ResNet50 showed that 12 of the 14 osteoporosis images in the test data were predicted as osteoporosis.

Conclusion

Artificial intelligence (AI) technology employing deep learning techniques is approaching human performance in the role of diagnostic assistance. With the integration of AI, the diagnosis of osteoporosis using bone mineral density is expected to evolve into a comprehensive diagnostic aid or decision-making tool.

INTRODUCTION

Osteoporosis is highly prevalent among the elderly and women. It leads to the deterioration of bone mineral density (BMD) and micro-architecture, which significantly impacts the likelihood of fractures10). The vertebral column is the most common site for osteoporotic fractures. Additionally, osteoporosis frequently results in complications such as screw loosening and non-union during spinal surgeries. Diagnosis typically involves dual energy X-ray absorptiometry (DXA), with osteoporosis confirmed by a T-score of -2.5 or lower. Patients with spinal conditions can readily and affordably undergo lumbar X-ray examinations, which are commonly performed.

The field of artificial intelligence (AI) is evolving rapidly, with machine learning and deep learning algorithms demonstrating remarkable accuracy and effectiveness. Deep learning algorithms are now being developed to match human error rates. In the medical field, machine learning is increasingly applied to image-based diagnostics and other diagnostic processes. AI-based diagnosis of osteoporosis has been attempted using X-ray, DXA, computed tomography (CT), and magnetic resonance images. DXA, however, has certain limitations, such as its high cost and restricted insurance coverage. AI-based diagnosis of osteoporosis by analysis of radiographs could therefore be a cost-effective alternative to DXA. Studies are also under way to predict the occurrence of fractures in patients with osteoporosis, and various deep learning architectures are being used in these studies to improve diagnostic accuracy4).

This study aims to explore whether it is possible to predict and diagnose osteoporosis using transfer learning with deep learning algorithms, employing X-rays commonly taken from patients with spinal conditions. Previous research utilizing X-rays has been limited in scope, and there have also been studies using CT scans. We selected and evaluated four renowned deep-learning algorithms for this purpose.

MATERIALS AND METHODS

1. Patients and Dataset

The study protocol was approved by the Institutional Review Board (IRB) of Inje University Haeundae Paik Hospital (IRB no. 2024-07-008). The requirement for informed consent was waived because of the retrospective nature of the study. We retrospectively evaluated 2,300 consecutive patients who underwent DXA and lumbar sagittal plain X-rays between 2013 and 2021. Patients with an interval of less than one year between the DXA and X-ray examinations were selected. Osteoporosis was diagnosed at a T-score ≤ -2.5 according to the World Health Organization criteria, and osteopenia was classified as normal6). Vertebrae L1 to L3 on the lumbar spine sagittal X-rays were selected for two reasons: bone density increases toward L5, and the L4 and L5 vertebrae are poorly visualized because they overlap with the pelvic bone17). The exclusion criteria were: (1) an interval of more than one year between DXA and X-ray examinations; (2) vertebrae that had undergone vertebroplasty; (3) absence of spine anterior-posterior DXA; and (4) images that could not be evaluated. In total, 254 patients (images) were enrolled in this study. The dataset was divided into a training group (213 images), a validation group (18 images), and a test group (23 images). In the training group, 101 images were classified as osteoporosis and 112 as normal. In the validation group, 10 images were classified as osteoporosis and 8 as normal. In the test group, 13 images were classified as osteoporosis and 10 as normal (Fig. 1).
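The split described above can be tallied with a few lines of Python. This is only an illustrative check of the reported counts, not part of the study pipeline; the dictionary layout and the function name `summarize` are assumptions made for this sketch.

```python
# Image counts per split as reported in the text: (osteoporosis, normal).
splits = {
    "training": (101, 112),
    "validation": (10, 8),
    "test": (13, 10),
}


def summarize(splits):
    """Return per-split sizes and class fractions plus overall class totals."""
    rows = {}
    total_osteo = 0
    total_normal = 0
    for name, (osteo, normal) in splits.items():
        size = osteo + normal
        rows[name] = {"size": size, "osteoporosis_fraction": osteo / size}
        total_osteo += osteo
        total_normal += normal
    return rows, total_osteo, total_normal


rows, total_osteo, total_normal = summarize(splits)
for name, row in rows.items():
    print(f"{name}: n={row['size']}, "
          f"osteoporosis fraction={row['osteoporosis_fraction']:.2f}")
print("all images:", total_osteo + total_normal)
```

The class totals obtained this way (124 osteoporosis and 130 normal images) agree with the group sizes given in Table 1.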

Fig. 1.

A total of 2,300 patients who underwent dual energy X-ray absorptiometry (DXA) and lumbar sagittal plain X-rays between 2013 and 2021 were evaluated. The exclusion criteria were an interval of more than 1 year between DXA and the X-ray examination, previous vertebroplasty, pathologic fracture, infected vertebrae, and low image quality. The remaining images were allocated to the training, validation, and test datasets.

2. Comparison of Deep Learning Models

In this study, we compared image classification performance using four convolutional neural network (CNN) models. First, we used two models from the visual geometry group (VGG) series of deep CNNs, VGG19 and VGG1613). Second, we employed the residual neural network ResNet50, from the architecture designed by He et al.2), which won the classification task at the 2015 ImageNet Large Scale Visual Recognition Challenge (ILSVRC). Generally, increasing the depth of a network improves image identification accuracy; however, excessively deep networks can lose accuracy. ResNet addresses this issue with an approach known as residual learning, which allows networks to be extended up to 152 layers. ResNet is available in versions with 18, 50, and 152 layers, among others. Third, we used Xception, a CNN developed at Google that comprises 71 layers1). Each model has demonstrated excellent results and low recognition error rates on the ImageNet dataset used in the ILSVRC.
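The idea behind residual learning can be illustrated in a few lines: a residual block outputs F(x) + x, so the stacked layers only need to learn the residual F(x), and a block whose layers learn nothing still passes its input through unchanged. The sketch below is a conceptual toy on plain Python lists, not the ResNet50 implementation used in this study; the function names are hypothetical.

```python
def residual_block(x, residual_fn):
    """Return F(x) + x, the output of a residual block with an identity shortcut."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("F(x) must have the same length as x for an identity shortcut")
    return [f + xi for f, xi in zip(fx, x)]


def zero_residual(x):
    """Stand-in for layers that have learned nothing: F(x) = 0."""
    return [0.0 for _ in x]


def small_residual(x):
    """Stand-in for layers that have learned a small correction: F(x) = 0.1 * x."""
    return [0.1 * xi for xi in x]


features = [1.0, -2.0, 3.0]
identity_out = residual_block(features, zero_residual)     # input passes through unchanged
corrected_out = residual_block(features, small_residual)   # input plus a small correction
print(identity_out)
print(corrected_out)
```

Because the shortcut preserves the input, adding more blocks cannot make the representation worse than the identity mapping, which is the intuition behind training very deep networks this way.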

3. Data Processing

Image data were obtained from lumbar spine X-ray images in Digital Imaging and Communications in Medicine (DICOM) format. We selected sagittal X-ray images that included lumbar vertebrae 1, 2, and 3, excluding lumbar vertebrae 4 and 5. To account for variations in X-ray scan parameters, we performed a series of grayscale normalizations, which included adjustments to window width and window level and pixel normalization. The ImageDataGenerator class of the Keras library generates transformed images from the provided data and incorporates them into the training set, which is especially useful for augmenting a small dataset with additional image data.
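A minimal sketch of the two preprocessing ideas described above, grayscale windowing and augmentation, is given below in plain Python. It is not the study's code: the window level and width values, the tiny 2 x 3 "image", and the function names are all assumptions chosen for illustration, and the horizontal flip merely stands in for the kind of transformation an augmentation utility can generate.

```python
def apply_window(pixels, level, width):
    """Clip pixel values to [level - width/2, level + width/2] and rescale to [0, 1]."""
    if width <= 0:
        raise ValueError("window width must be positive")
    low = level - width / 2
    high = level + width / 2
    return [[(min(max(v, low), high) - low) / width for v in row] for row in pixels]


def horizontal_flip(pixels):
    """Return a left-right mirrored copy of the image (one simple augmentation)."""
    return [list(reversed(row)) for row in pixels]


# Hypothetical raw pixel values; real DICOM data would be read with a DICOM library.
image = [
    [0, 500, 1000],
    [1500, 2000, 4000],
]

normalized = apply_window(image, level=1000, width=2000)
augmented = horizontal_flip(normalized)
print(normalized)
print(augmented)
```

Values outside the window are clipped, so the 4000 in the example maps to 1.0, the same as 2000. Windowing therefore reduces the influence of exposure differences between scans before the images reach the network.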

4. Statistical Analysis

Descriptive statistics for categorical variables were presented as numbers (percentages), and continuous variables were reported as means with standard deviations. A two-sample t-test was performed for continuous variables that satisfied the assumption of equal variance. Statistical significance was set at a p-value of less than 0.05. To evaluate the diagnostic performance of the models for osteoporosis, we used the receiver operating characteristic (ROC) curve. This curve is generated by plotting the true positive rate (sensitivity) against the false positive rate (1-specificity) while the predicted probability threshold of the model is varied; the area under the curve (AUC) was then calculated. A confusion matrix is a table used to assess the performance of a classification algorithm. Precision is the ratio of correctly identified positive samples to all samples labeled as positive by the model. Recall is the ratio of correctly identified positive samples to all actual positive samples. The F1-score is the harmonic mean of precision and recall and balances the two metrics when evaluating a classification model.
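The construction of the ROC curve and its AUC described above can be sketched as follows. This is a generic illustration in plain Python rather than the analysis code of the study; the labels and scores are invented toy values, and the function names are hypothetical.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs for every threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; a sample is predicted positive if score >= threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data: 1 = osteoporosis, 0 = normal; scores are hypothetical model probabilities.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(roc_points(labels, scores))
print(round(toy_auc, 2))
```

An AUC of 1.0 means every osteoporosis image is scored above every normal image, whereas 0.5 corresponds to chance-level ranking.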

RESULTS

1. Patient Characteristics and Radiological Parameters

The study included 189 women and 65 men. The mean age was 80.4 ± 9.6 years in the 124 patients with osteoporosis and 74.1 ± 11.0 years in the 130 patients without osteoporosis (p<0.01). The body mass index was 23.0 ± 3.1 in the osteoporosis group and 24.8 ± 3.9 in the non-osteoporosis group (p<0.01). The BMD score was -3.31 ± 0.63 in the osteoporosis group and 0.32 ± 1.12 in the non-osteoporosis group (p<0.025) (Table 1). The sagittal X-ray images (254 in total) focused on the L1 to L3 vertebrae (Fig. 2).
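As a rough plausibility check, the group comparisons for age and body mass index can be recomputed from the summary statistics alone. The sketch below uses the pooled-variance (equal-variance) two-sample t-statistic; whether the authors used exactly this variant is an assumption, and the function name is hypothetical.

```python
import math


def pooled_t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t-statistic with pooled variance, from summary statistics only."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Osteoporosis group (n=124) versus non-osteoporosis group (n=130), values from the text.
t_age, df = pooled_t_statistic(80.4, 9.6, 124, 74.1, 11.0, 130)
t_bmi, _ = pooled_t_statistic(23.0, 3.1, 124, 24.8, 3.9, 130)
print(f"age: t={t_age:.2f}, df={df}")
print(f"BMI: t={t_bmi:.2f}, df={df}")
```

With 252 degrees of freedom, t-statistics of this magnitude (an absolute value above 4) correspond to two-sided p-values well below 0.01, which is in line with the significance levels reported for age and body mass index.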

Fig. 2.

Data classification of lumbar sagittal plain X-ray images. A. Normal BMD group, B. Osteoporosis group. DXA: dual energy X-ray absorptiometry; BMD: bone mineral density.

2. Model Training and Evaluation

Figs. 3 to 5 and Tables 2 and 3 show the performance and diagnostic predictability of the CNN models in classifying osteoporosis and non-osteoporosis from lumbar spine sagittal X-ray images. ResNet50 was the most accurate model on the training dataset (accuracy 0.95) and on the test dataset (accuracy 0.82), although its validation accuracy (0.67) was lower than that of the other models (Fig. 3, Tables 2 and 3). Table 3 details the evaluation of the CNN models on the test dataset images. ResNet50 showed the best performance, with an accuracy of 0.82, precision of 0.80, recall of 0.86, and F1-score of 0.83 (Table 3). Fig. 4 presents the AUC results; ResNet50 had the highest value (0.76) among the CNN models. The confusion matrix for ResNet50 shows that 12 of the 14 osteoporosis images in the test data were predicted as osteoporosis (Fig. 5).
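The ResNet50 metrics can be related to the confusion-matrix counts with a short calculation. The text gives 12 correctly predicted osteoporosis images out of 14, which fixes the true positives and false negatives; the number of false positives is not stated, so the value of 3 used below is an assumption chosen because it reproduces the reported precision of 0.80. The function name is hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1-score from confusion-matrix counts."""
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("precision or recall is undefined for these counts")
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


tp = 12        # osteoporosis images predicted as osteoporosis (from the text)
fn = 14 - tp   # osteoporosis images missed by the model (from the text)
fp = 3         # ASSUMPTION: not reported; chosen to match the reported precision

precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"precision={precision:.2f}, recall={recall:.2f}, F1={f1:.2f}")
```

Under this assumption, the three values round to the precision, recall, and F1-score listed for ResNet50 in Table 3.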

Fig. 3.

The training processes of the deep learning model. VGG: visual geometry group.

Fig. 4.

Comparison of the ROC curves and AUC of our osteoporosis prediction algorithm. ROC: receiver operating characteristic; AUC: area under the curve; TPR: true positive rate; FPR: false positive rate.

Fig. 5.

Confusion matrix of ResNet50 on the test dataset.

DISCUSSION

Osteoporosis has become a global public health concern as populations age and life expectancy increases. It is estimated that over 200 million people have been diagnosed with this condition14). Treatments such as teriparatide, romosozumab, and denosumab have been developed and are currently in use3,9). DXA is considered the gold standard for diagnosing osteoporosis, utilizing spectral imaging to measure differences in energy levels from two X-ray beams11). Unlike other bones, the lumbar vertebral column consists of approximately 66% to 75% cancellous bone. In osteoporosis, the vertebrae exhibit decreased BMD, leading to reduced thickness and number of trabeculae in the cancellous bone15).

Additionally, previous studies have indicated that cortical thickness or trabecular patterns can predict BMD12). Although several studies have predicted BMD-based diagnoses using X-ray images and deep learning techniques5,7,8,16,18), few have used lumbar images, even though lumbar radiography is the easiest and most commonly available method in clinical practice. Research has also been conducted using dental and femoral neck images18). Moreover, clinical research using deep learning and transfer learning models is less often conducted by clinicians than by experts in computer engineering7).

In this study, we selected four deep learning models (VGG16, VGG19, ResNet50, and Xception) that have demonstrated good results in the ILSVRC and trained them using lumbar sagittal images. The VGG16 and VGG19 architectures consist of five blocks of convolutional layers followed by three fully connected layers. VGG16 has approximately 138 million parameters, and VGG19 has approximately 143 million. Most of these parameters (approximately 100 million) are in the first fully connected layer, and it was later found that the fully connected layers could be removed without degrading performance, significantly reducing the number of necessary parameters13). The Xception architecture is an extension of the Inception architecture that replaces the standard Inception modules with depthwise separable convolutions. Xception slightly outperforms InceptionV3 on the ImageNet dataset and vastly outperforms it on a larger image classification dataset with 17,000 classes. Xception has 22,855,952 trainable parameters1). The ResNet50 architecture is a deep convolutional network whose basic idea is to skip blocks of convolutional layers by using shortcut connections, forming units named residual blocks. These stacked residual blocks greatly improve training efficiency and largely resolve the degradation problem present in deep networks. The total number of weighted layers is 50, with 23,534,592 trainable parameters2).

In addition, unlike some earlier studies, this study applied the transfer learning technique to VGG16, VGG19, ResNet50, and Xception. ResNet50 emerged as the top performer in this study, achieving an AUC value of 0.76 (Fig. 4). In other research, Jang et al.5) used dental images (n=800) as a training dataset; their self-developed deep neural network based on VGG16 showed an AUC value of 0.70 in the test dataset (n=117). Zhang et al.18) used anterior and sagittal X-ray images of the lumbar spine (n=1,616) as a training dataset, and their custom-developed deep CNN yielded an AUC value of 0.767 in the test dataset (n=198). Lee et al.7) used dental panoramic images (n=680) for training and applied transfer learning with fine-tuning based on VGG16, which resulted in an AUC value of 0.858 in the test dataset (n=137). Sukegawa et al.16) employed dental panoramic images (n=778) as a training dataset and combined EfficientNet and ResNet models (18, 50, and 152 layers) in an ensemble, which resulted in an AUC value of 0.911 in the test dataset (n=156) (Table 4). These outcomes suggest that ensemble models tend to yield the best performance. Beyond simple deep learning techniques, modifications tailored to the image type and research objectives, such as transfer learning and ensemble methods, significantly affect results. This study aimed to assess the accuracy of deep learning techniques using straightforward images from a clinician's perspective. When the number of images is small, as in this study, various deep learning data processing methods can be used to compensate. Moreover, transfer learning and fine-tuning of the CNN layers can yield effective results even with limited data7). We believe that these methods will significantly benefit clinicians in specialized fields of medicine in the future.
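The parameter counts quoted above for VGG16 can be reproduced with simple arithmetic. The sketch assumes the standard VGG16 configuration from the original VGG paper (3 x 3 convolutions, a 224 x 224 input that is reduced to a 7 x 7 x 512 feature map, and a 1,000-class output layer); none of these details are stated in this article, and the helper names are hypothetical.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus biases of one convolutional layer."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


# (input channels, output channels) of the 13 convolutional layers of standard VGG16.
VGG16_CONV = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]

conv_total = sum(conv_params(3, i, o) for i, o in VGG16_CONV)
fc1 = dense_params(7 * 7 * 512, 4096)   # first fully connected layer
fc2 = dense_params(4096, 4096)
fc3 = dense_params(4096, 1000)          # 1,000 ImageNet classes
total = conv_total + fc1 + fc2 + fc3

print(f"convolutional layers: {conv_total:,}")
print(f"first fully connected layer: {fc1:,}")
print(f"all layers: {total:,}")
```

Under these assumptions the first fully connected layer alone accounts for roughly 103 million of the approximately 138 million parameters, while the convolutional layers hold fewer than 15 million. This is why replacing the fully connected head, as is commonly done in transfer learning, shrinks the model so much.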

AI technology, particularly deep learning, is increasingly playing a supportive role in diagnostics within the medical field. This technology represents a significant advancement. Until recently, clinicians faced challenges accessing complex computer coding, programming, and mathematical theories. However, with the emergence of AI technologies like ChatGPT, greater support is now available, facilitating collaboration across various fields and enhancing deep learning technology. The transfer learning method, in particular, makes deep learning technology more accessible to clinicians. Through fine-tuning, clinicians can achieve slightly improved results tailored to their specific expertise.

The first limitation of this study is the small size of the image dataset. While other studies have utilized datasets ranging from hundreds to thousands of images, this study was restricted to test results within one year to enhance the correlation between BMD and X-ray outcomes. The second limitation is related to the ensemble model, which could have shown improved results with more detailed fine-tuning techniques; however, this was not implemented due to technical constraints and shortcomings during the planning phase of data collection. The third limitation is the lack of a more detailed BMD scoring segmentation. The BMD scores between the two groups show a difference of about -3.0, indicating a need for more precise techniques to categorize into three groups: osteoporosis, osteopenia, and normal. This aspect was overlooked in the study design.

CONCLUSION

In this study, ResNet50 showed the best performance in both the training and test sets. In the role of diagnostic assistance, AI technology employing deep learning techniques is approaching human performance. With the integration of AI, the diagnosis of osteoporosis using BMD is expected to evolve into a comprehensive diagnostic aid or decision-making tool.

Notes

No potential conflict of interest relevant to this article was reported.

References

1. Chollet F. Xception: Deep learning with depthwise separable convolutions. Paper presented at: IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2017 Jul 21-26; Honolulu, HI: IEEE. pp1800-1807.
2. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. Paper presented at: IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016 Jun 27-30; Las Vegas, NV: IEEE. pp770-778.
3. Huang W, Nagao M, Yonemoto N, Guo S, Tanigawa T, Nishizaki Y. Evaluation of the efficacy and safety of romosozumab (evenity) for the treatment of osteoporotic vertebral compression fracture in postmenopausal women: A systematic review and meta-analysis of randomized controlled trials (CDM-J). Pharmacoepidemiol Drug Saf 32:671–684, 2023
4. Inigo SA, Tamilselvi R, Beham MP. A review on imaging techniques and artificial intelligence models for osteoporosis prediction. Curr Med Imaging [epub ahead of print, 2023. doi: 10.2174/1573405620666230608091911]
5. Jang R, Choi JH, Kim N, Chang JS, Yoon PW, Kim CH. Prediction of osteoporosis from simple hip radiography using deep learning algorithm. Sci Rep 11:19997, 2021
6. Kanis JA. Assessment of fracture risk and its application to screening for postmenopausal osteoporosis: Synopsis of a WHO report. WHO study group. Osteoporos Int 4:368–381, 1994
7. Lee KS, Jung SK, Ryu JJ, Shin SW, Choi J. Evaluation of transfer learning with deep convolutional neural networks for screening osteoporosis in dental panoramic radiographs. J Clin Med 9:392, 2020
8. Lim HK, Ha HI, Park SY, Han J. Prediction of femoral osteoporosis using machine-learning analysis with radiomics features and abdomen-pelvic CT: A retrospective single center preliminary study. PLoS One 16:e0247330, 2021
9. Napoli N, Langdahl BL, Ljunggren Ö, Lespessailles E, Kapetanos G, Kocjan T, et al. Effects of teriparatide in patients with osteoporosis in clinical practice: 42-month results during and after discontinuation of treatment from the European extended Forsteo® observational study (ExFOS). Calcif Tissue Int 103:359–371, 2018
10. National Institutes of Health. NIH consensus development panel on osteoporosis prevention, diagnosis, and therapy, March 7-29, 2000: Highlights of the conference. South Med J 94:569–573, 2001
11. Pisani P, Renna MD, Conversano F, Casciaro E, Muratore M, Quarta E, et al. Screening and early diagnosis of osteoporosis through X-ray and ultrasound based techniques. World J Radiol 5:398–410, 2013
12. Sapthagirivasan V, Anburajan M. Diagnosis of osteoporosis by extraction of trabecular features from hip radiographs using support vector machine: an investigation panorama with DXA. Comput Biol Med 43:1910–1919, 2013
13. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. Paper presented at: 3rd International Conference on Learning Representations (ICLR 2015); 2015 May 7-9; San Diego, CA: ICLR. pp1-14.
14. Sözen T, Özışık L, Başaran N. An overview and management of osteoporosis. Eur J Rheumatol 4:46–56, 2017
15. Spangler JG. Bone biology and physiology: implications for novel osteoblastic osteosarcoma treatments? Med Hypotheses 70:281–286, 2008
16. Sukegawa S, Fujimura A, Taguchi A, Yamamoto N, Kitamura A, Goto R, et al. Identification of osteoporosis using ensemble deep learning model with panoramic radiographs and clinical covariates. Sci Rep 12:6088, 2022
17. Tenne M, McGuigan F, Besjakov J, Gerdhem P, Åkesson K. Degenerative changes at the lumbar spine--implications for bone mineral density measurement in elderly women. Osteoporos Int 24:1419–1428, 2013
18. Zhang B, Yu K, Ning Z, Wang K, Dong Y, Liu X, et al. Deep learning of lumbar spine X-ray for osteopenia and osteoporosis screening: A multicenter retrospective cohort study. Bone 140:115561, 2020

Table 1.

Patient characteristics

Variables            Osteoporosis (T-score ≤ -2.5)   Non-osteoporosis (T-score > -2.5)   p-value
Number of patients   124                             130
Sex
 Female              110 (88.7)                      79 (60.8)
 Male                14 (11.3)                       51 (39.2)
Age (years)          80.4 ± 9.6                      74.1 ± 11.0                         <0.01
BMI (kg/m²)          23.0 ± 3.1                      24.8 ± 3.9                          <0.01
BMD                  -3.31 ± 0.63                    0.32 ± 1.12                         <0.025

The data is presented as number (%) or mean ± standard deviation.

BMI: body mass index; BMD: bone mineral density.

Table 2.

Model training results

Model      Accuracy   Validation accuracy
VGG19      0.90       0.78
VGG16      0.73       0.77
ResNet50   0.95       0.67
Xception   0.80       0.77

VGG: visual geometry group.

Table 3.

Model evaluation in the test dataset

Model      Accuracy   Precision   Recall   F1-score
VGG19      0.65       0.73        0.79     0.76
VGG16      0.69       0.73        0.79     0.76
ResNet50   0.82       0.80        0.86     0.83
Xception   0.63       0.70        1.00     0.82

VGG: visual geometry group.

Table 4.

Various studies on osteoporosis prediction using deep learning

References           Dataset image                              Algorithm                          Model evaluation   Note
Jang et al.5)        Dental (n=800)                             DNN based on VGG16                 AUC = 0.70
Zhang et al.18)      Lumbar (anterior & posterior) (n=1,616)    DNN                                AUC = 0.767
Lee et al.7)         Dental panoramic (n=680)                   VGG16                              AUC = 0.858        Transfer learning
Sukegawa et al.16)   Dental panoramic (n=778)                   EfficientNet, ResNet18, 50, 152    AUC = 0.911        Ensemble model

AUC: area under the curve; DNN: deep neural network; VGG: visual geometry group.