Abstract
BACKGROUND CONTEXT
Computer-aided diagnosis with artificial intelligence (AI) has been used clinically,
and ground truth generalizability is important for AI performance in medical image
analysis. An AI model trained on one specific group of older adults (aged ≥60) has
not yet been shown to work equally well in a younger adult group (aged 18–59).
PURPOSE
To compare, between younger and older adult populations, the performance of a developed
AI ensemble model, trained on ground truth from patients aged 60 years or older, in
identifying vertebral fractures (VFs) on plain lateral radiographs of the spine (PLRS).
STUDY DESIGN/SETTING
Retrospective analysis of PLRS at a single medical institution.
OUTCOME MEASURES
Accuracy, sensitivity, specificity, and interobserver reliability (kappa value) were
used to compare the diagnostic performance of the AI model against the subspecialists’
consensus in the two groups.
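Interobserver reliability between the AI and the subspecialists’ consensus can be expressed as a kappa value computed from a 2×2 table of fracture/normal calls. The abstract does not state the kappa variant or the unit of analysis, so the sketch below assumes standard two-rater Cohen’s kappa on per-vertebra binary calls; the function and argument names are hypothetical, and the counts in the usage line are invented for illustration.

```python
def cohens_kappa(both_pos: int, ai_only: int, consensus_only: int, both_neg: int) -> float:
    """Two-rater Cohen's kappa for binary fracture/normal calls (assumed per vertebra).

    both_pos:       AI and consensus both call the vertebra fractured
    ai_only:        AI calls fractured, consensus calls normal
    consensus_only: consensus calls fractured, AI calls normal
    both_neg:       AI and consensus both call the vertebra normal
    """
    n = both_pos + ai_only + consensus_only + both_neg
    if n == 0:
        raise ValueError("empty table")
    observed = (both_pos + both_neg) / n
    # Marginal proportion of "fractured" calls for each rater
    ai_pos = (both_pos + ai_only) / n
    consensus_pos = (both_pos + consensus_only) / n
    # Agreement expected by chance if the two raters were independent
    expected = ai_pos * consensus_pos + (1 - ai_pos) * (1 - consensus_pos)
    return (observed - expected) / (1 - expected)


print(round(cohens_kappa(both_pos=20, ai_only=10, consensus_only=5, both_neg=15), 3))  # 0.4
```

Kappa discounts the agreement two raters would reach by chance alone, which is why it is reported alongside raw accuracy.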
METHODS
Between January 2016 and December 2018, the ground truth of 941 patients aged 60 years
and older (one PLRS per person), comprising 1101 VFs and 6358 normal vertebrae, was used
to develop the AI model. The framework of the developed AI model includes object detection
with You Only Look Once version 3 (YOLOv3) at the T0-L5 levels in the PLRS, data preprocessing
for image size and quality, and an AI ensemble model (ResNet34, DenseNet121, and DenseNet201)
for identifying or grading VFs. The reported overall accuracy, sensitivity, and specificity
were 92%, 91%, and 93%, respectively, and external validation was also performed.
Thereafter, patients diagnosed with VFs and treated at our institution between October
2019 and August 2020 formed the study group, regardless of age. In total, 258 patients
(339 VFs and 1725 normal vertebrae) were enrolled in the older adult population (mean
age 78±10.4 years; range, 60–106) and 106 patients (120 VFs and 728 normal vertebrae)
in the younger adult population (mean age 36±9.43 years; range, 20–49). After VFs were
identified and graded with the Genant method by consensus between two subspecialists,
the human-labeled VFs in each PLRS were defined as the testing dataset. The corresponding
CT or MRI scan was used to support labeling on the PLRS. The bootstrap method was applied
to the testing dataset.
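The abstract names the three ensemble members but not how their outputs are combined. A common choice is soft voting: average each model’s fracture probability for a vertebra and threshold the mean. The sketch below assumes that rule, a 0.5 threshold, and that the YOLOv3 detection and preprocessing steps have already produced one crop per vertebral level; the classifiers are stand-in callables, and all function names are hypothetical.

```python
from statistics import fmean
from typing import Callable, Mapping

# Hypothetical type: a trained classifier mapping a preprocessed vertebra crop to P(fracture).
Classifier = Callable[[object], float]


def ensemble_probability(crop: object, models: Mapping[str, Classifier]) -> float:
    """Soft voting (assumed rule): mean fracture probability over all ensemble members."""
    if not models:
        raise ValueError("ensemble needs at least one model")
    return fmean(model(crop) for model in models.values())


def flag_fractures(crops: Mapping[str, object], models: Mapping[str, Classifier],
                   threshold: float = 0.5) -> dict[str, bool]:
    """crops maps a vertebral level label (e.g. "L1") to its crop from the detection step."""
    return {level: ensemble_probability(crop, models) >= threshold
            for level, crop in crops.items()}
```

In use, `models` would hold the three trained networks keyed by name (for example `"ResNet34"`, `"DenseNet121"`, `"DenseNet201"`). Averaging probabilities lets one confident model outweigh two borderline ones, which is one reason soft voting is often preferred over a majority vote of hard labels.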
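The Genant semiquantitative method used for labeling is a visual grading scheme, commonly tied to approximate height-reduction thresholds: about 20–25% for grade 1 (mild), 25–40% for grade 2 (moderate), and more than 40% for grade 3 (severe). The sketch below is a rough quantitative stand-in that assumes height loss is measured as the shortest of the anterior, middle, and posterior heights relative to the tallest one on the same vertebra; that reference choice, the boundary handling, and the function name are assumptions, not the subspecialists’ labeling protocol.

```python
def genant_grade(anterior: float, middle: float, posterior: float) -> int:
    """Approximate Genant grade (0-3) from three vertebral body heights (same units).

    Simplification (assumption): ignores adjacent vertebrae and area reduction, and
    measures height loss on the same vertebra as 1 - shortest / tallest height.
    """
    heights = (anterior, middle, posterior)
    if min(heights) <= 0:
        raise ValueError("heights must be positive")
    reduction = 1 - min(heights) / max(heights)
    if reduction < 0.20:
        return 0  # normal
    if reduction < 0.25:
        return 1  # mild
    if reduction <= 0.40:
        return 2  # moderate
    return 3      # severe


print(genant_grade(13, 19, 20))  # 2 (about 35% anterior height loss)
```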
RESULTS
The model for clinical application accepts images uploaded directly in Digital Imaging
and Communications in Medicine (DICOM) format and is available at
http://140.113.114.104/vght_demo/svf-model (grading) and
http://140.113.114.104/vght_demo/svf-model2 (labeling). Overall accuracy, sensitivity,
and specificity in the older adult population were 93.36% (95% CI 93.34%–93.38%),
88.97% (95% CI 88.59%–88.99%), and 94.26% (95% CI 94.23%–94.29%), respectively. In the
younger adult population, they were 93.75% (95% CI 93.70%–93.80%), 65.00% (95% CI
64.33%–65.67%), and 98.49% (95% CI 98.45%–98.52%), respectively. VF grading accuracy
reached 100% once the VFs had been labeled correctly. A distinctive pattern, limbus-like
VFs, was found only in the younger adult population (43 of 120 VFs, 35.8%). When
limbus-like VFs were excluded from the dataset, accuracy increased from 93.75% (95% CI
93.70%–93.80%) to 95.78% (95% CI 95.73%–95.82%), sensitivity increased from 65.00%
(95% CI 64.33%–65.67%) to 70.13% (95% CI 68.98%–71.27%), and specificity remained
unchanged at 98.49% (95% CI 98.45%–98.52%). The main causes of false negatives in older
adults were overlying lung markings, diaphragm, or bowel gas (37%, n=14), followed by
type I fractures (29%, n=11). The main causes of false negatives in younger adults were
limbus-like VFs (45%, n=19), followed by type I fractures (26%, n=11). The overall kappa
values between AI discrimination and the subspecialists’ consensus in the older and
younger adult populations were 0.77 (95% CI, 0.733–0.805) and 0.72 (95% CI, 0.6524–0.80),
respectively.
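The abstract reports percentages rather than raw counts, but for the younger adult population the per-vertebra counts can be back-calculated from the group sizes (120 VFs, 728 normal vertebrae), the reported percentages, and the limbus-like figures. The sketch below assumes those back-calculated counts (78 true positives, 42 false negatives, 717 true negatives, 11 false positives; 24 of the 43 limbus-like VFs detected and 19 missed), which are not stated in the abstract, and shows how accuracy, sensitivity, and specificity follow from a confusion matrix and why excluding missed fractures raises sensitivity while leaving specificity unchanged. Point estimates only; the bootstrap is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # fractured vertebrae flagged by the AI
    fn: int  # fractured vertebrae missed by the AI
    tn: int  # normal vertebrae correctly passed
    fp: int  # normal vertebrae wrongly flagged

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def excluding_fractures(self, tp_removed: int, fn_removed: int) -> "Confusion":
        """Drop a subset of fractured vertebrae; the normal vertebrae are untouched."""
        return Confusion(self.tp - tp_removed, self.fn - fn_removed, self.tn, self.fp)


# Back-calculated younger-adult counts (assumption, not reported in the abstract)
younger = Confusion(tp=78, fn=42, tn=717, fp=11)
# 43 limbus-like VFs: 24 detected, 19 missed (assumption derived from the percentages)
without_limbus = younger.excluding_fractures(tp_removed=24, fn_removed=19)

for name, cm in (("all VFs", younger), ("limbus-like excluded", without_limbus)):
    print(f"{name}: accuracy {cm.accuracy():.2%}, sensitivity {cm.sensitivity():.2%}, "
          f"specificity {cm.specificity():.2%}")
# all VFs: accuracy 93.75%, sensitivity 65.00%, specificity 98.49%
# limbus-like excluded: accuracy 95.78%, sensitivity 70.13%, specificity 98.49%
```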
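The abstract states that the bootstrap method was applied to the testing dataset but does not give the resampling unit or the number of resamples. As a generic illustration, the sketch below computes a percentile bootstrap interval for a proportion such as sensitivity by resampling per-vertebra 0/1 outcomes with replacement; the resampling unit, the 2000 resamples, the fixed seed, and the function name are assumptions, so it is not expected to reproduce the reported intervals.

```python
import random


def bootstrap_ci(outcomes: list[int], n_resamples: int = 2000, alpha: float = 0.05,
                 seed: int = 0) -> tuple[float, float, float]:
    """Percentile bootstrap CI for a proportion.

    outcomes holds one 0/1 entry per case, e.g. 1 = fractured vertebra detected by the AI,
    so the mean is the sensitivity when only fractured vertebrae are included.
    """
    if not outcomes:
        raise ValueError("need at least one outcome")
    rng = random.Random(seed)  # fixed seed for reproducible intervals
    n = len(outcomes)
    point = sum(outcomes) / n
    # Each resample draws n cases with replacement and records its proportion
    stats = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples))
    k = int(n_resamples * alpha / 2)  # number of resamples cut from each tail
    return point, stats[k], stats[n_resamples - 1 - k]


point, low, high = bootstrap_ci([1] * 78 + [0] * 42)
print(f"sensitivity {point:.2%} (95% CI {low:.2%}-{high:.2%})")
```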
CONCLUSIONS
The developed VF-identifying AI ensemble model, based on ground truth from older adults,
performed better at identifying VFs in older adults and at identifying non-fractured
thoracic and lumbar vertebrae in younger adults. Differences in age distribution may
reflect differences in disease patterns, suggesting that ground truth generalizability
affects AI model performance.
Article info
Publication history
Published online: November 01, 2021
Accepted: October 25, 2021
Received in revised form: September 23, 2021
Received: July 7, 2021
Footnotes
FDA device/drug status: Not applicable
Author disclosures: PHC: Nothing to disclose. THTJ: Nothing to disclose. HTHW: Nothing to disclose. YCY: Nothing to disclose. HHL: Nothing to disclose. MCC: Nothing to disclose. STW: Nothing to disclose. HHSL: Nothing to disclose. HHC: Nothing to disclose.
Copyright
© 2021 Elsevier Inc. All rights reserved.