Examining Diagnostic Capabilities: GPT-4 vs Radiologists in Musculoskeletal Imaging

A recent study examining how artificial intelligence (AI) is changing healthcare, particularly diagnostic imaging, compared the diagnostic accuracy of two AI models, GPT-4 based ChatGPT and GPT-4V based ChatGPT, with that of radiologists in musculoskeletal radiology.

The study used 106 “Test Yourself” cases published between January 2014 and September 2023. For each case, the medical history and imaging findings were fed as text into GPT-4 based ChatGPT, while the medical history and the images themselves were submitted to GPT-4V based ChatGPT. Each model produced a diagnosis for every case. In parallel, all cases were independently assessed by two radiologists: a board-certified, experienced radiologist and a radiology resident. Diagnostic accuracy rates were determined against the published ground truth, and Chi-square tests were used to compare the accuracy of the AI models with that of the human radiologists.

GPT-4 based ChatGPT achieved a diagnostic accuracy of 43% (46/106), significantly outperforming GPT-4V based ChatGPT, which reached only 8% (9/106) (p < 0.001). Among the radiologists, the board-certified radiologist had the higher accuracy at 53% (56/106), while the radiology resident scored 41% (43/106). There was no statistically significant difference in diagnostic accuracy between GPT-4 based ChatGPT and the radiology resident (p = 0.78). GPT-4 based ChatGPT fell short of the board-certified radiologist’s performance, although the difference was not statistically significant (p = 0.22). Notably, GPT-4V based ChatGPT showed considerably poorer diagnostic accuracy than both radiologists (p < 0.001).
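As a sanity check, the pairwise comparisons above can be reproduced directly from the reported counts with a standard chi-square test on a 2x2 table. This is a minimal sketch, not the authors' actual analysis code; the use of Yates' continuity correction is an assumption, though it does reproduce the reported p = 0.78 for the GPT-4 versus resident comparison.

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Chi-square test with Yates' continuity correction for the 2x2 table
    [[a, b], [c, d]] (correct vs. incorrect diagnoses per reader).
    Returns the p-value (1 degree of freedom)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    chi2 = sum((abs(o - e) - 0.5) ** 2 / e
               for o, e in zip(observed, expected))
    # For 1 degree of freedom, the chi-square survival function
    # simplifies to erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2))

# GPT-4 based ChatGPT (46/106 correct) vs. radiology resident (43/106 correct)
p_resident = chi_square_2x2(46, 60, 43, 63)   # approximately 0.78

# GPT-4 based ChatGPT (46/106) vs. GPT-4V based ChatGPT (9/106)
p_gpt4v = chi_square_2x2(46, 60, 9, 97)       # well below 0.001

print(f"GPT-4 vs. resident: p = {p_resident:.2f}")
print(f"GPT-4 vs. GPT-4V:   p < 0.001 is {p_gpt4v < 0.001}")
```

Running this recovers the study's conclusion pattern: the GPT-4 versus resident difference is far from significant, while the GPT-4 versus GPT-4V gap is highly significant.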

It is important to note that this study has not undergone peer review. The results indicate that GPT-4 based ChatGPT holds potential, but it requires further examination and validation before it can be implemented in clinical settings. These findings shed light on the ever-changing role of AI in diagnostic imaging, underscoring the importance of thorough evaluation and continuous research in this domain.

While the study offers fascinating new information, it is important to approach AI in healthcare with some skepticism. Given the complexity and sensitivity of medical practice, concerns about the idea of AI completely replacing doctors are warranted. AI should be seen as a useful tool that augments physicians’ abilities rather than replaces their specialties. As we navigate these advancements, the ultimate goal should be to use AI as a complementary tool in the hands of qualified healthcare professionals, helping to provide patients with safer and more accurate care.

Reference


Daisuke Horiuchi, Hiroyuki Tatekawa, Tatsushi Oura, Taro Shimono, Shannon L Walston, Hirotaka Takita, Shu Matsushita, Yasuhito Mitsuyama, Yukio Miki, Daiju Ueda; Comparison of the diagnostic accuracy among GPT-4 based ChatGPT, GPT-4V based ChatGPT, and radiologists in musculoskeletal radiology; medRxiv 2023.12.07.23299707; doi: https://doi.org/10.1101/2023.12.07.23299707
