DeepSeek-R1 edges out ChatGPT-o1 on simulated radiology board-style questions, but image-based ratings lag

Radiology AI literature (PubMed), 3d ago

On 27 text-based radiology questions, residents rated DeepSeek-R1 higher than ChatGPT-o1 (mean 4.51 vs 3.73 on a 5-point scale, P<.001). On image-based questions, ChatGPT-o1 was rated lower than on its own text answers, particularly for factual accuracy (mean 2.75). Both models sho…

Design: 7 radiology residents (PGY2–5) rated LLM answers to 27 text questions and 6 image questions across 9 criteria grouped into accuracy, practicality, and didactic value.
Junior residents gave ChatGPT-o1 higher overall scores than senior residents did (3.81 vs 3.63, P=.02), but no significant experience-level differences emerged for DeepSeek-R1.
A single simulated question set, a lack of real-world clinical integration, and no external or prospective validation limit generalizability.

RadPigeon summaries are original and for information only. They are not clinical advice.