LLMs Ace Radiology MCQs: Gemini 2.5 Pro at 90%, Claude Most Reliable

Radiology AI literature (PubMed), 3d ago

On a 100-question radiology MCQ test, Gemini 2.5 Pro achieved 90% accuracy and Claude 4.5 Sonnet 86%. All LLMs and third-year residents outperformed junior residents. No performance gap was found between Turkish and English (P=1.000). Claude 4.5 Sonnet showed the best temporal reliability (κ=0.872).

Cross-sectional study of 4 LLMs on 100 MCQs (5 subspecialties), benchmarked against 18 radiology residents (years 1–3).
Claude 4.5 Sonnet demonstrated superior temporal consistency (κ=0.872), while Grok-4 and ChatGPT-5 were moderately reliable.
Text-only design; performance on imaging-based questions and real-world clinical tasks remains unvalidated.
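
For readers unfamiliar with the κ values quoted above, here is a minimal sketch of how agreement between two repeated runs of the same model could be scored. It assumes the reported κ is Cohen's kappa computed over the answers a model gave on two separate occasions; the summary does not state the exact procedure. The function name `cohen_kappa` and the answer lists are hypothetical illustrations, not study data.

```python
from collections import Counter


def cohen_kappa(run_a, run_b):
    """Cohen's kappa for two equally long sequences of categorical answers."""
    if len(run_a) != len(run_b) or not run_a:
        raise ValueError("runs must be non-empty and of equal length")
    n = len(run_a)

    # Observed agreement: share of questions answered identically in both runs.
    observed = sum(a == b for a, b in zip(run_a, run_b)) / n

    # Chance agreement: product of the marginal answer frequencies, summed
    # over every answer option.
    count_a = Counter(run_a)
    count_b = Counter(run_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)

    if expected == 1.0:
        # Both runs used a single identical option throughout; treat as perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical answers (options A-D) from two runs on the same ten questions.
first_run = list("ABCDABCDAB")
second_run = list("ABCDABCDBB")

kappa = cohen_kappa(first_run, second_run)
print(f"kappa = {kappa:.3f}")
```

In this toy case the runs agree on 9 of 10 questions, but κ comes out lower than 0.9 because some of that agreement would be expected by chance alone.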

RadPigeon summaries are original and for information only. They are not clinical advice.