LLMs Ace Radiology MCQs: Gemini 2.5 Pro at 90%, Claude Most Reliable
Radiology AI literature (PubMed)
In a 100-question radiology multiple-choice (MCQ) test, Gemini 2.5 Pro achieved 90% accuracy and Claude 4.5 Sonnet 86%. All four LLMs, as well as third-year residents, outperformed junior residents. Accuracy did not differ between the Turkish and English versions of the test (P = 1.000). Claude showed the best temporal (test-retest) reliability (κ = 0.872).
- Cross-sectional study of 4 LLMs on 100 MCQs (5 subspecialties), benchmarked against 18 radiology residents (years 1–3).
- Claude 4.5 Sonnet demonstrated superior temporal consistency (κ=0.872), while Grok-4 and ChatGPT-5 were moderately reliable.
- Text-only design; performance on imaging-based questions and real-world clinical tasks remains unvalidated.
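The summary does not say exactly how the temporal reliability κ was computed. A common choice for agreement between two repeated runs of the same model on categorical answers (options A to E) is unweighted Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below assumes that variant; the function name and the toy answer lists are illustrative only and are not study data.

```python
from collections import Counter


def cohens_kappa(run1, run2):
    """Unweighted Cohen's kappa between two equal-length sequences of answers.

    Assumption: kappa here measures agreement between two runs of the same
    model on the same questions (test-retest), not agreement with the key.
    """
    if len(run1) != len(run2) or not run1:
        raise ValueError("runs must be non-empty and of equal length")
    n = len(run1)
    # Observed agreement: fraction of questions answered identically in both runs.
    p_o = sum(a == b for a, b in zip(run1, run2)) / n
    # Chance agreement: product of each run's marginal answer frequencies.
    c1, c2 = Counter(run1), Counter(run2)
    p_e = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    if p_e == 1:
        # Degenerate case: both runs give the same single answer everywhere.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy example: two runs over five questions, differing on one.
first_run = ["A", "B", "C", "D", "A"]
second_run = ["A", "B", "C", "A", "A"]
print(round(cohens_kappa(first_run, second_run), 3))  # prints 0.706
```

In the toy example the runs agree on 4 of 5 questions (observed agreement 0.80), but because both runs favor option A, chance agreement is 0.32, so κ drops to about 0.71. This is why κ is lower than raw agreement and why a value such as 0.872 indicates strong consistency beyond chance.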
RadPigeon summaries are original and for information only. They are not clinical advice.
