A new study suggests large language models (LLMs) like GPT-4 may have a future in ophthalmology, but limitations and risks remain. Researchers from Cambridge University tested GPT-4, along with other LLMs, against human ophthalmologists on a mock exam.

GPT-4 answered 60 of the 87 questions correctly, beating the average scores of trainee doctors (59.7) and junior doctors (37), though it fell short of the expert ophthalmologists' average of 66.4. The other LLMs tested, PaLM 2 and GPT-3.5, scored lower.

While these findings hint at potential benefits, the researchers highlight significant risks. The study's limited question pool raises doubts about how well the results generalize. More importantly, LLMs are prone to "hallucinating," fabricating plausible-sounding information that could lead to misdiagnosis of serious conditions such as cataracts or cancer, and the lack of nuance in their responses could compound those inaccuracies.

The study underscores the need for further research and development before LLMs can be considered reliable diagnostic tools. Given how much is at stake in medical diagnosis, it may be a long time before LLMs are adopted in mainstream clinical practice.
