AI Models Surpass Physicians in Emergency Room Diagnostic Accuracy, Harvard Study Finds

Updated on: 04-May-2026
Artificial intelligence models are now matching or surpassing human doctors in clinical reasoning tasks, according to a new study. Researchers from Harvard Medical School and Beth Israel Deaconess Medical Center conducted the largest study to date comparing AI and physicians across a range of medical decision-making tasks. The study aimed to determine whether an AI system could perform the daily diagnostic responsibilities of physicians.

Key Highlights

  • Harvard-led study compared AI models and physicians in emergency room decision-making tasks.
  • OpenAI’s o1 model matched or outperformed attending physicians in diagnostic accuracy.
  • AI models used only raw electronic medical record data for diagnosis.
  • Researchers caution AI is not ready to replace doctors or practice medicine autonomously.

Study Overview and Key Findings

The research team tested large language models (LLMs) against physicians in emergency room scenarios. They used 76 real emergency room cases from Beth Israel Deaconess Medical Center. The cases required decisions such as prioritizing care and determining ICU admissions. Two attending physicians provided diagnoses, which were then compared to those generated by OpenAI’s o1 and 4o models.

Two additional attending physicians, blinded to the source of each diagnosis, evaluated the results. At every stage of the emergency room diagnostic process, the o1 model performed as well as or better than the two attending physicians and the 4o model. The AI models received only the information available in the electronic medical records at the time of each diagnosis; the researchers did not pre-process the data before inputting it into the models.

Implications and Limitations

The study found that at early decision points in real-world emergency department cases, the AI model matched or exceeded attending physicians in diagnostic accuracy. This result surprised the researchers, who initially doubted the model’s capabilities. However, the study does not suggest that AI can replace doctors or practice medicine independently.

Researchers emphasized that AI should be evaluated as a new medical intervention through carefully controlled clinical trials in real care settings. They noted that while a model might correctly identify the top diagnosis, it could also suggest unnecessary tests that may expose patients to harm. Human oversight remains essential for evaluating performance and ensuring patient safety.

Research Team and Future Directions

The study was led by physicians and computer scientists at Harvard Medical School and Beth Israel Deaconess Medical Center. Co-senior author Arjun (Raj) Manrai, assistant professor of biomedical informatics at the Blavatnik Institute at HMS, stated that the AI model outperformed both prior models and physician baselines. Co-senior author Adam Rodman, HMS assistant professor of medicine at Beth Israel Deaconess, expressed his initial skepticism but acknowledged the model’s strong performance.

The findings highlight the potential for AI to support clinical decision-making. However, researchers stress the need for further studies to assess AI’s safety and effectiveness in real-world medical settings.
