Evaluating the value of AI-generated questions for USMLE Step 1 preparation: A study using ChatGPT-3.5.

Journal: Medical Teacher
Abstract

Students are increasingly relying on artificial intelligence (AI) for medical education and exam preparation. However, the factual accuracy and content distribution of AI-generated exam questions for self-assessment have not been systematically investigated. Curated prompts were created to generate multiple-choice questions matching the style of the USMLE Step 1 examination. We used ChatGPT-3.5 to generate 50 questions and answers for each prompt style and manually examined the output for factual accuracy, Bloom's taxonomy level, and category within the USMLE Step 1 content outline. ChatGPT-3.5 generated 150 multiple-choice, case-style questions and selected an answer for each. Overall, 83% of the generated multiple-choice questions contained no factual inaccuracies, and 15% contained one or two factual inaccuracies. With simple prompting, common themes included deep venous thrombosis, myocardial infarction, and thyroid disease. Topic diversity improved when content topic generation was separated from question generation, and specificity to Step 1 increased when the prompt indicated that "treatment" questions were not desired. We demonstrate that ChatGPT-3.5 can generate Step 1-style questions with reasonable factual accuracy, and this method may be used by medical students preparing for USMLE examinations. While the AI-generated questions demonstrated adequate factual accuracy, targeted prompting techniques should be used to overcome ChatGPT's bias towards particular medical conditions.
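The two-stage prompting workflow summarized above (generating content topics first, then generating one Step 1-style question per topic while excluding "treatment" items) could be reproduced programmatically. The sketch below assumes the OpenAI Python client and the gpt-3.5-turbo model; the prompt wording, topic count, and helper function are illustrative assumptions, not the authors' exact prompts or procedure.

```python
# Illustrative sketch of a two-stage prompting approach: topics are generated
# first, then one case-style question per topic. Prompt wording is an
# assumption, not the prompts used in the study.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str) -> str:
    """Send a single user prompt to gpt-3.5-turbo and return the reply text."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Stage 1: generate diverse content topics, separated from question writing
# to reduce repetition of common themes (e.g., deep venous thrombosis).
topics_text = ask(
    "List 10 distinct topics from the USMLE Step 1 content outline, "
    "one per line, covering different organ systems."
)
topics = [line.strip("- ").strip() for line in topics_text.splitlines() if line.strip()]

# Stage 2: generate one case-style multiple-choice question per topic,
# explicitly excluding treatment questions to keep items Step 1-specific.
for topic in topics:
    question = ask(
        f"Write a USMLE Step 1-style clinical vignette multiple-choice question "
        f"about {topic} with five answer options (A-E) and state the correct "
        f"answer. Do not ask about treatment or management."
    )
    print(question, "\n")
```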

Authors
Alan Balu, Stefan Prvulovic, Claudia Fernandez Perez, Alexander Kim, Daniel Donoho, Gregory Keating