GPT-4 generates accurate and readable patient education materials aligned with current oncological guidelines: A randomized assessment.
Objective: Guideline-based patient education materials (PEMs) empower patients and reduce misinformation, but they require frequent updates and must be adapted to patients' reading levels. The aim of this study was to assess whether generative artificial intelligence (GenAI) can produce readable, accurate, and up-to-date PEMs that can subsequently be translated into multiple languages for broad dissemination.
Methods: The European Association of Urology (EAU) guidelines for prostate, bladder, kidney, and testicular cancer were used as the knowledge base for GPT-4 to generate PEMs. The PEMs were additionally translated into five commonly spoken languages of the European Union (EU). The study was conducted as a single-blinded, online randomized assessment survey. After an initial pilot assessment of the GenAI-generated PEMs, thirty-two members of the Young Academic Urologists (YAU) groups evaluated the accuracy, completeness, and clarity of the original versus the GPT-4-generated PEMs. The translation assessment involved two native speakers from different YAU groups for each language: Dutch, French, German, Italian, and Spanish. The primary outcomes were readability, accuracy, completeness, faithfulness, and clarity. Readability was measured using the Flesch-Kincaid Reading Ease (FKRE) score, Flesch-Kincaid Grade Level (FKGL), Gunning Fog Score (GFS), SMOG Index (SI), Coleman-Liau Index (CLI), and Automated Readability Index (ARI). Accuracy, completeness, faithfulness, and clarity were rated on a 5-point Likert scale.
Results: The mean time for GPT-4 to generate a layperson PEM from the latest guidelines was 52.1 seconds. The readability scores of the 8 original PEMs were lower than those of the 8 GPT-4-generated PEMs (mean FKRE: 43.5 vs. 70.8; p < .001), and the required reading education level was higher for the original PEMs (mean FKGL: 11.6 vs. 6.1; p < .001). For all localized urological cancers, the original PEMs did not differ significantly from the GPT-4-generated PEMs in accuracy, completeness, or clarity. Similarly, no differences were observed for metastatic cancers. Translations of GPT-4-generated PEMs were rated as faithful in 77.5% of cases and clear in 67.5% of cases.
Conclusions: GPT-4-generated PEMs have better readability than original PEMs while maintaining similar accuracy, completeness, and clarity. GenAI's information-extraction and language capabilities, integrated with human oversight, can substantially reduce the workload of producing PEMs while keeping them up to date and accurate.
Patient summary: Some cancer information written for patients can be hard to read or not in the right words for those with prostate, bladder, kidney, or testicular cancer. This study used AI to quickly make short and easy-to-read content from trusted facts. Doctors checked the AI content and found it was just as accurate, complete, and clear as the original text made for patients. It also worked well in many languages. This AI tool can help providers make it easier for patients to understand their cancer and the best care they can get.