A recent study reveals that large language models (LLMs) can provide treatment recommendations for patients with early-stage hepatocellular carcinoma (HCC) that align with established clinical guidelines. However, the models demonstrate significant limitations when applied to more complex, advanced cases of the disease. The research, led by Ji Won Han from The Catholic University of Korea, has been published in the open-access journal PLOS Medicine.
Determining the most effective treatment for liver cancer is a challenging task. International treatment guidelines offer general recommendations, but clinicians must often adapt them based on factors such as the cancer’s stage, liver function, and the presence of other health conditions. To assess how well LLMs perform at making treatment recommendations, the research team compared suggestions generated by three LLMs (ChatGPT, Gemini, and Claude) with the actual treatments given to more than 13,000 newly diagnosed HCC patients in South Korea.
The findings indicate that, among patients with early-stage HCC, closer agreement between LLM-generated recommendations and the treatment actually given was associated with better survival. Among patients with advanced-stage HCC, the pattern reversed: closer agreement was associated with worse survival.
The research also highlights a difference in focus between LLMs and physicians. While LLMs tended to prioritize tumor characteristics, such as tumor size and number, physicians placed greater emphasis on liver function and patient-specific factors. This discrepancy suggests that LLMs could assist with straightforward treatment decisions in early-stage cases, but that their usefulness diminishes in more complex scenarios that demand nuanced clinical judgment.
Ji Won Han and colleagues advise that LLM recommendations be treated with caution and used as a supportive tool rather than a replacement for clinical expertise. The authors state, “Our study shows that large language models can help support treatment decisions for early-stage liver cancer, but their performance is more limited in advanced disease. This highlights the importance of using LLMs as a complement to, rather than a replacement for, clinical expertise.”
These insights mark a significant step in understanding the potential and limitations of artificial intelligence in healthcare. As AI technologies continue to evolve, further research will be essential in determining how best to integrate these tools into clinical practice, particularly in complex cases where human expertise remains paramount.
