[Keywords]
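[Abstract]
Objective To evaluate the overall performance of five domestic Chinese-language large language models (LLMs) in answering frequently asked questions about breast cancer-related lymphedema, and to provide a basis for their application and optimization. Methods Drawing on LLMs, group discussion, and expert opinion, 100 frequently asked questions about breast cancer-related lymphedema were identified. Three nursing master's students each entered the questions into the five LLMs to simulate consultations. Five experts were invited to rate the models on overall quality, accuracy, and comprehensiveness; the conciseness of the responses was assessed by character count, and the models' performance was analyzed. The intraclass correlation coefficient (ICC) was used to assess agreement among the experts. Results Agreement among the five expert raters was moderate (ICC = 0.594). All five LLMs performed fairly well overall. Doubao scored higher than the other models on both overall quality and accuracy, and the differences were statistically significant (all P < 0.05). The comprehensiveness scores of Doubao and Tongyi Qianwen did not differ significantly (P > 0.05), and both scored higher than the other models, with statistically significant differences (all P < 0.05). DeepSeek and Wenxin Yiyan had lower character counts than the other models, and the differences were statistically significant (all P < 0.05). Conclusion LLMs, represented by Doubao, show potential for application in simulated consultations on nursing questions related to lymphedema in breast cancer patients; their effectiveness in the prevention and management of breast cancer-related lymphedema can be evaluated further.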
[Keywords]
[Abstract]
Objective To evaluate the overall performance of five domestic Chinese-language large language models (LLMs) in answering frequently asked questions (FAQs) about breast cancer-related lymphedema (BCRL), and to provide a basis for their application and optimization. Methods Drawing on LLMs, group discussions, and expert opinions, 100 FAQs about BCRL were identified. Three nursing master's students each entered these questions into the five models to simulate consultations. Five lymphedema nursing experts were invited to evaluate the models' responses in terms of information quality, accuracy, and comprehensiveness. The conciseness of the responses was assessed by character count. Differences in performance among the models were analyzed. The intraclass correlation coefficient (ICC) was used to assess agreement among the experts. Results Agreement among the five expert raters was moderate (ICC = 0.594). All five models performed fairly well overall. Doubao scored significantly higher than the other models on information quality and accuracy (all P < 0.05). Comprehensiveness scores did not differ significantly between Doubao and Tongyi Qianwen (P > 0.05), but both scored significantly higher than the other models (all P < 0.05). DeepSeek and Wenxin Yiyan had significantly lower character counts than the other models (all P < 0.05). Conclusions LLMs, represented by Doubao, show potential for application in simulated consultations on nursing questions related to BCRL. Future research should further evaluate their effectiveness in the prevention and management of BCRL.
[Chinese Library Classification number]
R473.73
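[Funding]
General Program of the National Natural Science Foundation of China (72174011); Tianjin Key Medical Discipline (Specialty) Construction Project (TJYXZDXK 011A)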