

Department of Clinical Pharmacy, Ningde Municipal Hospital Affiliated to Ningde Normal University, Ningde, Fujian 352100, China
Received: 21 April 2025
Revised: 01 August 2025
Accepted: 04 August 2025
Published: 15 October 2025
LIN Xiaohui, WANG Yujia, ZHANG Lingling, et al. Construction of machine learning classification prediction model for vancomycin blood concentrations based on MIMIC-Ⅳ database[J]. ZHONGGUO YAOFANG, 2025, 36(19): 2448-2453. DOI: 10.6039/j.issn.1001-0408.2025.19.16.
OBJECTIVE
To construct a classification prediction model for vancomycin trough blood concentrations and to optimize precision dosing strategies for vancomycin.
METHODS
Patient records meeting the inclusion criteria were extracted from the Medical Information Mart for Intensive Care (MIMIC-Ⅳ) database. After data cleaning and preprocessing, a final cohort of 9,902 patients was analyzed. Features were selected by combining correlation analysis with the Boruta feature selection algorithm. Vancomycin trough concentrations were discretized into three categories according to the clinical therapeutic window: low (<10 μg/mL), intermediate (10-20 μg/mL), and high (≥20 μg/mL). Six machine learning algorithms were used to construct classification models: tabular prior-data fitted network (TabPFN), logistic regression (LR), random forest (RF), extreme gradient boosting (XGBoost), support vector machine (SVM), and K-nearest neighbors (KNN). Model performance was evaluated by 10-fold cross-validation (10-CV); the primary metrics were accuracy, balanced accuracy, macro-averaged precision, macro-averaged recall, macro-averaged F1, and the one-vs-rest area under the multiclass receiver operating characteristic curve (OvR-AUC). Shapley Additive Explanations (SHAP) were used to analyze the direction and magnitude of each feature's influence on the model's predictions.
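The discretization step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the names `CUTOFFS`, `LABELS`, and `discretize_trough` are hypothetical, the sample values are invented rather than drawn from MIMIC-Ⅳ, and a value lying exactly on a cutoff is assumed to fall into the upper class, which is how the "<10" and "≥20" bounds read.

```python
from bisect import bisect_right
from collections import Counter

# Class boundaries in μg/mL, as stated in the abstract: <10, 10-20, >=20.
CUTOFFS = (10.0, 20.0)
LABELS = ("low", "intermediate", "high")


def discretize_trough(concentration: float) -> str:
    """Map a trough concentration (μg/mL) to its therapeutic-window class."""
    if concentration < 0:
        raise ValueError("concentration must be non-negative")
    # bisect_right counts the cutoffs that are <= concentration, so a value
    # exactly on a cutoff (10 or 20) is assigned to the upper class.
    return LABELS[bisect_right(CUTOFFS, concentration)]


# Hypothetical trough values for illustration only (not MIMIC-IV data).
troughs = [4.2, 9.9, 10.0, 15.3, 19.9, 20.0, 31.7]
classes = [discretize_trough(c) for c in troughs]
class_counts = Counter(classes)

print(classes)
print(class_counts)
```

Counting the labels after discretization is a quick way to check class balance, which matters when choosing between plain accuracy and class-averaged metrics.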
RESULTS
RF and TabPFN performed best (accuracy of 0.7414 and 0.7377, and OvR-AUC of 0.9070 and 0.8958, respectively). XGBoost performed moderately, while LR, SVM, and KNN performed relatively poorly. Confusion matrix heatmaps showed that both RF and TabPFN predicted the high-concentration category more accurately but performed slightly worse in the low- and intermediate-concentration categories. Bootstrapping combined with 10-CV showed that the RF model was stable across evaluation metrics (accuracy: 0.7414; balanced accuracy: 0.7403; macro-averaged precision: 0.7321; macro-averaged recall: 0.7360; macro-averaged F1: 0.7360; OvR-AUC: 0.9070), indicating good classification performance and discriminative ability. SHAP analysis identified creatinine, urea nitrogen, and the daily cumulative dose and administration frequency of vancomycin as key features with a significant impact on the predictions.
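For readers unfamiliar with the class-averaged metrics quoted above, the sketch below shows one common way to derive them from a three-class confusion matrix. It is an illustration, not the authors' evaluation code: the function name `macro_metrics` and the example matrix are hypothetical, and balanced accuracy is assumed here to be the mean per-class recall, under which it coincides with macro-averaged recall. The abstract reports slightly different values for those two metrics, so the authors' definition or aggregation across folds may differ from this sketch.

```python
from typing import Dict, List


def macro_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Summarize a square confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    macro_recall = sum(recalls) / n
    return {
        "accuracy": sum(cm[k][k] for k in range(n)) / total,
        # Assumption: balanced accuracy = unweighted mean of per-class recall.
        "balanced_accuracy": macro_recall,
        "macro_precision": sum(precisions) / n,
        "macro_recall": macro_recall,
        "macro_f1": sum(f1s) / n,
    }


# Hypothetical confusion matrix (rows: true low, intermediate, high).
example_cm = [
    [40, 15, 5],
    [20, 60, 20],
    [5, 15, 120],
]
for name, value in macro_metrics(example_cm).items():
    print(f"{name}: {value:.4f}")
```

Because each class contributes equally to a macro average, weak performance on a smaller class lowers these scores more than it lowers plain accuracy.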
CONCLUSIONS
RF and TabPFN models show certain advantages in the classification prediction of vancomycin trough blood concentrations; however, their performance in the low- and intermediate-concentration categories still requires improvement.