Letter to the Editor

Considerations for improving generalizability and robustness of predictive models in perihilar cholangiocarcinoma

Zhanna Zhang, Gongqiang Wu

Department of Hematology, Dongyang Hospital Affiliated to Wenzhou Medical University, Jinhua, China

Correspondence to: Gongqiang Wu, MD, PhD. Department of Hematology, Dongyang Hospital Affiliated to Wenzhou Medical University, 60 Wuning West Road, Jinhua 322100, China. Email: wugongqiang59@126.com.

Comment on: Kawashima J, Endo Y, Rashid Z, et al. Predictive model for very early recurrence of patients with perihilar cholangiocarcinoma: a machine learning approach. Hepatobiliary Surg Nutr 2025;14:3-15.


Submitted Apr 10, 2025. Accepted for publication Aug 17, 2025. Published online Jan 15, 2026.

doi: 10.21037/hbsn-2025-219


We read with great interest the article by Kawashima et al., “Predictive model for very early recurrence of patients with perihilar cholangiocarcinoma: A machine learning approach”, recently published in HepatoBiliary Surgery and Nutrition (1). The authors should be commended for developing a clinically relevant machine learning (ML)-based prediction model and incorporating SHapley Additive exPlanations (SHAP) analysis to enhance model interpretability. Their multi-center approach, spanning over two decades, represents an important effort to improve individualized risk assessment in this rare but aggressive malignancy. As researchers with experience in clinical prediction modeling, we greatly appreciate the challenges involved in developing and validating such models. However, after carefully reviewing the manuscript and supplementary materials, we would like to raise a few methodological considerations that could further strengthen the study’s rigor and real-world applicability.

First, while the authors conducted internal validation using bootstrapping (n=5,000), it is unclear whether they partitioned the dataset into independent training and test sets. Recent literature has consistently emphasized the importance of external validation to assess a model’s generalizability, particularly when multi-center data are involved (2,3). In our own experience, models that appear well-calibrated in internal validation often perform suboptimally in external datasets due to shifts in patient demographics and clinical practices (4). Given the long study period and potential heterogeneity in clinical management, incorporating a truly external validation cohort would provide a more robust assessment of the model’s stability (5,6).
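For illustration, where no fully independent cohort is available, a leave-one-center-out analysis can approximate external (geographic) validation in a multi-center dataset. The minimal Python sketch below assumes a pandas DataFrame with a hypothetical `center` column, a binary `very_early_recurrence` outcome, and a list of predictor columns; these names and the logistic model are illustrative placeholders and are not drawn from the original study.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def leave_one_center_out_auc(df, predictors,
                             outcome="very_early_recurrence",
                             center_col="center"):
    """Train on all-but-one center and test on the held-out center,
    approximating an external-style (geographic) validation."""
    aucs = {}
    for center in df[center_col].unique():
        train = df[df[center_col] != center]
        test = df[df[center_col] == center]
        if test[outcome].nunique() < 2:
            continue  # AUC is undefined if the held-out center has only one outcome class
        model = LogisticRegression(max_iter=1000).fit(train[predictors], train[outcome])
        pred = model.predict_proba(test[predictors])[:, 1]
        aucs[center] = roc_auc_score(test[outcome], pred)
    return pd.Series(aucs, name="held_out_AUC")
```

Variation in the held-out AUCs across centers would itself be informative, since stable discrimination across sites is one practical indicator of transportability.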

Second, the selection process for the 13 candidate variables could benefit from further clarification. While the chosen predictors align with prior literature, it remains unclear whether a systematic feature selection approach was used before model construction. As we have encountered in our own modeling work, traditional stepwise selection methods can introduce instability, especially in datasets with low event-to-variable ratios (7). Methods such as least absolute shrinkage and selection operator (LASSO) regression, Boruta, or domain-informed selection are increasingly preferred for optimizing predictor inclusion while minimizing overfitting. Given that the study includes only 65 recurrence events, outlining the variable selection methodology would reassure readers that the final model is not unduly influenced by noise or collinearity (8,9).
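As an illustration of a sparsity-inducing alternative to stepwise selection, the sketch below screens candidate predictors with cross-validated L1-penalised (LASSO) logistic regression using scikit-learn. Here `X`, `y`, and `feature_names` are placeholders and do not reproduce the study data; the tuning grid is illustrative only.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegressionCV

def lasso_screen(X, y, feature_names):
    """Keep predictors with non-zero coefficients after cross-validated
    L1-penalised (LASSO) logistic regression."""
    pipe = make_pipeline(
        StandardScaler(),  # penalised coefficients require predictors on a common scale
        LogisticRegressionCV(penalty="l1", solver="saga", Cs=20,
                             cv=5, scoring="roc_auc", max_iter=5000),
    )
    pipe.fit(X, y)
    coefs = pipe.named_steps["logisticregressioncv"].coef_.ravel()
    return [name for name, coef in zip(feature_names, coefs) if abs(coef) > 1e-8]
```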

Third, while the authors constructed two models—one incorporating both preoperative and postoperative variables, and another based solely on preoperative variables—the study does not compare XGBoost’s performance with alternative ML methods, such as random forests, support vector machines, or deep learning architectures. While XGBoost has demonstrated strong predictive performance in many clinical settings, no single algorithm consistently outperforms others across all datasets (10). A comparative analysis, potentially integrating ensemble learning or calibration assessment, could offer further insights into whether XGBoost is indeed the optimal choice for this clinical scenario (11).
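A minimal head-to-head comparison could, for example, evaluate several learners under repeated stratified cross-validation, reporting both discrimination (AUC) and calibration (Brier score). In the sketch below, `X` and `y` are placeholder arrays and the hyperparameters are purely illustrative, not those used by Kawashima et al.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
from xgboost import XGBClassifier  # assumes the xgboost package is installed

def compare_learners(X, y):
    """Cross-validated discrimination (AUC) and calibration (Brier score)
    for a few commonly used classifiers; settings are illustrative."""
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=500, random_state=0),
        "xgboost": XGBClassifier(n_estimators=300, learning_rate=0.05,
                                 eval_metric="logloss", random_state=0),
    }
    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=cv,
                                scoring={"auc": "roc_auc", "brier": "neg_brier_score"})
        print(f"{name}: AUC={scores['test_auc'].mean():.3f}, "
              f"Brier={-scores['test_brier'].mean():.3f}")
```

Reporting calibration alongside discrimination for each candidate algorithm would help establish whether XGBoost offers a genuine advantage for this cohort rather than marginal gains in a single metric.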

In summary, Kawashima et al. have contributed meaningfully to the field of ML in oncology. Addressing these methodological points—explicitly clarifying the dataset partitioning strategy, detailing the feature selection process, and comparing alternative ML approaches—would further enhance the study’s robustness and translational relevance. We appreciate the opportunity to engage in this academic discussion and believe these refinements could strengthen future predictive modeling research in perihilar cholangiocarcinoma.


Acknowledgments

None.


Footnote

Provenance and Peer Review: This article was a standard submission to the journal. The article did not undergo external peer review.

Funding: None.

Conflicts of Interest: Both authors have completed the ICMJE uniform disclosure form (available at https://hbsn.amegroups.com/article/view/10.21037/hbsn-2025-219/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Kawashima J, Endo Y, Rashid Z, et al. Predictive model for very early recurrence of patients with perihilar cholangiocarcinoma: a machine learning approach. Hepatobiliary Surg Nutr 2025;14:3-15. [Crossref] [PubMed]
  2. Wolff RF, Moons KGM, Riley RD, et al. PROBAST: A Tool to Assess the Risk of Bias and Applicability of Prediction Model Studies. Ann Intern Med 2019;170:51-8. [Crossref] [PubMed]
  3. Collins GS, Reitsma JB, Altman DG, et al. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ 2015;350:g7594. [Crossref] [PubMed]
  4. Riley RD, Archer L, Snell KIE, et al. Evaluation of clinical prediction models (part 2): how to undertake an external validation study. BMJ 2024;384:e074820. [Crossref] [PubMed]
  5. Riley RD, Ensor J, Snell KIE, et al. Calculating the sample size required for developing a clinical prediction model. BMJ 2020;368:m441. [Crossref] [PubMed]
  6. Collins GS, Dhiman P, Ma J, et al. Evaluation of clinical prediction models (part 1): from development to external validation. BMJ 2024;384:e074819. [Crossref] [PubMed]
  7. Vittinghoff E, McCulloch CE. Relaxing the rule of ten events per variable in logistic and Cox regression. Am J Epidemiol 2007;165:710-8. [Crossref] [PubMed]
  8. Andaur Navarro CL, Damen JAA, Takada T, et al. Risk of bias in studies on prediction models developed using supervised machine learning techniques: systematic review. BMJ 2021;375:n2281. [Crossref] [PubMed]
  9. Efthimiou O, Seo M, Chalkou K, et al. Developing clinical prediction models: a step-by-step guide. BMJ 2024;386:e078276. [Crossref] [PubMed]
  10. Christodoulou E, Ma J, Collins GS, et al. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol 2019;110:12-22. [Crossref] [PubMed]
  11. Van Calster B, Wynants L, Verbeek JFM, et al. Reporting and Interpreting Decision Curve Analysis: A Guide for Investigators. Eur Urol 2018;74:796-804. [Crossref] [PubMed]
Cite this article as: Zhang Z, Wu G. Considerations for improving generalizability and robustness of predictive models in perihilar cholangiocarcinoma. Hepatobiliary Surg Nutr 2026;15(1):19. doi: 10.21037/hbsn-2025-219
