Appraisal of the current guidelines for management of cholangiocarcinoma—using the Appraisal of Guidelines Research and Evaluation II (AGREE II) Instrument
Introduction
Cholangiocarcinoma (CC) is a malignant tumour arising from the epithelial lining of the biliary system that accounts for 3% of all gastrointestinal tumours. According to its location, it can be divided into intrahepatic CC (ICC), which accounts for 20–25% of cases, hilar CC (50–60%), and distal extrahepatic CC (20–25%) (1,2).
The incidence of all forms of CC is reported to be increasing (3). However, coding misclassification of Klatskin tumour as ICC may have skewed the incidence rates, overestimating ICC by 13% and underestimating extrahepatic CC by 15% (3,4).
In 2011, the Institute of Medicine (US) revised the 21-year-old definition of clinical practice guidelines (CPGs) (Institute of Medicine, 1990) as follows, “Clinical practice guidelines are statements that include recommendations intended to optimize patient care that are informed by a systematic review of the evidence and an assessment of the benefits and harms of alternative care options” (5).
The Appraisal of Guidelines for Research and Evaluation (AGREE) II instrument is the latest of more than 40 tools for the appraisal of CPGs (6,7). The AGREE instrument, together with its further refinements, is the only CPG appraisal tool that has been developed and validated internationally, formally endorsed by several organizations including the WHO Advisory Committee on Health Research, and used by many guideline development groups (8,9). Detailed information is available on the AGREE web site (www.agreetrust.org).
The aim of the present study was to evaluate the quality of current CC guidelines, with a primary focus on resection, using the AGREE II instrument. The present study focused on the methodological analysis and did not analyse the recommended practices.
Methods
Study selection and data extraction
A systematic review of the literature in Cochrane, PubMed, Embase, and Google Scholar (covering studies of the last 20 years) was conducted to identify guidelines using Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) criteria (10). References of retrieved articles were also searched manually for further guidelines. After independent evaluation of the CPGs by PG and RS, the following data were extracted: country of origin, year of publication, development and/or revision committee, evaluation measures, and funding sources. Evidence-based CPGs in the English language pertaining to the resection of CC were included (Figure 1, Tables 1 and 2).
Appraisal of guidelines
The AGREE II tool comprises 23 items divided into six domains: scope and purpose, stakeholder involvement, rigor of development, clarity of presentation, applicability, and editorial independence. For further details regarding the criteria used to describe and evaluate the six domains and their 23 constituent items, please see Figure S1: AGREE reporting checklist. After undergoing online training (www.agreetrust.org) to ensure appraisal standardisation, four appraisers (PG, AA, RS, KR), as recommended by the AGREE II consortium, independently evaluated the guidelines using the AGREE II tool (September 2013 version). As per the AGREE II manual, discrepancies of more than 2 standard deviations (SDs) were resolved through dialogue. Appraisers were able to change their entries after group discussion. Domain scores were calculated with the following formula:
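In the AGREE II user's manual, the scaled score for a domain is (obtained score - minimum possible score) / (maximum possible score - minimum possible score) × 100%, where every item is rated from 1 to 7 by each appraiser. A minimal Python sketch of that calculation is given here for illustration; the function name and the example ratings are hypothetical and are not taken from the study data.

```python
def scaled_domain_score(ratings):
    """Return the AGREE II scaled domain score as a percentage.

    `ratings` holds one list per appraiser, each containing that
    appraiser's 1-7 ratings for every item in the domain.
    """
    n_appraisers = len(ratings)
    n_items = len(ratings[0])
    if any(len(r) != n_items for r in ratings):
        raise ValueError("every appraiser must rate every item")
    if any(not 1 <= x <= 7 for r in ratings for x in r):
        raise ValueError("AGREE II ratings range from 1 to 7")

    obtained = sum(sum(r) for r in ratings)
    minimum = 1 * n_items * n_appraisers  # every item rated 1
    maximum = 7 * n_items * n_appraisers  # every item rated 7
    return 100.0 * (obtained - minimum) / (maximum - minimum)


# Hypothetical ratings from four appraisers for a three-item domain:
# obtained = 53, minimum = 12, maximum = 84.
example_ratings = [[5, 6, 6], [6, 6, 7], [2, 4, 3], [3, 3, 2]]
print(round(scaled_domain_score(example_ratings)))  # 57
```

Because the minimum possible score is subtracted first, a domain in which every appraiser gives every item the lowest rating scores 0% rather than 14%.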
Results
Search results yielded a total of 13 guidelines, eight of which were produced by multi-national organizations (Figure 1, Table 1) (11-23). Overall, the guidelines scored poorly: the median overall score was just 43%, with the highest overall score of 82% given to the British Society of Gastroenterology (BSG) guidelines, followed by 79% for the Asia-Pacific and European Association for the Study of the Liver (EASL) guidelines. Of the 13 guidelines, eight scored under 50%. Median scores were particularly low in the following domains: II: stakeholder involvement (39%); III: rigor of development (30%); and V: applicability (13%). Domain IV: clarity of presentation and domain I: scope & purpose were the highest scoring at 76% and 65%, respectively (Figure 2).
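How per-domain medians of this kind can be derived from a table of scaled scores is sketched below in Python. The guideline names, domain keys, and values are invented purely for illustration; they are not the scores obtained in this study.

```python
from statistics import median

# Hypothetical scaled domain scores (%) for three made-up guidelines.
# These are NOT the values reported in the study.
hypothetical_scores = {
    "Guideline A": {"scope_and_purpose": 80.0, "applicability": 10.0},
    "Guideline B": {"scope_and_purpose": 60.0, "applicability": 25.0},
    "Guideline C": {"scope_and_purpose": 40.0, "applicability": 5.0},
}


def median_by_domain(scores):
    """Collect each domain's scores across guidelines and return the medians."""
    by_domain = {}
    for per_guideline in scores.values():
        for domain, value in per_guideline.items():
            by_domain.setdefault(domain, []).append(value)
    return {domain: median(values) for domain, values in by_domain.items()}


print(median_by_domain(hypothetical_scores))
```

The median is used rather than the mean so that a single very high or very low guideline does not dominate the summary for a domain.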
Domain I: scope & purpose
In this domain, questions pertain to the aims and objectives of the guidelines and the target users and population. Generally, the majority of the guidelines performed well, and the median score was 65%. The BSG scored the highest (81%), whilst the SEOM (Spanish Society of Medical Oncology) scored the lowest at 15% (Figure 3).
Domain II: stakeholder involvement
An integral part of the AGREE II scoring checklist is the involvement of the relevant stakeholders in the guideline production process. Overall, the median score was 39%, with the Asia-Pacific, Chinese, and BSG guidelines yielding the highest scores, at 65%, 63%, and 61%, respectively. The SEOM guidelines scored the lowest at 8% (Figure 3).
Domain III: rigor of development
This was one of the lowest scoring domains. The median score across all 13 guidelines was 30%, with the Asia-Pacific guidelines scoring the highest (81%), as they clearly laid out the methodology by which the guidelines were developed from the evidence found in the literature (Figure 4). The Hilar guidelines had the lowest score, at only 12%.
Domain IV: clarity of presentation
Domain IV was the highest scoring domain with a median score of 76%. Among all of the guidelines, the Asia-Pacific and Japanese guidelines both scored 90%. The Hilar consensus statement received the lowest score of 44% (Figure 4).
Domain V: applicability
Scores in this domain were the lowest of all, with the median score at just 13%. All of the guidelines scored less than 50%, with the highest scoring guideline being the Asia-Pacific guideline at 47% (Figure 5). The two lowest scoring guidelines (the Hilar and ICC guidelines) scored a mere 1%.
Domain VI: editorial independence
Scores in domain VI were generally reasonable at 56%. The highest score (92%) was achieved by the European Society of Medical Oncology (ESMO) (Figure 5). One guideline scored 0% (guidelines for palliative surgery of CC).
Recommendation for use
None of the 13 guidelines was recommended universally for use without modification. The Asia-Pacific and EASL guidelines received a definitive ‘Yes’ from three appraisers and ‘Yes, with modification’ from the fourth appraiser. Five of the 13 guidelines (Hilar, ICC, Japanese, Palliative, and SEOM) received a unanimous ‘No’ from the appraisers (Figure 6).
Score discrepancies
Discrepancies in scoring across appraisers were low overall; therefore, no further rounds of re-scoring or discussion were required to resolve issues.
Discussion
The present study is the first appraisal of the current CC guidelines using the validated AGREE II instrument. Overall, the quality of the guidelines as assessed by the AGREE II evaluation checklist was mediocre at best, with a median total score of only 43%. The BSG (82%), EASL (79%), and Asia-Pacific (79%) guidelines achieved the highest overall scores. Generally, the guidelines scored poorly in the domains of applicability, rigour of development, and stakeholder involvement, at 13%, 30%, and 39%, respectively. The highest scores were observed in the domains of clarity of presentation and scope and purpose, at 76% and 65%, respectively.
These findings highlight that careful attention to, and further developmental work on, existing guidelines is required, particularly in the areas of clinical implementation and the involvement of patients/advocacy groups. Another area in which the guidelines perform particularly poorly is their rigour of development. Often, the guidelines did not stipulate how they arrived at their recommendations. This is surprising given both the importance of transparency in guideline development and the existence of validated systems for evaluating the scientific literature. One such system is the GRADE system, which has been specifically developed for the evaluation of evidence and classification of recommendations for guideline development (24). Nonetheless, only five of the 13 guidelines used the GRADE system to evaluate the quality of evidence and categorise the strength of recommendations, namely the Brazilian, NCCN, EASL, BSG, and Japanese guidelines (12,15,17,20,22).
Two others, the ESMO and the Asia-Pacific guidelines, used alternative systems. The ESMO guidelines used the US Public Health Service grading system, and the Asia-Pacific guidelines categorised the evidence and classified the recommendations using a voting system based on a modification of the Canadian Task Force on the Periodic Health Examination (11,19). Four of the guidelines did not use any system at all (13,14,17,18). Consequently, the median score for rigour of development was only 30%. Guidelines that used a system to evaluate the evidence (e.g., the GRADE system) naturally scored higher, whilst those that did not employ any discernible system, such as the Hilar guidelines, scored poorly (12%).
Another area in which guidelines appear to perform universally poorly is stakeholder involvement, as has been extensively documented in the literature. The median score for this domain was just 39%, with the SEOM guidelines scoring only 8%. In the age of patient-centred care, patient autonomy, and informed consent, this lack of engagement with patients, patient advocacy groups, and the general public is concerning. In many instances, even other professionals involved in the treatment of patients with the condition were not consulted and the entire guideline was written by members of a single specialty. Although such guidelines can produce specialised recommendations on one aspect of care, e.g., surgical or oncological, they may miss other aspects not immediately within the purview of their specialty, which contradicts the ethos of holistic and multi-disciplinary care. Furthermore, very little guidance on follow-up is provided: of the 13 guidelines presented in this study, only two (Brazilian and ESMO) gave any recommendations for the follow-up and long-term management of these patients.
Another critical element of guidelines for the management of biliary tree pathology lies in the difference between benign and malignant disease and how they should be managed. For example, immunoglobulin G4 cholangiopathy is a multisystem inflammatory disorder that may present with intrahepatic biliary strictures in 51% of cases and with strictures of the proximal extrahepatic ducts in 49%; this disease should always be included in the differential diagnosis of biliary strictures (25). The BSG guidelines alone stressed the importance of differentiating between benign and malignant strictures. Such oversights highlight the limitations of the current guidelines in their scope and depth. In addition, the Japanese guidelines strongly recommend biopsy or cytology before surgery in order to differentiate malignant from benign strictures. Nakayama et al. reported that 10% of suspected and operable CCs were benign strictures (26). The remaining guidelines do not give recommendations regarding preoperative biopsy.
Moreover, staging laparoscopy is a useful tool to avoid unnecessary operations, though only the Asia-Pacific guidelines recommend the use of staging laparoscopy. In terms of ICC, the NCCN guidelines recommend colonoscopy and gastroscopy to rule out metastases from an asymptomatic gastrointestinal tumour. None of the other guidelines recommend it.
Most of the guidelines used the 7th AJCC staging system. It is reported that the main limitations of the AJCC are the definition of resectability and prediction of survival (27). The Blumgart staging system, which is based on the extent of biliary duct involvement by the tumour, the presence or absence of portal involvement, and the presence or absence of lobar atrophy, can define resectability and predict survival more accurately than the AJCC and Bismuth-Corlette staging systems (Table 2) (27). However, none of the guidelines proposed the above staging system.
The guidelines define the following as risk factors for CC: primary sclerosing cholangitis, parasitic infestations, hepatolithiasis, choledochal cysts, pancreatobiliary maljunction (PBM), toxins, and hepatitis B (HBV) and C (HCV) infections. However, only the BSG guidelines recommend the surveillance of patients with primary sclerosing cholangitis. In cases of PBM and choledochal cysts, the Japanese guidelines recommend cholecystectomy and extrahepatic common bile duct excision to prevent cancer development.
ICC and hepatocellular carcinoma (HCC) share many risk factors such as HBV, HCV, and cirrhosis. Guidelines on HCC recommend surveillance every six months with tumour markers and imaging modalities for high-risk patients (28). However, no such recommendation for surveillance of CC exists in any of the current guidelines.
Currently, there is no solid evidence to support standard lymph node dissection in patients with CC (29). However, the NCCN guidelines suggest lymph node dissection to achieve a better prognosis.
Expert consensus on ICC recommends that lymph node dissection be a standard part of the surgical management of patients with ICC.
It is reported that ≥7 lymph nodes are sufficient for the prognostic staging of hilar CC (30).
Another area in which guidelines scored particularly poorly was advice and guidance on how to implement the recommendations. This is an issue that has plagued many a guideline across a variety of medical sub-specialties, as is well documented in the literature (28). Unfortunately, the CC guidelines are no exception. Very few guidelines documented the human and material resources required for the implementation of the recommendations or gave clear instructions as to how the recommendations could be put into action, leaving the readership with no real sense of direction or where to start.
There are even further shortcomings in the current CC guidelines. Perhaps most striking is the quality of the evidence on which many of the guidelines are based. Unfortunately, there is currently a distinct lack of randomised controlled trial data for many of the recommendations in place. A large proportion of the key points in many of the guidelines are based on observational data of potentially questionable reliability. If guidelines are to improve further, bold and rigorous studies within the boundaries of ethical consideration are required to further our understanding of the management of CC.
Our study has several limitations, some of which can be attributed to the very nature of the AGREE system. For example, it is debatable whether every domain in the AGREE system should carry the same weight. The checklist has been criticised for its assumption that all domains are equally important for determining high-quality guidelines. Another criticism of the checklist is that, although assessors enter their scores independently of each other, there is still the possibility of bias (positive or negative) and a certain level of subjectivity in the scoring. For example, certain guidelines from reputable international bodies may hold a favourable stance in the assessor’s mind before the assessor even begins to undertake the scoring. Conversely, a guideline from a less well established body may not score as highly due to a lack of ‘prestige’. A further limitation of our study is that we only selected guidelines in the English language for reasons of practicality; as such, we may have missed potentially high-quality guidelines presented in other languages.
Of note, the Asia-Pacific guideline performs particularly well because it is well set out, well written, thorough, and easy to read. The evidence on which it bases its recommendations is no different from that found in many of the other guidelines; however, what the Asia-Pacific guideline does well that others do not is the following:
- They clearly explain in detail the quality of the evidence they base their recommendations on, the degree of consensus and their methodology as to how they came about making the recommendations.
- They also rate their own recommendations in terms of quality and how strongly they recommend a particular practice.
- The guideline is well presented and thorough in addressing all the relevant aspects of investigation, treatment and outcome.
As a result of the above, the Asia-Pacific guideline consistently scores well throughout most of the domains.
Recently, Idrees et al. (31) evaluated the impact of the centralisation of care and compliance with NCCN guidelines for resected CCs on long-term survival. It was reported that over time, in the USA, compliance with NCCN guidelines increased. In particular, for the period of 2004–2007, the compliance was 30%, which increased to 46% in 2011–2013. Of note, five-year overall survival was 45% in the patients who received NCCN-compliant surgical management, as compared to 40% in those who did not receive surgical care according to the guidelines. Interestingly, it was reported that the centralisation of care contributed only 8% of the improvement in survival, while compliance with guidelines improved survival by 17% (31).
Conclusions
The quality of the current guidelines for CC is generally poor, and many are based on relatively low-quality evidence. It is imperative that future updated guidelines rely on high-quality trial data and take a multi-disciplinary approach by including patients and advocacy groups in the formulation of recommendations. Furthermore, a clear plan as to how to put the recommendations into practice (in both resource-rich and resource-poor regions of the world) is desperately needed.
Acknowledgments
Funding: None.
Footnote
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://hbsn.amegroups.org/article/view/10.21037/hbsn.2019.09.06/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the noncommercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
1. Bismuth H. Hepatobiliary malignancy. Edward Arnold; 1994.
2. de Groen PC, Gores GJ, LaRusso NF, et al. Biliary tract cancers. N Engl J Med 1999;341:1368-78.
3. Khan SA, Emadossadaty S, Ladep NG, et al. Rising trends in cholangiocarcinoma: Is the ICD classification system misleading us? J Hepatol 2012;56:848-54.
4. Welzel TM, McGlynn KA, Hsing AW, et al. Impact of classification of hilar cholangiocarcinomas (Klatskin tumors) on the incidence of intra- and extrahepatic cholangiocarcinoma in the United States. J Natl Cancer Inst 2006;98:873-75.
5. Graham R, Mancher M, Miller Wolman D, et al. Clinical practice guidelines we can trust. Institute of Medicine (US) Committee on Standards for Developing Trustworthy Clinical Practice Guidelines. Washington (DC): The National Academies Press (US); 2011.
6. Appraisal of Guidelines for Research and Evaluation II 2013. Available online: http://www.agreetrust.org
7. Brouwers MC, Kho ME, Browman GP, et al. AGREE II: advancing guideline development, reporting, and evaluation in health care. Prev Med 2010;51:421-4.
8. Coroneos CJ, Voineskos SH, Cornacchi SD, et al. Users’ guide to the surgical literature: how to evaluate clinical practice guidelines. J Can Chir 2014;57:280-6.
9. Alonso-Coello P, Irfan A, Sola I, et al. The quality of clinical practice guidelines over the last two decades: a systematic review of guideline appraisal studies. Qual Saf Health Care 2010;19:e58.
10. Moher D, Liberati A, Tetzlaff J, et al. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann Intern Med 2009;151:264-9.
11. Valle JW, Borbath I, Khan SA, et al. Biliary cancer. ESMO clinical practice guidelines for diagnosis, treatment and follow-up. Ann Oncol 2016;27:v28-37.
12. Riechelmann R, Coutinho A, Weschenfelder G, et al. Guideline for the management of bile duct cancers by the Brazilian gastrointestinal tumor group. Arq Gastroenterol 2016;53:5-9.
13. Mansour JC, Aloia TA, Crane CH, et al. Hilar adenocarcinoma: Expert consensus statement. HPB 2015;17:691-9.
14. Weber SM, Ribero D, O’Reilly EM, et al. Intrahepatic cholangiocarcinoma: expert consensus statement. HPB 2015;17:669-80.
15. Benson AB 3rd, Abrams TA, Ben-Josef E, et al. NCCN clinical practice guidelines in oncology. Hepatobiliary Cancers. J Natl Compr Canc Netw 2009;7:350-91.
16. Benavides M, Anton A, Gallego J, et al. Biliary tract cancers: SEOM clinical guideline. Clin Transl Oncol 2015;17:982-7.
17. Bridgewater J, Galle PR, Khan SA, et al. Guidelines for the diagnosis and management of intrahepatic cholangiocarcinoma. J Hepatol 2014;60:1268-89.
18. Cai JQ, Cai SW, Cong WM, et al. Diagnosis and treatment of cholangiocarcinoma: A consensus from surgical specialists of China. J Huazhong Univ Sci Technol 2014;34:469-75.
19. Rerknimitr R, Angsuwatcharakon P, Ratanachu-ek T, et al. Asian-Pacific consensus recommendations for endoscopic and interventional management of hilar cholangiocarcinoma. J Gastroenterol Hepatol 2013;28:593-607.
20. Khan SA, Davidson BR, Golden RD, et al. Guidelines for the diagnosis and treatment of cholangiocarcinoma: An update. Gut 2012;61:1657-69.
21. Alvaro D, Cannizaro R, Labianca R, et al. Cholangiocarcinoma: A position paper for the Italian Society of Gastroenterology (SIGE), the Italian Association of Hospital Gastroenterology (AIGO), the Italian Association of Medical Oncology (AIOM) and the Italian Association of Oncological Radiotherapy (AIRO). Dig Liver Dis 2010.831-8.
22. Kondo S, Takada T, Miyazaki M, et al. Guidelines for the management of biliary tract and ampullary carcinoma. Surgical treatment. J Hepatobiliary Pancreat Surg 2008;15:41-54.
23. Witzigmann H, Lang H, Lauer H. Guidelines for palliative surgery of cholangiocarcinoma. HPB 2008;10:154-60.
24. Norris SL, Bero L. GRADE Methods for Guideline Development: Time to Evolve? Ann Intern Med 2016;165:810-1.
25. Ghazale A, Chari ST, Zhang L, et al. Immunoglobulin G4-associated cholangitis: clinical profile and response to therapy. Gastroenterology 2008;134:706-15.
26. Nakayama A, Imamura H, Shimada R, et al. Proximal bile duct stricture disguised as malignant neoplasm. Surgery 1999;125:514-21.
27. Jarnagin WR, Fong Y, DeMatteo RP, et al. Staging, resectability, and outcome in 225 patients with hilar cholangiocarcinoma. Ann Surg 2001;234:507-17; discussion 517-9.
28. Gavriilidis P, Roberts KJ, Askari A, et al. Evaluation of the current guidelines for resection of hepatocellular carcinoma using the appraisal of guidelines for research and evaluation II instrument. J Hepatol 2017;67:991-8.
29. Amini N, Ejaz A, Spolverato G, et al. Management of the lymph nodes during resection of hepatocellular carcinoma and intrahepatic cholangiocarcinoma: A systematic review. J Gastrointest Surg 2014;18:2136-48.
30. Kambakamba P, Linecker M, Slankamenac K, et al. Lymph node dissection in resectable perihilar cholangiocarcinoma: A systematic review. Am J Surg 2015;210:694-701.
31. Idrees JJ, Merath K, Gani F, et al. Trends in centralization of surgical care and compliance with National Cancer Center Network guidelines for resected cholangiocarcinoma. HPB 2019;21:981-9.