A systematic review of patient reported outcome measures (PROMs) and quality of life reporting in patients undergoing laparoscopic cholecystectomy
Cholecystectomy is the only definitive treatment for patients with symptomatic gallstones, and laparoscopic cholecystectomy is the current gold standard (1-3). In the UK alone, over 60,000 cholecystectomies are performed annually, equivalent to approximately 100 procedures per 100,000 population (1). The rate exceeds 200 per 100,000 population in parts of Europe and North America (1,4,5). Surgery has therapeutic benefits and may save costs by preventing further morbidity from gallstone disease, but laparoscopic cholecystectomy is not without risks (2).
Patient reported outcomes (PROs) measure outcomes such as clinical symptoms, patient satisfaction and health-related quality of life (HRQoL) subjectively, from the patient’s perspective (6,7). PRO data are often collected using validated questionnaires, known as patient reported outcome measures (PROMs) (8). In the National Health Service (NHS) in England, PROM collection has been mandatory since April 2009 for measuring HRQoL after hip and knee replacement surgery, groin hernia repair and varicose vein surgery (9). PRO collection allows the quality of services to be compared across healthcare providers. It can also assist patients and clinicians in clinical decision making by monitoring illness and the effectiveness of treatment (9-11).
The primary aim of this systematic review was to identify and critically appraise all published studies evaluating patient reported HRQoL in adult patients undergoing laparoscopic cholecystectomy for symptomatic gallstones.
The secondary aim was to perform a quality assessment of cholecystectomy-specific PROM-validation studies using the Consensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist (12).
Search strategy
A search for all relevant literature was performed using the PubMed, Google™ Scholar, Cochrane Library, and MEDLINE (Ovid) databases in April 2016. It was updated in September 2017 to include CINAHL (EBSCO), EMBASE (Ovid), and PsycINFO (Ovid). The following search criteria were used to identify relevant studies, modified from those developed by the Oxford PROM Group in 2010 (13): (cholelithiasis.mp. or cholecystitis.mp. or cholecystolithiasis.mp. or gallstone*.mp. or gall stone*.mp. or gallbladder*.mp. or gall bladder*.mp. or biliary colic.mp. or biliary sludge.mp. or cholecystectomy.mp.) and ((HR-PRO or HRPRO or HRQL or HRQoL or QL or QoL or PROM or PRO).ti,ab. or quality of life.mp. or (health index* or health indices or health profile*).ti,ab. or health status.mp. or ((patient or self or child or parent or carer or proxy) adj (appraisal* or appraised or report or reported or reporting or rated or rating* or based or assessed or assessment*)).ti,ab. or ((disability or function or functional or functions or subjective or utility or utilities or wellbeing or well being) adj2 (index or indices or instrument or instruments or measure or measures or questionnaire* or profile or profiles or scale or scales or score or scores or status or survey or surveys)).ti,ab.).
The search was performed without date restrictions but was limited to full-text articles. Owing to limited resources, the search was also limited to articles available in English or in English translation. Only studies of adult populations (over 18 years of age) were included. The bibliographies of included studies were also reviewed.
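The search strategy combines two OR-blocks with a single AND. A record is retrieved if it matches at least one condition term and at least one PRO term. The Python sketch below is for illustration only. It assembles that structure from term lists using the same Ovid-style syntax (`.mp.`, `.ti,ab.`, `adj`, `adjN`). The function names are hypothetical, and the PRO block is a deliberately shortened subset of the full strategy above.

```python
# Hypothetical helper: rebuilds the two-block Boolean structure of the search
# (condition terms AND PRO terms) in Ovid-style syntax.
# The PRO block here is a shortened, illustrative subset of the full strategy.

CONDITION_TERMS = [
    "cholelithiasis", "cholecystitis", "cholecystolithiasis", "gallstone*",
    "gall stone*", "gallbladder*", "gall bladder*", "biliary colic",
    "biliary sludge", "cholecystectomy",
]
PRO_ABBREVIATIONS = ["HR-PRO", "HRPRO", "HRQL", "HRQoL", "QL", "QoL", "PROM", "PRO"]
RESPONDENTS = ["patient", "self", "child", "parent", "carer", "proxy"]
REPORT_WORDS = [
    "appraisal*", "appraised", "report", "reported", "reporting",
    "rated", "rating*", "based", "assessed", "assessment*",
]


def or_group(terms: list[str], field: str = "") -> str:
    """Join terms with OR, appending an Ovid field suffix (e.g. '.mp.') to each term."""
    return "(" + " or ".join(term + field for term in terms) + ")"


def adjacency(left: list[str], right: list[str], distance: int, field: str) -> str:
    """Proximity search: 'adj' for adjacent words, 'adjN' for words within N of each other."""
    operator = "adj" if distance == 1 else f"adj{distance}"
    return f"({or_group(left)} {operator} {or_group(right)}){field}"


def build_query() -> str:
    condition_block = or_group(CONDITION_TERMS, ".mp.")
    pro_block = " or ".join([
        or_group(PRO_ABBREVIATIONS) + ".ti,ab.",
        "quality of life.mp.",
        "health status.mp.",
        adjacency(RESPONDENTS, REPORT_WORDS, 1, ".ti,ab."),
    ])
    # A record must match at least one condition term AND at least one PRO term.
    return f"{condition_block} and ({pro_block})"


print(build_query())
```

Storing the term lists as data makes a modified strategy, such as the 2017 update, easier to document and re-run than hand-editing one long query string.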
Study selection
Studies identified through the search strategy were assessed for inclusion, first by title and abstract and then by full-text review (P Daliya and EH Gemmill). Studies were included only where HRQoL formed either a primary or secondary study aim. Studies also had to report on patients undergoing cholecystectomy for symptomatic gallstones, with at least one study population undergoing conventional 4-port laparoscopic cholecystectomy (CLC). Studies which reported ‘exclusively’ on patients with biliary malignancy or with complications of gallstone disease were excluded. These complications included gallbladder necrosis, perforation, pancreatitis, and choledocholithiasis. The exclusion was made because of the potential variability of these patient populations and their management.
Validation studies involving either the development or the assessment of cholecystectomy-specific PROMs were also included but were analysed separately. Review articles, such as meta-analyses and systematic reviews, were excluded, as were case reports, editorial comments, and letters. Duplicate studies and populations were cross-referenced and removed. Figure 1 shows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram (14).

Data extraction
Two independent reviewers (P Daliya and EH Gemmill) extracted data from the included studies. Discrepancies were resolved by a third and a fourth reviewer (DN Lobo and SL Parsons). Data were collected on three areas:
- publication details: author names, year of publication, level of evidence and study type, number of centres involved, and country;
- patient characteristics within each study: sample size, diagnoses, mean age, and gender;
- PROM-specific details: PRO instruments used, PRO concepts and scoring methodology, and survey distribution, response, and follow-up.

Risk of bias was assessed in all non-validation studies using the revised Cochrane risk-of-bias tool for randomised trials (RoB 2.0) (15) or the Risk Of Bias In Non-randomised Studies – of Interventions (ROBINS-I) assessment tool (16), as appropriate.
Quality assessment of cholecystectomy-specific validation studies
The quality of PROM-validation studies was assessed using the COSMIN checklist. This critical appraisal tool was devised through a Delphi study to help evaluate the methodological quality of studies on PROs (12,17). The checklist uses a standardised descriptive framework to assess nine measurement properties against quality markers: internal consistency, reliability, measurement error, content validity, structural validity, hypothesis testing, cross-cultural validity, criterion validity, and responsiveness. Each relevant measurement property was assessed by completing between 1 and 18 items on the checklist. Each item was graded using the 4-point scoring system (“poor”, “fair”, “good”, and “excellent”) that COSMIN designed specifically for systematic reviews of measurement properties (12). The overall score for each measurement property was assigned on a “worst score counts” basis, meaning it was the lowest rating given to any item for that property. An overall score of “good” or “excellent” was taken as evidence of adequate methodological quality for that study, and “poor” or “fair” as inadequate methodological quality (12).
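The “worst score counts” rule is simple to state but easy to misapply when many items are scored. The Python sketch below is a minimal illustration that assumes one list of item ratings per measurement property. The class name, function names, and ratings are hypothetical.

```python
from enum import IntEnum


class Rating(IntEnum):
    """COSMIN 4-point item rating, ordered from worst to best."""
    POOR = 1
    FAIR = 2
    GOOD = 3
    EXCELLENT = 4


def overall_score(item_ratings: list[Rating]) -> Rating:
    """'Worst score counts': a property's overall score is its lowest item rating."""
    if not item_ratings:
        raise ValueError("a measurement property needs at least one rated item")
    return min(item_ratings)


def is_adequate(score: Rating) -> bool:
    """'good'/'excellent' = adequate methodological quality; 'poor'/'fair' = inadequate."""
    return score >= Rating.GOOD


# Hypothetical ratings for the items of one measurement property in one study.
internal_consistency_items = [Rating.EXCELLENT, Rating.GOOD, Rating.FAIR, Rating.EXCELLENT]
score = overall_score(internal_consistency_items)
print(score.name, is_adequate(score))  # FAIR False
```

Because the minimum is taken, one “fair” item is enough to make the whole property inadequate, however many other items score “excellent”.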
Registration of review
The study proposal was registered with the PROSPERO database (Reg. No. CRD42016048211; www.crd.york.ac.uk/prospero). The registration was subsequently amended to include the additional databases searched and the specifications required to complete a COSMIN analysis of PROM-validation studies.
A total of 10,615 articles were identified and screened by title and abstract. Of these, 148 underwent full-text review for eligibility. Details on the use of a PRO questionnaire were frequently lacking. In some cases, specifics of the study population, such as diagnoses or intervention, were also not defined. Further details on study exclusion are given in the PRISMA flow diagram (Figure 1) (14).
A total of 57 studies were identified as using PROMs in patients undergoing laparoscopic cholecystectomy. Of these, 6 were validation studies investigating the psychometric properties of PROMs in this population (18-23).
Study quality
Of the 51 non-validation studies identified (24-74), most were performed in the last decade (62.7%), in Europe (60.8%), and as single-centre studies (60.8%). Almost 20% were randomised controlled trials (RCTs) providing level 1 evidence, but most were prospective cohort or case-control studies. All included studies specified the inclusion of patients with symptomatic gallstone disease. Further analysis, however, identified significant heterogeneity in this definition, which also included choledocholithiasis, pancreatitis, biliary dyskinesia, and incidental biliary tumours (Table 1).

There was significant variation in the selection of PROMs, reflecting the differing study outcomes. Four measures featured most frequently:
- the 36-item Short Form survey (SF-36), a generic measure;
- the European Quality of Life Five Dimensions Questionnaire (EQ5D), a utility measure;
- the gastrointestinal quality of life index (GIQLI), a disease-specific measure;
- visual analogue pain scores (VAPS).

Eighteen studies (35.3%) had samples of ≤100 patients, ranging from 31 to 100 patients. Five of these described a population of <60 patients (25,35,49,51,58).
Risk of bias assessment
Very few studies had a consistently low risk of bias across all domains. Randomisation was performed well in the majority of studies, minimising selection bias, but blinding was poor overall. A number of studies used special dressings to blind participants to the intervention received. Even so, some comparisons could not realistically be blinded because of the specific outcomes studied (27,28,33,42,44,45,51,71). These included comparisons of inpatient and outpatient cholecystectomy (28,34,42,44), and measures of cosmesis (33,39,45,51). Three studies were underpowered because they failed to recruit sufficient participants (27,30,48). Twelve presented incomplete data, having either excluded surveys with missing responses or discounted patients lost to follow-up (30,31,34,36,40,46,47,50,53,57,65,68) (Tables 2,3).

Most studies (94.1%) included PROs as a primary outcome measure, and over 60% measured more than two PRO concepts. These concepts included HRQoL, cosmesis and body image, postoperative pain, sexual function, and patient satisfaction (Table 4). Forty studies (78.4%) were designed to compare two or more operative techniques for cholecystectomy. All studies were performed for research purposes, and none involved patient groups in PROM selection. Most studies described PROs using profile scores rather than single indicator or index numbers. Only 25.5% of studies used both generic and disease-specific PROMs, and only 60.8% used PROMs with demonstrable evidence of validation. Reporting of survey data was often incomplete:
- 88.2% of studies did not discuss the management of missing responses within surveys;
- 21.6% did not consider baseline or preoperative PROM scores for their population;
- 33.3% did not clearly report their survey return rate.

Full study characteristics are available in Table S1.
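All of the percentages above are reported against the 51 non-validation studies. As a quick arithmetic check, the Python sketch below recovers the whole-number count implied by each percentage. The helper name `implied_count` is hypothetical. Where the text states the count (18 studies, 35.3%; 40 studies, 78.4%), the sketch reproduces it. The remaining counts are inferred from the percentages and are not reported in the text.

```python
TOTAL_STUDIES = 51  # non-validation studies, as reported above


def implied_count(percentage: float, total: int = TOTAL_STUDIES) -> int:
    """Return the unique count k whose share 100*k/total lies within rounding distance of `percentage`."""
    matches = [k for k in range(total + 1) if abs(100 * k / total - percentage) <= 0.05]
    if len(matches) != 1:
        raise ValueError(f"{percentage}% does not map to a unique count out of {total}")
    return matches[0]


# Percentages reported in the text (counts below are inferred unless stated in the text).
reported = {
    "performed in the last decade": 62.7,
    "PROs as a primary outcome": 94.1,
    "both generic and disease-specific PROMs": 25.5,
    "missing responses not discussed": 88.2,
    "baseline scores not considered": 21.6,
    "return rate unclear": 33.3,
}
for label, pct in reported.items():
    print(f"{label}: {implied_count(pct)}/{TOTAL_STUDIES}")
```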

Validation studies
Of the 6 PROM-validation studies identified, 4 reported on the gastrointestinal quality of life index (GIQLI) (18-21). One reported on the Otago Gallstones Condition Specific Questionnaire (CSQ) (22), and one on the Gallstone Impact Checklist (GIC) (23). These comprised original validation studies (18,22,23) and translations into other languages (19-21).
COSMIN analysis
The most commonly analysed measurement properties were internal consistency and reliability (all 6 PROM-validation studies) and responsiveness (5 of 6 studies). Only 2 studies scored “good” or “excellent” for internal consistency, indicating adequate methodological quality (20,21). The other 4 studies were rated “fair” or “poor”, indicating inadequate methodological quality. The summary scores for each measurement property in each study are shown in Table 5. No study assessed “measurement error”, “hypotheses testing”, or “criterion validity”. The methodological qualities assessed for each study are summarised in Table S2.

Because few PROM-validation studies were identified, the instruments were not assessed against the “criteria for good measurement properties” recommended in the COSMIN guidelines (76). A specific PROM therefore cannot be recommended for use in laparoscopic cholecystectomy.
PROM selection
A recent systematic review of RCTs evaluating PROs after cholecystectomy (77) used the International Society for Quality of Life Research (ISOQOL) checklist to assess the quality of PRO reporting in the included studies. The ISOQOL checklist has been available since 2013. Even so, the authors found that most studies did not adhere to its guidelines, had a high risk of bias, and reported PROs poorly (77).
In contrast, we analysed all clinical studies evaluating HRQoL after laparoscopic cholecystectomy, so as not to exclude the majority of studies (>80%), which were not RCTs. We therefore used the amended checklist described by Patrick and Erikson (75) in the Cochrane Handbook to describe and assess the identified studies. Like Mueck et al. (77), the present review also found significant variability in PRO reporting. Across the included studies, a wide variety of concepts were evaluated in addition to HRQoL, using a number of different PRO instruments (Table 1). This variation reflects the lack of specific recommendations on PROM selection for patients undergoing laparoscopic cholecystectomy. It also reflects differences in study rationale, which can itself influence PROM selection.
Each study appeared to select PRO instruments based on their relevance to primary or secondary outcomes. Guidance on the use of PROs in clinical trials is available (11,75), yet only 25.5% of the studies reviewed measured both generic and condition-specific PROs. Justification for instrument selection also varied and was documented in only 16 papers. The reasons given were:
- the availability of a standardized reference population for comparison (30,56,59,73);
- pre-existing validation within the same or a similar cohort (32,40,47,54,56,58,65,71);
- ease of survey administration or user-friendliness (26,34);
- adherence to recommended guidelines, although these were not specified (43);
- the need for a specific type of PROM to calculate particular outcomes, e.g., quality-adjusted life years (QALYs) (61);
- prior knowledge of the psychometric quality of the chosen instrument (28).
PROM dissemination
To our knowledge, PRO surveys in all studies were completed either by the patients themselves or with the aid of a dedicated researcher. Surveys were administered in person, by post, or by telephone, using traditional paper questionnaires. Despite recent advances in technology, none of the included studies described the use of digital, electronic, or automated methods of PRO collection. Many alternative modalities are now available, including web-based patient surveys, tablet-based applications, and voice-activated telephone surveys (78).
These more modern methods could make data collection more efficient, reduce transcription errors during data entry, aid data analysis, and reduce missing data points within surveys. However, they also have significant limitations. Costs can be significant, including licensing of validated surveys and the measures needed to ensure data security. Users must also be technologically adept or receive appropriate training (78).
All studies collected PRO data prospectively for research purposes. In addition, 4 (40,56,61,62) of the 8 Swedish studies used the Swedish national registry, GallRiks (79), to aid data collection. Registry data are also collected prospectively, as standard practice. This matters because retrospective data collection is more likely to introduce bias through poor recall and more gaps in the data (78).
PRO analysis
In studies where both profile and index scores could be calculated, no explanation was given when only one scoring strategy was used. Profile scores can provide useful information on multiple PRO domains, such as physical functioning (pain, mobility, activity) and psychological functioning (mood, energy, anxiety or depression). However, profile scores are not always feasible, and they do not always provide additional benefit over index or indicator scores (75). These summary scores can provide sufficient information to demonstrate a change in HRQoL. They are particularly useful when PROs feed into other outcomes, such as cost-effectiveness information or QALYs used to assess service quality (75).
Most studies reported their survey return rate, which showed good overall patient participation and low attrition. However, very few described how incomplete HRQoL surveys were handled. This matters because both imputing values for missing data points and excluding incomplete surveys can introduce bias (8,78). Similarly, without preoperative or baseline values, the change from baseline cannot be calculated. This measure is useful for demonstrating unbiased improvement or deterioration from the population norm (80).
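Index scores from a preference-based measure such as the EQ5D can be converted into QALYs by taking the area under the utility curve over follow-up. The Python sketch below illustrates this using the trapezoidal rule. It assumes utilities anchored at 0 (death) and 1 (full health), with linear change between assessments. It ignores discounting and any control-group comparison. The names and values are hypothetical.

```python
def qalys(times_years: list[float], utilities: list[float]) -> float:
    """Area under the utility curve (trapezoidal rule), in quality-adjusted life years."""
    if len(times_years) != len(utilities) or len(times_years) < 2:
        raise ValueError("need matching time points and utilities (at least two of each)")
    total = 0.0
    for i in range(1, len(times_years)):
        interval = times_years[i] - times_years[i - 1]
        if interval < 0:
            raise ValueError("time points must be in ascending order")
        # Assume utility changes linearly between consecutive assessments.
        total += interval * (utilities[i - 1] + utilities[i]) / 2
    return total


# Hypothetical index scores before surgery, at 6 months and at 12 months.
print(round(qalys([0.0, 0.5, 1.0], [0.6, 0.8, 0.9]), 3))  # 0.775
```

A profile score with separate domain values cannot be fed into this calculation directly, which is one reason index scores suit cost-effectiveness work.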
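Reporting how incomplete surveys were handled can be as simple as stating which patients contributed to a change-from-baseline estimate. The Python sketch below is a hypothetical illustration of complete-case analysis. This is one of several possible strategies and not necessarily the one any included study used. The function returns the number of excluded patients alongside the mean change, so that the extent of missing data is visible.

```python
from statistics import mean
from typing import Optional


def change_from_baseline(
    baseline: list[Optional[float]], follow_up: list[Optional[float]]
) -> tuple[float, int]:
    """Mean within-patient change, using only patients with both scores (complete cases).

    Returns the mean change and the number of patients excluded for missing data.
    """
    if len(baseline) != len(follow_up):
        raise ValueError("each patient needs a baseline and a follow-up entry")
    complete = [(b, f) for b, f in zip(baseline, follow_up) if b is not None and f is not None]
    if not complete:
        raise ValueError("no patient has both a baseline and a follow-up score")
    excluded = len(baseline) - len(complete)
    return mean(f - b for b, f in complete), excluded


# Hypothetical scores for four patients; None marks a missing score.
print(change_from_baseline([50, 60, None, 40], [70, 65, 80, None]))  # (12.5, 2)
```

The excluded patients may differ systematically from those retained, for example if patients with worse outcomes are less likely to return surveys. In that case the complete-case estimate is biased, which is why the handling of missing data should be reported.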
Review of methodological quality
The 2011 review by the Oxford PROM Group appraised 7 PRO instruments for methodological quality (11). An expert panel also reviewed their suitability as clinical care evaluation tools. The review recommended three PRO assessment tools for patients undergoing cholecystectomy: one of two generic health measures (SF-36), the one preference-based measure (EQ5D), and one of four condition-specific measures (CSQ). Interestingly, these recommended PROs had little or no evidence of good methodological quality when their psychometric properties were assessed. These properties were reproducibility, internal consistency, content validity, construct validity, responsiveness, interpretability, and floor and ceiling effects. Similarly, our COSMIN review of the 6 PROM-validation studies found fair to poor methodological quality for most of the psychometric properties evaluated (internal consistency, reliability, content validity, cross-cultural validity, and responsiveness). Unfortunately, the validation studies were few and of poor quality, so we could not comment on the quality of the PRO instruments themselves. This has a significant bearing on recommending PRO instruments, as guidelines suggest that studies of poor quality provide little value (12).
Psychometric properties in detail
Internal consistency was analysed in all 6 PROM-validation studies. All of them achieved a Cronbach’s alpha ≥0.7 for their global rating score. Even so, COSMIN analysis rated the methodological quality of the internal consistency assessment as inadequate in 4 of them. One study did not describe how missing data points were managed (18), and three used inadequate sample sizes (19,22,23). Cronbach’s alpha is a measure of scale reliability, and ≥0.7 is the accepted minimum value (7,11,76). Global rating scales reached this threshold, but individual dimension scores were <0.7 in some instances (19,23). This indicates poor inter-correlation of items within those scales.
Studies deemed to have poor reliability (a measure of scale stability) had small sub-group sample sizes (18) and inadequate retest intervals (18). One example was an interval of 48 hours, instead of the recommended minimum of 2 weeks (81). Several other factors also affected the measurement of scale reliability (76):
- significant interventions between test and retest readings [surgical management (20,23)];
- a change in setting [from ward to clinic room (22)];
- a switch from researcher-led surveys to postal surveys (22).
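For a scale of k items, Cronbach’s alpha is α = k/(k − 1) × (1 − Σσᵢ²/σ²_total). Here σᵢ² is the variance of item i and σ²_total is the variance of the summed scale score. The hypothetical Python sketch below computes alpha from raw item responses. It assumes every respondent answered every item, because handling of missing items is exactly what several of the reviewed studies did not report.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha; `responses` holds one list of k item scores per respondent."""
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2 or any(len(r) != k for r in responses):
        raise ValueError("need at least two items, answered by every respondent")
    items = list(zip(*responses))          # one tuple of scores per item
    totals = [sum(r) for r in responses]   # each respondent's summed scale score
    total_variance = variance(totals)
    if total_variance == 0:
        raise ValueError("summed scores do not vary, so alpha is undefined")
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: three perfectly consistent items, then two weakly related items.
print(cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]))
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 2))  # 0.67
```

The same function can be applied to the whole scale and to each dimension separately. This shows how a global score can exceed 0.7 while an individual dimension falls below it, since a dimension has fewer items and they may be less inter-correlated.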
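Test-retest reliability of continuous scores is commonly summarised with an intraclass correlation coefficient (ICC). The Python sketch below implements the two-way random-effects, absolute-agreement, single-measure form, ICC(2,1) in Shrout and Fleiss’s notation. The choice of this form and the data are assumptions for illustration and are not taken from the reviewed studies.

```python
def icc_2_1(scores: list[list[float]]) -> float:
    """Two-way random-effects, absolute-agreement, single-measure ICC (Shrout and Fleiss ICC(2,1)).

    `scores` holds one row per patient and one column per administration (e.g. test, retest).
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least two patients, each scored on the same two or more occasions")
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between administrations
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)


# Hypothetical test-retest scores: identical answers, then a uniform 1-point shift at retest.
print(icc_2_1([[1, 1], [2, 2], [3, 3]]))
print(round(icc_2_1([[1, 2], [2, 3], [3, 4]]), 2))  # 0.67
```

In the second example every patient scores one point higher at retest, as might happen if an intervention or a change of setting occurs between administrations. The scores remain perfectly correlated, yet the absolute-agreement ICC falls to about 0.67.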
The three studies (18,22,23) that measured content validity performed well overall. They missed a score of “excellent” only because of minor methodological flaws in study design. Examples were non-reporting of missing data and a lack of sub-group demographics describing the composition of the expert review panel.
Three studies involved a translation of the GIQLI questionnaire (19-21). Two were translated from the original German GIQLI into Spanish (20) or Swedish (21), and one from the original English GIQLI into Mandarin Chinese (19). Two studies reported on translation alone and did not meet the full criteria for cross-cultural validity (19,21). In all studies, information on the translators’ expertise was limited to language. There was no description of their expertise in the disease process studied or the construct measured. No study stated whether translators worked independently. All studies performed the minimum requirement of one forward and one backward translation, using a minimum of 2 translators. The translation studies did not describe any pre-test process (19,21). The one study that analysed full cross-cultural validity gave minimal information on the study sub-group, which reduced its overall methodological score to “poor” (20).
All studies that evaluated responsiveness scored “poor” because they gave no detail on study hypotheses (18-21,23). None of these studies stated or quantified the expected direction or magnitude of study outcomes a priori (76).
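COSMIN requires the expected direction and magnitude of change to be fixed before the data are analysed, so that the observed change can confirm or refute them. The Python sketch below illustrates this, using the standardized response mean (SRM) as the effect measure. The SRM is the mean change divided by the standard deviation of change. The choice of the SRM, the threshold of 0.8, the function names, and the data are all hypothetical choices made for illustration.

```python
from statistics import mean, stdev


def standardized_response_mean(before: list[float], after: list[float]) -> float:
    """Mean within-patient change divided by the standard deviation of that change."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need paired scores for at least two patients")
    changes = [a - b for b, a in zip(before, after)]
    return mean(changes) / stdev(changes)


def hypothesis_confirmed(observed: float, expected_direction: int, minimum_magnitude: float) -> bool:
    """Check an observed effect against a hypothesis stated before the analysis."""
    return observed * expected_direction > 0 and abs(observed) >= minimum_magnitude


# Stated a priori (hypothetical): scores improve after cholecystectomy, with SRM >= 0.8.
EXPECTED_DIRECTION = +1
MINIMUM_SRM = 0.8

before = [40, 50, 60, 55]
after = [60, 65, 70, 80]
srm = standardized_response_mean(before, after)
print(round(srm, 2), hypothesis_confirmed(srm, EXPECTED_DIRECTION, MINIMUM_SRM))
```

Writing the hypothesis down as constants before any data are loaded is the step that the reviewed studies did not document.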
Strengths and limitations
Other methods of assessing methodological quality are available. However, to our knowledge, COSMIN analysis is the most standardized method for assessing PROM-validation studies, given its stringent criteria and associated guidelines.
Because of limited resources, studies published in languages other than English were excluded. This may have prevented the identification of some PRO and PROM-validation studies.
This review covered PRO studies assessing HRQoL and PROM-validation studies in patients undergoing laparoscopic cholecystectomy. It identifies a lack of consistency in study design and in PRO reporting in clinical trials. An increasing number of studies are evaluating PROs, but a lack of adherence to existing PRO administration and reporting guidelines continues to reduce study quality. We recommend that future clinical trials using PROs adhere to established comprehensive guidelines. These include the CONSORT (Consolidated Standards of Reporting Trials) PRO extension (6) and the SPIRIT (Standard Protocol Items: Recommendations for Interventional Trials) PRO extension (80). Researchers should aim to re-validate PRO instruments in their study population (75,80), thereby ensuring that the selected PROMs have good methodological quality (76).
Funding: This work was supported by the Medical Research Council [grant number MR/K00414X/1]; and Arthritis Research UK [grant number 19891]. Prita Daliya is a recipient of a Research Fellowship funded by the Royal College of Surgeons of England and EIDO Healthcare Limited.
Conflicts of Interest: SL Parsons is a company director for EIDO Healthcare Limited. The other authors have no conflicts of interest to declare.
