Original research

Prospective external validation of the automated PIPRA multivariable prediction model for postoperative delirium on real-world data from a consecutive cohort of non-cardiac surgery inpatients


...

Abstract

Objectives Postoperative delirium (POD) is a common complication in surgical patients over 60, increasing morbidity, mortality and hospital stays. While international guidelines recommend risk screening, resource constraints limit implementation. This study externally validated the Pre-Interventional Preventive Risk Assessment (PIPRA) algorithm, a CE-certified tool for identifying high-risk patients to enable targeted prevention.

Methods A prospective validation study was conducted at a 335-bed Swiss hospital as part of a quality improvement initiative. Data from 866 patients aged ≥60 undergoing non-cardiac, non-intracranial surgery (May–June 2023) were analysed. The PIPRA model’s performance was assessed on discrimination (Area Under the Receiver Operating Characteristic Curve (AUROC)) and calibration.

Results POD occurred in 11.5% (n=100) of patients. The PIPRA model showed good discrimination (AUROC=0.77, 95% CI: 0.72 to 0.82) and generally accurate calibration, although it slightly overpredicted risk in high-risk patients. POD was associated with higher mortality, prolonged intensive care unit (ICU) and hospital stays, and increased nursing care needs. The model effectively stratified patients for targeted interventions.

Discussion The PIPRA algorithm demonstrated robust performance in a real-world setting, affirming its utility for POD risk prediction. The study highlighted the model’s applicability across diverse clinical environments, despite differences in patient populations and screening protocols.

Conclusions The PIPRA algorithm is a reliable tool for identifying surgical patients at risk of POD, supporting early intervention strategies to improve patient outcomes. Its integration into clinical workflows may enhance POD prevention efforts and optimise resource allocation in perioperative care.

What is already known on this topic

Postoperative delirium (POD) is a prevalent and serious complication among older surgical patients, associated with increased morbidity, mortality and healthcare costs. While early risk identification is recommended, routine screening remains challenging due to resource constraints. The Pre-Interventional Preventive Risk Assessment (PIPRA) algorithm was developed to address this gap and validated on clinical trial data, but its real-world clinical performance required further validation.

What this study adds

This study confirms that the PIPRA algorithm effectively predicts POD risk in a real-world setting, demonstrating strong discrimination (AUROC=0.77) and generally accurate calibration. It also highlights the association between POD and worse clinical outcomes, reinforcing the need for targeted preventive strategies.

How this study might affect research, practice or policy

The findings support the integration of PIPRA into clinical workflows to enhance POD prevention. Its use could enable more efficient resource allocation, improve patient outcomes and inform future policy decisions on POD risk assessment in perioperative care.

Introduction

Postoperative delirium (POD) is the most common complication affecting surgical patients over the age of 60.1 POD is characterised by altered consciousness, agitation, confusion and cognitive decline, and is associated with long-term adverse outcomes, including increased postoperative morbidity and mortality, functional and cognitive decline, prolonged length of stay, higher readmission rates and postoperative neurocognitive disorders.2–5

International guidelines and medical societies recommend screening for POD risk prior to surgery so that preventive measures can be introduced for patients at risk.6–11 Non-pharmacological multicomponent interventions12 13 and programmes such as the Hospital Elder Life Program14 have been shown to reduce delirium incidence;15–17 however, these measures are not routinely applied by all healthcare professionals, often due to limited awareness or resources.

We have developed a preoperative POD risk prediction algorithm called PIPRA (Pre-Interventional Preventive Risk Assessment),18 following a comprehensive individual patient data meta-analysis (IPDMA).19 PIPRA is an automated, CE-certified POD risk prediction tool designed to identify at-risk patients over 60 years of age, thereby enabling clinicians to implement targeted preventive strategies for those patients at highest risk.

This study describes the first external validation of PIPRA in a real-world clinical setting.

Methods

Source of data and participants

The delirium risk calculated by the PIPRA model was routinely collected, together with the delirium screening results, during a quality improvement project (QIP) at a private, 335-bed hospital in Switzerland.20 The project included all inpatients aged 60 years and above undergoing non-cardiac, non-intracranial surgery who were admitted between 1 May and 30 June 2023. In 2023/2024, 43.0% of the hospital's patients were publicly insured and 57.0% were privately or partially privately insured; 79.4% of patients were outpatients, and 82.4% of inpatient admissions were elective. The QIP aimed to increase delirium screening, treatment and prevention.

Comparison of validation to original development data

The development data originated from an IPDMA of clinical studies18 19 21 performed in several countries, whereas the validation data for this study consisted of real-world data collected as part of a QIP in a single private hospital in Switzerland. The eligibility criteria of the QIP were generally broader than those of the studies included in the IPDMA. The delirium outcome was also measured heterogeneously across the IPDMA studies: only one study (contributing less than 10% of patients) used the Delirium Observation Screening Scale (DOSS) as a diagnostic tool, and it did so in combination with Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) criteria. In the QIP presented here, the DOSS was the primary diagnostic tool, combined with any routine diagnosis made by the physician according to DSM-5 criteria. In the development dataset, the predictors were collected specifically to investigate predictors of POD, whereas here they were collected during the routine preanaesthesia consultation.

Outcome

The outcome predicted by the model was POD, defined as delirium occurring up to 7 days after surgery. In this dataset, a patient was deemed delirious if they had a DOSS score of 3 points or more or an ICD-10 diagnosis code for delirium during their hospital stay. The DOSS is a 13-point checklist for the identification of delirium.22 The data were collected in routine clinical practice; the delirium risk was displayed in the patient's electronic health record and was visible to the nurses performing the DOSS. The nurses were not informed about the validation project and were instructed to perform the DOSS on all patients as part of the QIP. Compliance with DOSS screening was assessed by comparing the observed number of DOSS screenings with the expected number. We anticipated one DOSS screening per patient per shift, with three shifts per day. Thus, compliance was calculated as a percentage:

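$$\text{Compliance (\%)} = \frac{\text{observed number of DOSS screenings}}{3 \times \text{number of patient-days}} \times 100$$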
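The compliance calculation can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: the "days" denominator (the number of days a patient was due to be screened) is an assumption, since the exact observation window is not stated, and the text does not say whether the reported mean compliance averages per-patient percentages or pools all screenings, so both variants are shown. The patient tuples are made up.

```python
from statistics import mean

def compliance_pct(observed, days, shifts_per_day=3):
    """Share (%) of expected DOSS screenings that were actually performed.

    Expected screenings = one per shift x shifts per day x days observed.
    `days` is an assumed denominator (days the patient was due for screening).
    """
    expected = shifts_per_day * days
    return 100.0 * observed / expected

# Made-up patients: (screenings performed, days due for screening)
patients = [(6, 3), (2, 2), (0, 1)]

# Variant 1: average of per-patient compliance percentages
per_patient = [compliance_pct(obs, d) for obs, d in patients]
mean_compliance = mean(per_patient)

# Variant 2: pooled share of all expected screenings
pooled_compliance = 100.0 * sum(o for o, _ in patients) / sum(3 * d for _, d in patients)

print(f"mean per-patient: {mean_compliance:.1f}%, pooled: {pooled_compliance:.1f}%")
```

The two variants can differ noticeably when patients with long stays are screened less consistently than those with short stays.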

Predictors

All PIPRA variables (age, body mass index, American Society of Anesthesiologists Physical Status Classification System (ASA) score, number of prescribed medications, cognitive impairment, history of delirium, surgery risk, laparotomy/thoracotomy, optional preoperative C reactive protein (CRP) value) except CRP were routinely collected. CRP is optional and therefore was only used where available.

Predictors were not collected specifically for delirium risk prediction. Instead, all were part of clinical routine and were recorded by the anaesthesiologist during the preanaesthesia consultation.

In addition to the main predictors, we recorded patient sex and the Self-Care Index (SPI) for further characterisation. The SPI, assessed daily, measures self-care abilities (eg, hygiene, mobility, elimination), producing a total score from 10 (maximum impairment) to 40 (full self-care).23

Sample size

No formal sample size calculation was performed because the data came from a QIP. However, assuming a POD incidence of 10% and a target area under the curve (AUC) of 0.75, the 866 patients enrolled in the study would yield a 95% CI for the AUC with a width of approximately 0.12. We considered this precise enough to provide meaningful information about model performance.
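The text does not state how the expected CI width was derived. As a hedged check, the Hanley and McNeil standard-error approximation for an AUC, applied to 866 patients with 10% events and an AUC of 0.75, gives a width close to the quoted 0.12. The function name below is our own; this is a sketch of one standard approach, not necessarily the method the authors used.

```python
from statistics import NormalDist

def hanley_mcneil_se(auc, n_events, n_nonevents):
    """Approximate standard error of an AUC (Hanley and McNeil approximation)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_events - 1) * (q1 - auc ** 2)
           + (n_nonevents - 1) * (q2 - auc ** 2)) / (n_events * n_nonevents)
    return var ** 0.5

n = 866
n_events = round(0.10 * n)          # expected POD cases at 10% incidence
se = hanley_mcneil_se(0.75, n_events, n - n_events)
z = NormalDist().inv_cdf(0.975)     # two-sided 95% critical value
width = 2 * z * se                  # full width of the 95% CI
print(f"Expected 95% CI width: {width:.3f}")  # prints roughly 0.122
```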

Missing data

Missing data were imputed only for PIPRA predictors, using mean/mode imputation as proposed in the development paper. CRP was not imputed: when it was unavailable, the PIPRA submodel without CRP was used for prediction, as in the development paper.
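The missing-data handling can be illustrated with a minimal sketch. It assumes that continuous predictors receive the mean and categorical predictors the mode, computed from the records passed in; whether the original analysis used development-data or validation-data values is not stated here. The field names and the `"with_crp"`/`"without_crp"` labels are hypothetical, not PIPRA's actual identifiers.

```python
from statistics import mean, mode

def impute_mean_mode(records, continuous, categorical):
    """Return copies of `records` with missing (None) values filled in.

    Continuous predictors get the mean and categorical predictors the mode,
    computed here from the observed values in `records` (an assumption).
    CRP is deliberately not listed, so it is never imputed.
    """
    fill = {}
    for name in continuous:
        fill[name] = mean(r[name] for r in records if r[name] is not None)
    for name in categorical:
        fill[name] = mode(r[name] for r in records if r[name] is not None)
    imputed = []
    for r in records:
        row = dict(r)
        for name, value in fill.items():
            if row[name] is None:
                row[name] = value
        imputed.append(row)
    return imputed

def choose_submodel(record):
    """Use the CRP submodel only when a CRP value was actually measured."""
    return "with_crp" if record.get("crp") is not None else "without_crp"

# Hypothetical records; field names are illustrative only.
records = [
    {"bmi": 22.0, "cognitive_impairment": 0, "crp": None},
    {"bmi": None, "cognitive_impairment": 1, "crp": 4.0},
    {"bmi": 28.0, "cognitive_impairment": None, "crp": None},
    {"bmi": 25.0, "cognitive_impairment": 1, "crp": None},
]
imputed = impute_mean_mode(records, ["bmi"], ["cognitive_impairment"])
for row in imputed:
    print(row, choose_submodel(row))
```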

Statistical analysis methods

Subject and procedure characteristics were summarised by median (IQR) or frequency (percentage), depending on data type. Differences in clinical parameters across groups (eg, POD/no POD) were explored using t-tests or χ² tests, as appropriate. All analyses were performed using R Statistical Software (V.4.3.2).24

This external validation was performed using data completely independent of model development. To compute the predicted risk on the new data, the following equation was applied: $p = 1/(1 + \exp(-\mathrm{lp}))$, where $p$ is the predicted risk and $\mathrm{lp}$ is the linear predictor, that is, the intercept plus the sum of each predictor value multiplied by its log-odds coefficient.
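As an illustration of the logistic transformation described above, the sketch below computes a linear predictor and converts it to a risk. The coefficients and predictor names are hypothetical placeholders chosen for readability; they are not the published PIPRA weights, and the full PIPRA model uses more predictors.

```python
import math

# HYPOTHETICAL coefficients, for illustration only.
# These are NOT the published PIPRA weights.
HYPOTHETICAL_COEFS = {
    "intercept": -6.0,
    "age": 0.05,
    "asa": 0.5,
    "n_medications": 0.1,
}

def linear_predictor(x, coefs):
    """lp = intercept + sum of (coefficient x predictor value)."""
    return coefs["intercept"] + sum(
        beta * x[name] for name, beta in coefs.items() if name != "intercept"
    )

def predicted_risk(x, coefs):
    """Inverse-logit transform: p = 1 / (1 + exp(-lp))."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(x, coefs)))

patient = {"age": 80, "asa": 3, "n_medications": 5}  # hypothetical patient
print(f"predicted POD risk: {predicted_risk(patient, HYPOTHETICAL_COEFS):.1%}")
```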

Participant and procedure characteristics were summarised and compared between the development and validation studies using the same methods as for the comparison of patients with and without POD. They were also visualised using violin plots or bar charts.

Both model discrimination (AUC) and calibration (calibration-in-the-large, calibration slope, calibration plot) were assessed on the validation data.
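The performance measures named above can be sketched in plain Python. The analysis itself was done in R and its exact implementation is not described; this sketch assumes the common logistic recalibration framework, in which calibration-in-the-large is the intercept of a logistic model with logit(p) as an offset and the calibration slope is the coefficient on logit(p). The AUC is computed as the concordance probability over all case/non-case pairs.

```python
import math

def _inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def _logit(p):
    return math.log(p / (1.0 - p))

def auc(y, p):
    """Probability that a random case is ranked above a random non-case (ties = 0.5)."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    controls = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = 0.0
    for a in cases:
        for b in controls:
            wins += 1.0 if a > b else (0.5 if a == b else 0.0)
    return wins / (len(cases) * len(controls))

def calibration_in_the_large(y, p, iters=50):
    """Intercept a in logit(P(y=1)) = a + logit(p), with slope fixed at 1 (offset).

    a < 0 means the model over-predicts on average; a > 0 means under-prediction.
    """
    lp = [_logit(pi) for pi in p]
    a = 0.0
    for _ in range(iters):
        mu = [_inv_logit(a + x) for x in lp]
        grad = sum(yi - m for yi, m in zip(y, mu))
        hess = sum(m * (1.0 - m) for m in mu)
        step = grad / hess
        a += step
        if abs(step) < 1e-10:
            break
    return a

def calibration_slope(y, p, iters=50):
    """Slope b in logit(P(y=1)) = a + b * logit(p), fitted by Newton-Raphson.

    b < 1 suggests predictions are too extreme; b > 1 suggests too moderate.
    """
    lp = [_logit(pi) for pi in p]
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for yi, x in zip(y, lp):
            m = _inv_logit(a + b * x)
            w = m * (1.0 - m)
            g0 += yi - m
            g1 += (yi - m) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 Newton system H * [da, db] = [g0, g1]
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) < 1e-10 and abs(db) < 1e-10:
            break
    return b
```

A well-calibrated model has a calibration-in-the-large close to 0 and a slope close to 1; a calibration plot shows the same information across the range of predicted risk.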

Risk groups

Patients were stratified into low (PIPRA <10%), intermediate (PIPRA from 10% to 19.9%), high (PIPRA from 20% to 34.9%) and very high (PIPRA ≥35%) risk groups as per the original publication.18 Patients at high or very high risk are more likely to develop delirium than the average older population, and for them it is essential that preventive perioperative measures are taken. In an exploratory analysis, a table stratified by risk group was created, the number of delirious patients in each group was identified, and the number of potentially preventable cases was estimated, assuming the prevention effectiveness reported in the literature.17
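The stratification and the exploratory preventable-case estimate can be sketched as below. Treating a risk of exactly 35% as "very high" is an assumption, and the effectiveness value and case counts in the example are hypothetical placeholders, not the literature value or the study data.

```python
def risk_group(p):
    """Map a PIPRA predicted risk (0-1) to the published risk groups.

    Boundary handling at exactly 35% is an assumption (treated as 'very high').
    """
    if p < 0.10:
        return "low"
    if p < 0.20:
        return "intermediate"
    if p < 0.35:
        return "high"
    return "very high"

def potentially_preventable(cases_by_group, targeted_groups, effectiveness):
    """Delirium cases in the targeted groups x an assumed relative risk reduction."""
    return effectiveness * sum(cases_by_group[g] for g in targeted_groups)

# Made-up counts and a HYPOTHETICAL effectiveness of 0.4 (not study data).
cases = {"low": 10, "intermediate": 10, "high": 10, "very high": 10}
targeted = ["intermediate", "high", "very high"]
print(potentially_preventable(cases, targeted, effectiveness=0.4))
```

The estimate scales linearly with the assumed effectiveness, so it is only as reliable as the constant prevention effect taken from the literature.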

Results

Participants

5279 patients were admitted during the timeframe of the validation study. After excluding patients under 60 years of age, outpatients, and non-surgical or cardiac surgery patients, 866 patients remained (figure 1). Patients with delirium had worse outcomes, spending on average six times longer in the intensive care unit (ICU), with twice the length of hospital stay and three times the nursing time (table 1). Mortality was 10-fold higher among patients with delirium, although only seven patients died in this cohort. Patients who developed POD were generally older, more often had cognitive impairment or a history of delirium, were less able to care for themselves (lower SPI), and had more prescribed medications, higher CRP values and higher ASA status (table 1).

Figure 1


Patient flow through the study. 5279 patients were admitted during the time frame of the QIP. Excluding patients under 60 years of age, outpatients and non-surgical or cardiac surgery patients, 866 patients remained. QIP, quality improvement project.

Table 1


Description of included subjects and outcomes

While participants in the development study and in this validation did not differ in a clinically meaningful way in age, BMI or history of delirium, they did differ in other respects (figure 2, online supplemental table S1). In the validation cohort, fewer patients underwent higher-risk or open procedures, and the patients themselves were frailer (higher ASA). All patients in the validation study were taking at least one medication.

Figure 2


Comparison of predictor variable distributions between the development and validation datasets. BMI, body mass index; CRP, C reactive protein; ASA, American Society of Anesthesiologists Physical Status Classification.

The mean compliance with delirium screening was 60.6%. Seventy-nine patients (9%) received no screening; these patients were younger and had lower ASA scores, smaller surgeries and shorter lengths of stay (online supplemental table S2). Anecdotal feedback suggested that compliance was lowest on the night shift, although we could not retrieve data at this level of detail because of ethical considerations.

Overall, CRP was the most frequently missing variable (table 1). However, this was anticipated in the model design, and the submodel without CRP was used for these patients. Missingness was also relatively high for cognitive impairment, history of delirium and number of medications (table 1).

Model performance

The validation analysis included 866 eligible patients admitted between 1 May and 30 June 2023. Of these, 100 (11.5%, 95% CI 9.6% to 13.8%) were identified as having POD. Using the risk groups defined at development, 59.2% of patients were classified as low risk, 21.6% as intermediate risk, 10.4% as high risk and 8.8% as very high risk (figure 3a).
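The method behind the 95% CI for the incidence is not named in the text. As a hedged check, the Wilson score interval for 100 events among 866 patients reproduces the reported bounds of 9.6% to 13.8%; the sketch below shows that calculation, without implying it is the exact routine the authors used.

```python
import math
from statistics import NormalDist

def wilson_ci(events, n, level=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p = events / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

lo, hi = wilson_ci(100, 866)
print(f"{100 / 866:.1%} (95% CI {lo:.1%} to {hi:.1%})")  # 11.5% (95% CI 9.6% to 13.8%)
```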

Figure 3


Results and performance of the PIPRA model. (A) The PIPRA model stratifies most patients as low risk. (B) The AUC of the PIPRA model is 0.77 (95% CI 0.72 to 0.82). (C) The model shows good calibration, with a slight overprediction for the higher risk patients. AUC, area under the curve; ROC, Receiver operating characteristic; PIPRA, Pre-Interventional Preventive Risk Assessment.

At external validation, the PIPRA model discriminated well between patients with and without POD (AUC=0.77 (95% CI 0.72 to 0.82)). There was no major miscalibration, although the model tended to overpredict slightly for higher-risk patients (figure 3b,c). Diagnostic accuracy depended on the chosen threshold and is detailed in table 2. For example, at the 10% threshold (intermediate risk), sensitivity was 74% and specificity 64%.
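Threshold-dependent sensitivity and specificity can be computed as sketched below, assuming (consistent with the risk-group definitions) that a predicted risk at or above the threshold counts as "at risk". The data in the example are made up and do not reproduce table 2.

```python
def sensitivity_specificity(y, p, threshold):
    """Flag risk >= threshold as 'at risk' and compare with observed POD (1/0)."""
    tp = fn = tn = fp = 0
    for observed, risk in zip(y, p):
        flagged = risk >= threshold
        if observed == 1:
            tp += flagged
            fn += not flagged
        else:
            fp += flagged
            tn += not flagged
    return tp / (tp + fn), tn / (tn + fp)

# Made-up outcomes and predicted risks, evaluated at the 10% threshold
y = [1, 1, 1, 0, 0, 0, 0]
p = [0.40, 0.15, 0.05, 0.12, 0.08, 0.03, 0.02]
sens, spec = sensitivity_specificity(y, p, 0.10)
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")
```

Lowering the threshold raises sensitivity at the cost of specificity, which is why the choice of cut-off determines how many patients are offered prevention.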

Table 2


Diagnostic accuracy

Potential effect

An exploratory analysis was performed to estimate the preventive potential. This analysis suggested that three-quarters of delirium cases could be prevented, under the assumption of a constant prevention effect from the literature, when focusing exclusively on patients classified as intermediate risk or higher (online supplemental table S3).

Discussion

The delirium incidence of 11.5% was lower than in the development dataset (19.7%) and than that reported in the literature. However, the mean predicted delirium risk (13%) was similar to the observed incidence, which suggests that the PIPRA model accounts for these population differences. The differences might have arisen because the real-world data were collected at a private hospital, whereas the development data originated from university and public hospitals.

Limitations

The QIP was not a clinical trial and, therefore, there were no sample size calculations, and there were strict limitations on exploring the data in depth due to ethical considerations.

Although adherence to the delirium screening protocol met acceptable standards for clinical practice (91% of patients were screened), only 61% of the total expected screenings were performed, and higher compliance would be anticipated in a research setting. Although the DOSS is simple and does not require direct patient interaction, it has limitations in both sensitivity and specificity.25 To mitigate this issue, we trained nurses to accurately identify and diagnose delirium, stressing that the DOSS is a mandatory supplementary tool. We also emphasised in the training that the tool is prone to missing hypoactive delirium.

The risk prediction model was originally developed on IPDMA data, in which delirium was predominantly assessed with the Confusion Assessment Method (CAM). Only one study used the DOSS, and it did so in conjunction with DSM-IV criteria. Differences in setting, outcome measurement and patient characteristics between the development and validation data could have caused the model to underperform; however, the model performed well, which further supports its robustness.

Because the validation was performed in routine clinical practice, the nurses were not blinded to the prediction. Although they were neither involved in nor informed about the validation project, this may still have introduced bias.

Strengths

Despite the limitations discussed above, the model performed well on real-world data, indicating that it is robust. This was expected, since the original model was built on diverse IPDMA data. To the best of our knowledge, this is the only POD risk prediction model built on IPDMA data and so far the only POD risk prediction model that is CE-marked and approved for clinical use.

A key strength of using real-world data is broader demographic representation, including subpopulations often under-represented in trials. All patients were included in the study, so no bias was introduced through a consent process, in which the most vulnerable patients are often lost (eg, those with low education or mild cognitive impairment, both known risk factors for delirium).19 However, more than half of the patients were privately or partially privately insured, which is not representative of most hospitals.

Interpretation

The AUC from this external validation compares favourably with those of other external validations of POD prediction models. In the original study, a pilot external validation on 359 patients showed an AUC of 0.74, close to the 0.77 observed here. Wong et al compared 14 delirium prediction models head-to-head on a dataset of 292 patients and found C-indices ranging from 0.52 to 0.74, so even the best-performing model scored below the AUC of 0.77 observed here.26 Strikingly, the highest AUC reported by the developers of any model assessed in that study was 0.94;27 however, this fell to 0.61 at external validation. This illustrates the importance of validating models externally on new patient cohorts from different settings. PIPRA also compared favourably with a risk prediction model using age alone, in both discrimination and calibration (online supplemental figure S1).

Real-world data have strengths and limitations that complement those of clinical trial data. A preliminary external validation was performed in the original study,18 and a clinical trial to validate the algorithm further is underway, which will complement this study.

Implications

Identification of patients at risk of delirium is essential for effective, targeted prevention strategies and early treatment of delirium. The PIPRA tool was developed as a consensus tool, using data from many authors and a number of large studies through a collaborative IPDMA, and this real-world validation study confirms that we have a robust and reliable POD risk prediction algorithm.

Supplementary PDF

X: @btdodsworth
Contributors: Conceptualisation (NSG, FB, MZ, RS, PM and BTD), data acquisition and data collection (FB, MZ, MM and SPW), methodology and formal analysis (KAR, MV and BTD), project administration (PM and BTD), supervision and funding (RS and BTD), writing–original draft (KAR, NSG, MAK and BTD), writing–review, editing and approval of the final version (KAR, NSG, MV, FB, MZ, RS, MAK, PM, MM, SPW and BTD). BTD is the guarantor.
Funding: This study was funded by PIPRA AG, Stiftung Quality of Life Switzerland and EIT Health. EIT Health is supported by the EIT, a body of the European Union.
Competing interests: KAR is a shareholder of PIPRA AG. BTD and NSG are founders, shareholders and employees of PIPRA AG. MAK is a shareholder and employee of PIPRA AG. The remaining authors have no competing interests to declare.
Provenance and peer review: Not commissioned; externally peer reviewed.
Supplemental material: This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.

Data availability statement

No data are available. Research data are not shared, but requests for collaboration are encouraged.

Ethics statements

Patient consent for publication:

Ethics approval:

The quality improvement project received a waiver (Req-2023-00307) from the Zurich Cantonal Ethics Committee because it used anonymous data.

Acknowledgements

We would like to thank the nurses who performed the delirium screening and the delirium team for their efforts in implementing the delirium screening and supporting the staff. We would also like to thank Fabian Gautschi and Victoria Engler for supporting the data acquisition.

References

1. Inouye SK. Delirium in older persons. N Engl J Med 2006; 354:1157–65. doi:10.1056/NEJMra052321
2. Gleason LJ, Schmitt EM, Kosar CM, et al. Effect of Delirium and Other Major Complications on Outcomes After Elective Surgery in Older Adults. JAMA Surg 2015; 150:1134–40. doi:10.1001/jamasurg.2015.2606
3. Goldberg TE, Chen C, Wang Y, et al. Association of Delirium With Long-term Cognitive Decline: A Meta-analysis. JAMA Neurol 2020; 77:1373–81. doi:10.1001/jamaneurol.2020.2273
4. Kunicki ZJ, Ngo LH, Marcantonio ER, et al. Six-Year Cognitive Trajectory in Older Adults Following Major Surgery and Delirium. JAMA Intern Med 2023; 183:442–50. doi:10.1001/jamainternmed.2023.0144
5. Raats JW, van Eijsden WA, Crolla RMPH, et al. Risk Factors and Outcomes for Postoperative Delirium after Major Surgery in Elderly Patients. PLoS One 2015; 10. doi:10.1371/journal.pone.0136071
6. Aldecoa C, Bettelli G, Bilotta F, et al. Update of the European Society of Anaesthesiology and Intensive Care Medicine evidence-based and consensus-based guideline on postoperative delirium in adult patients. Eur J Anaesthesiol 2024; 41:81–108. doi:10.1097/EJA.0000000000001876
7. Chow WB, Rosenthal RA, Merkow RP, et al. Optimal preoperative assessment of the geriatric surgical patient: a best practices guideline from the American College of Surgeons National Surgical Quality Improvement Program and the American Geriatrics Society. J Am Coll Surg 2012; 215:453–66. doi:10.1016/j.jamcollsurg.2012.06.017
8. American Society of Anesthesiologists (ASA). Perioperative brain health initiative 2023. 2024.
9. Berger M, Schenning KJ, Brown CH, et al. Best Practices for Postoperative Brain Health: Recommendations From the Fifth International Perioperative Neurotoxicity Working Group. Anesth Analg 2018; 127:1406–13. doi:10.1213/ANE.0000000000003841
10. American Geriatrics Society Expert Panel on Postoperative Delirium in Older Adults. American Geriatrics Society abstracted clinical practice guideline for postoperative delirium in older adults. J Am Geriatr Soc 2015; 63:142–50. doi:10.1111/jgs.13281
11. National Institute for Health and Care Excellence (NICE). Evidence standards framework for digital health technologies. 2019.
12. Hshieh TT, Yue J, Oh E, et al. Effectiveness of multicomponent nonpharmacological delirium interventions: a meta-analysis. JAMA Intern Med 2015; 175:512–20. doi:10.1001/jamainternmed.2014.7779
13. Godfrey M, Green J, Smith J, et al. Process of implementing and delivering the Prevention of Delirium system of care: a mixed method preliminary study. BMC Geriatr 2019; 20. doi:10.1186/s12877-019-1374-x
14. Hshieh TT, Yang T, Gartaganis SL, et al. Hospital Elder Life Program: Systematic Review and Meta-analysis of Effectiveness. Am J Geriatr Psychiatry 2018; 26:1015–33. doi:10.1016/j.jagp.2018.06.007
15. Zhao Q, Liu S, Zhao H, et al. Non-pharmacological interventions to prevent and treat delirium in older people: An overview of systematic reviews. Int J Nurs Stud 2023; 148:104584. doi:10.1016/j.ijnurstu.2023.104584
16. Janssen TL, Alberts AR, Hooft L, et al. Prevention of postoperative delirium in elderly patients planned for elective surgery: systematic review and meta-analysis. Clin Interv Aging 2019; 14:1095–117. doi:10.2147/CIA.S201323
17. Burton JK, Craig LE, Yong SQ, et al. Non-pharmacological interventions for preventing delirium in hospitalised non-ICU patients. Cochrane Database Syst Rev 2021; 7. doi:10.1002/14651858.CD013307.pub2
18. Dodsworth BT, Reeve K, Falco L, et al. Development and validation of an international preoperative risk assessment model for postoperative delirium. Age Ageing 2023; 52:1–10. doi:10.1093/ageing/afad086
19. Sadeghirad B, Dodsworth BT, Schmutz Gelsomino N, et al. Perioperative Factors Associated With Postoperative Delirium in Patients Undergoing Noncardiac Surgery: An Individual Patient Data Meta-Analysis. JAMA Netw Open 2023; 6. doi:10.1001/jamanetworkopen.2023.37239
20. Dodsworth BT, Reeve KA, Zozman M, et al. Benefits of an automated postoperative delirium risk prediction tool combined with non-pharmacological delirium prevention on delirium incidence and length of stay: a before-after analysis based on a quality improvement project. Age Ageing 2024; 53. doi:10.1093/ageing/afae219
21. Buchan TA, Sadeghirad B, Schmutz N, et al. Preoperative prognostic factors associated with postoperative delirium in older people undergoing surgery: protocol for a systematic review and individual patient data meta-analysis. Syst Rev 2020; 9. doi:10.1186/s13643-020-01518-z
22. Schuurmans MJ, Shortridge-Baggett LM, Duursma SA, et al. The Delirium Observation Screening Scale: a screening instrument for delirium. Res Theory Nurs Pract 2003; 17:31–50. doi:10.1891/rtnp.17.1.31.53169
23. Schlarmann JG. Der CMS© im ePA©, Verschiedene Qualitätsdimensionen eines Instruments Eine empirische Analyse [The CMS© in the ePA©: different quality dimensions of an instrument. An empirical analysis]. Private Universität Witten/Herdecke gGmbH, Fakultät Medizin, Institut für Pflegewissenschaft 2007.
24. R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2021.
25. Gavinski K, Carnahan R, Weckmann M, et al. Validation of the delirium observation screening scale in a hospitalized older population. J Hosp Med 2016; 11:494–7. doi:10.1002/jhm.2580
26. Wong CK, van Munster BC, Hatseras A, et al. Head-to-head comparison of 14 prediction models for postoperative delirium in elderly non-ICU patients: an external validation study. BMJ Open 2022; 12. doi:10.1136/bmjopen-2021-054023
27. Kim MY, Park UJ, Kim HT, et al. DELirium Prediction Based on Hospital Information (Delphi) in General Surgery Patients. Medicine (Baltimore) 2016; 95. doi:10.1097/MD.0000000000003072