Introduction
Multimorbidity (MM), the presence of two or more long-term conditions in an individual, is associated with reduced quality of life,1 increased healthcare needs, hospitalisations and premature mortality.2–4 In 2017, an MM index, M3, was developed and validated for 1-year mortality risk in the adult New Zealand (NZ) resident population, using 61 long-term condition categories derived from routine hospital admission data.5
This index estimated that MM affected approximately 8% of the NZ general population, with around 91% of affected individuals aged 65 years or older.2 International evidence indicates a strong link between socioeconomic factors and MM. For instance, systematic reviews report that lower education is associated with a roughly 64% increased risk of MM, while individuals in the lowest income bracket are over four times more likely to develop MM than those in the highest income bracket.6 Moreover, findings from Scotland, based on primary care records, demonstrate that the onset of MM occurs 10–15 years earlier in the most deprived populations, with deprivation particularly linked to MM involving mental health disorders.7 However, because the M3 index relies on International Classification of Diseases, 10th Revision, Australian Modification (ICD-10-AM) diagnostic coding following hospital admissions, it under-represents long-term conditions that are primarily managed in general practice settings, such as diet-controlled diabetes mellitus, gout, dementia or angina.8
For this study, ‘general practice data’ refer to the electronic health records (EHRs) maintained by NZ general practices. These records capture comprehensive patient information through both structured coding systems (mainly Read codes9) and unstructured free-text entries. Recent analyses using general practice EHRs have highlighted the limitations of relying on hospital admission data alone. For example, a study of 454 367 general practice patients in NZ10 found that 24% of individuals diagnosed with ischaemic cardiovascular disease by their general practitioners (GPs) had no corresponding hospital admission. Similarly, data from the QRISK3 GP database in the UK revealed that 23% of angina and 55% of transient ischaemic attack outcomes were documented solely in GP records.11 While structured fields can be extracted accurately and consistently, manual extraction from unstructured free text is labour-intensive, prone to human error and presents significant challenges, especially in large-scale studies of diverse chronic diseases.12–14
We aimed to develop a robust procedure for extracting both structured and unstructured data from general practice EHRs in NZ. This approach will facilitate research comparing MM representation between secondary care and general practice settings, quantifying MM in the community and investigating the association of MM with cardiovascular disease outcomes.