Health data



Healthcare statistics and records

There are two key types of health care data:

  • Health statistics - this includes internationally comparable health data (collected mainly by WHO, the OECD or Eurostat through questionnaires to member countries) and national statistics from surveys and/or key information systems in the healthcare system
  • Health records - data originating from information systems within the health system. These can be at a very granular level (e.g., a reimbursable act in claims/payment systems, a hospital discharge record or an overall patient health record)

The list below makes this distinction explicit. Health records offer the potential for detailed medical and outcomes research and are the main focus of this list. Sources of health records in the United States are not included.

Health records: Europe

European level health records:

  • IMS Health databases (LifeLink, Disease Analyzer etc.), principally covering primary care settings and office-based specialists across France, Germany, Italy and the United Kingdom
  • Cegedim LPD (Longitudinal Patient data), principally covering primary care settings and office-based specialists across France, Germany, Italy, Spain, and the United Kingdom

International registries:


Germany

Health Records

  • Various sources from sickness (insurance) funds in Germany:
    • German Pharmacoepidemiological Database (GePaRD-BIPS), a health insurance database covering approximately 17% of the German population from 4 different sickness funds
    • Certain other sickness funds are fairly active in driving the use of their records for research, including the Wissenschaftliches Institut der AOK (WIdO, the research institute of the AOK), Gmünder ErsatzKasse (GEK), Betriebskrankenkassen (BKK) and the Techniker Krankenkasse claims database
    • Selected commercial entities, such as Herescon or DtD (Data to Decision), are able to analyze sickness fund data for different purposes
  • Primary care (general practice):
    • IMS Health IMS DA Germany comprises longitudinal patient data in primary care interventions for nearly 5 million patients
    • Cegedim LPD (Longitudinal Patient data) covers around 680,000 patients through GPs

Health Statistics

  • Federal Health Monitoring statistics (in English) and underlying data search on the Genesis query platform (in German). Includes:
    • Overall system (Population, Births, Households, Families, Social Conditions, Health Personnel, Insurance coverage, Cause of death)
    • Health status (Mortality; Ailments; Disabilities, Absenteeism etc.)
    • Lifestyle (Risk factors including Smoking, Prevention/early diagnosis, Vaccinations, Environment etc.)
    • Diseases/Health Problems - incidence and mortality by major disease (CVD, Cancer etc.)
    • Health Care System (Public health, Healthcare personnel and facilities, Pharmaceutical Supply including deep-dives on selected classes such as insulin, medical procedures)
    • Health Expenditure (by care setting/disease, and overview of financing)


France

Health Records

  • National Dataset project incorporating Databases of National Insurance Fund CNAMTS. This dataset is accessible by any member of the Institut des Données de Santé (institute on health data) and includes:
    • EPIB-AM (Échantillon Permanent Inter-régimes des Bénéficiaires de l'Assurance Maladie), which together with EPAS contains a standard sample of 1/100 patients of all health insurance schemes
    • PMSI covers in-patient hospital admissions and comprises medical diagnoses and procedures performed. Aggregated versions of the discharge records, such as the PMSI-MCO, are useful for assessing healthcare resource utilization
    • Full extraction of claims (SNIIR-AM) for the last 2 years, including linkage to hospital discharge (PMSI) and the cause of death register (CépiDc-INSERM), is possible but only on special request by a member of the IDS (such as the national cancer institute)
  • Primary care (general practice):
    • Cegedim LPD (Longitudinal Patient data) covers 1500 GPs and a selection of office based specialists
    • IMS Health IMS DA France comprises longitudinal patient data in primary care interventions for 1.1 million patients provided by 540 doctors.
    • L’Observatoire de la Médecine Générale is a French GP network covering 600 GPs and 700,000 patients. As of 2012, the network is suspended pending resolution of funding issues
  • Cohorts: Constances is a project set to go live in 2013 to create a large prospective cohort. It will contain diverse, high-quality health information, drawing on the permanent databases CNAV (French retirement), SNIIR-AM, PMSI, and the death register (CépiDc-INSERM)
  • Disease-specific databases or registries:
    • Calliope is a permanent registry using EMRs. The "Alzheimer’s" Calliope launched in 2005 is used in 74 centres (CMRR and CMP) with close to 400 users for nearly 30,000 patients
    • CépiDc-INSERM is the national death register and can be linked to other databases
    • HIPPOCRATE: anonymised medical information of patients with chronic diseases
  • Surveillance databases (not longitudinal):
    • Sentinelles is a French GP network covering about 2% of GPs that captures 8 health indicators (seven infectious diseases and one non-infectious indicator)

Health Statistics

  • IRDES, the French Institute for Research in Health Economics, provides different types of data:
    • Overall indicators on health status of the population, health expenditures, health professions, hospitals, health care insurance
    • Global indicators such as demography or the economy
    • Self-reported population health surveys (Self-perceived health status, Public coverage and private supplementary health insurance, Visits to a physician, Consumption of medical goods and services, etc.)


United Kingdom

Health Records

  • National research platforms:
    • The CPRD (Clinical Practice Research Datalink) is the new English NHS observational data and interventional research service, which, when completed, will encompass many of the resources below for 50m lives in England
    • Scotland has a similar linked dataset for 5m lives, which is under continuous development as part of the National Clinical Datasets Development Programme
  • Primary care (general practice):
    • QRESEARCH based on 660 general practices (using the EMIS clinical computer system)
    • GPRD based on 639 general practices
    • IMS Health databases (LifeLink, Disease analyzer etc.) based on around 130 general practices
    • Cegedim LPD (Longitudinal Patient data) or THIN, based on 479 general practices
  • Secondary care (hospital):
    • HES, the Hospital Episodes Statistics national data warehouse provides a range of information on the care provided by NHS hospitals in England and for NHS hospital patients treated elsewhere.
    • IMS Hospital Treatment Insights connects hospital pharmacy to discharge records for over 1.2m lives, and will eventually extend to more patients and link to primary care.
  • Other databases:
    • RCGP WRS, the Royal College of General Practitioners Weekly Returns Service, is a surveillance database covering 100 practices or approx. 900,000 patients (returning GP-based diagnoses presented as incidence rates per 100,000)
    • The Oxford Record Linkage Study uses probabilistic matching to create extensive and longitudinal care records for about 5m lives
    • A number of regions enter specific collaborations such as Tayside/MEMO (with a population of 400,000), Manchester Academic Health Science Centre MAHSC (Manchester SHA's collaboration with the pharmaceutical industry on research), etc.
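
The RCGP WRS entry above reports diagnoses as rates per 100,000. A minimal sketch of that arithmetic, assuming made-up counts; the function name and the numbers are illustrative only and are not taken from the service:

```python
def incidence_rate_per_100k(new_cases: int, population: int) -> float:
    """Return new cases per 100,000 people in the monitored population."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so round numbers stay exact.
    return new_cases * 100_000 / population


# Hypothetical week: 270 new diagnoses in a monitored population of 900,000.
rate = incidence_rate_per_100k(270, 900_000)
print(f"{rate:.1f} per 100,000")
```

Expressing counts per 100,000 lets a panel of this size be compared with national figures or with panels of a different size.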

Health statistics

  • General resources:
    • NHS Information Centre: provides reports based on NHS records, including a mix of statistics and access to specific health records. Themes include Audits and performance, Health and lifestyles, Hospital care, Mental health, Population and geography, Primary care, Screening, Social care, Workforce and Facilities
    • Dr Foster: provides reports based on NHS records
    • APHO, the Association of Public Health Observatories, operates a wide range of specific datasets (e.g., Cardiovascular Disease Profiles, Community Mental Health Profiles, Disease Prevalence Estimates etc.)
  • Other:
    • An English GP practice dataset, generated by the English NHS and published by the NHS Information Centre. Records are highly detailed, allowing extensive research possibilities despite the lack of patient-level records
    • HQIP maintains a list of registries in England
    • IMS HPAI: the IMS Health Hospital Pharmacy Audit Index database, which covers pharmaceutical usage by brick


Italy

Health records

  • ARNO Observatory: covers almost 10 million people by offering Italian Local Health Units (LHU) a clinical data warehouse that aggregates, for each patient, the data collected for administrative use. It includes general practitioners’ prescriptions, hospital admissions and discharges, diagnostic tests and examinations.
  • Health Search Database (HSD). Originally part of the research unit of the Italian College of General Practitioners and now managed by Cegedim, it aggregates clinical information contributed by Italian GPs. Approximately 700 GPs with 2+ million active patients
  • PEDIANET. 105 primary care pediatricians currently provide data to the database

Regional Health records

  • Administrative data in Italy are available in some regions on hospital care, outpatient specialist care, rehabilitation services (either in outpatient departments or at home), nursing homes and pharmaceutical care. For example:
  • DB Topografico Regione Emilia Romagna (RER), in Northeast Italy, contains records for the entire region, approximately 4 million residents. This covers demographics, hospital discharge summaries, ICD-9-CM coded diagnosis and procedure codes, outpatient pharmacy, payments and patient co-payments, specialty care use (lab, diagnostics, therapeutic procedures, visits to specialists), home health data (physician, nurse, therapist, etc.), physician information and ad hoc registries of specific medical devices or surgical procedures. Cause of death is recorded
  • In Lombardy, the Osserra database has inpatient and outpatient data covering 5m lives, and includes all reimbursed hospitalizations, all reimbursed specialist visits and diagnostic tests or procedures, and all reimbursed drug prescriptions (prescriptions during hospitalization are excluded)
  • The regional database from Tuscany covers around 3.5m lives (demographic information, hospital discharges, cause-specific mortality, diagnostic analysis prescriptions, specialist visit prescriptions and drug dispensings). The databases are linked through a personal unique identifier (fiscal code).
  • Other regions have similar levels of advancement in data capture (e.g., Piemonte, Veneto, Lazio...) but are less well set up for research. For instance, Lazio is able to use various datasets (discharge, mortality, etc.) covering 5.6m lives, but these are not linked
  • Smaller datasets exist, such as the Arianna GP/claims database in Caserta in the Campania region (900k lives)
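
The Tuscany entry above notes that its databases are linked through a personal unique identifier. A minimal sketch of that kind of deterministic linkage, assuming hypothetical table layouts and field names (`fiscal_code`, `diagnosis`, `drug`) and made-up rows; none of this reflects the real regional schemas:

```python
from collections import defaultdict


def link_by_identifier(tables, key="fiscal_code"):
    """Group rows from several source tables into one record per person.

    `tables` maps a source name to a list of dict rows; every row is
    assumed to carry the shared identifier under `key`.
    """
    linked = defaultdict(lambda: defaultdict(list))
    for source, rows in tables.items():
        for row in rows:
            # Store the row under its person and source, without the key itself.
            payload = {k: v for k, v in row.items() if k != key}
            linked[row[key]][source].append(payload)
    return {person: dict(sources) for person, sources in linked.items()}


# Illustrative rows only (codes chosen for the example, not real data).
discharges = [{"fiscal_code": "A1", "diagnosis": "410"}]
dispensings = [
    {"fiscal_code": "A1", "drug": "C10AA05"},
    {"fiscal_code": "B2", "drug": "A10BA02"},
]

records = link_by_identifier(
    {"discharges": discharges, "dispensings": dispensings}
)
print(records["A1"])
```

Where no shared identifier exists, as with the unlinked Lazio datasets mentioned above, this exact-match approach does not apply and probabilistic matching would be needed instead.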

Health statistics

  • OsMED database of the Italian Medicines Agency, covering in/outpatient drug utilization for 100% of the population


Spain

Health Records

  • Regional systems:
    • Catalonia runs a series of regional databases covering 6m lives including prescribing information. This includes SIDIAP (primary care), CMBD-AH (hospital discharge database) with information on diagnoses and therapeutic procedures occurring during admission to any of the hospitals in Catalonia, Deaths (Health Department) and selected registries (cancer registries, the Catalan Registry of Arthroplasties (RACat), etc.).
    • Other regions have a similar level of sophistication in underlying data capture (e.g., Andalusia, Valencia, Pais Vasco etc.) but differing levels of availability of data for research
  • Clinical and claims:
    • BIFAP: non-profit research database operated by the Spanish Medicines Agency. Covers 969 GPs and 191 pediatricians from 9 autonomous health regions. Data is updated twice yearly and covers 4m active patients. The number of related scientific publications is still limited at this time
    • IASIST has access to 2m lives for in-patient data via protocolized research
    • Cegedim LPD (Longitudinal Patient data) covers around 320,000 patients through GPs

Health statistics


Netherlands

Health records

  • Integrated Primary Care Information (IPCI): information from electronic patient records of 150 GPs covering more than 1 000 000 patients
  • Pharmo: independent research organization for drug use and outcomes (including cardiovascular, metabolic disease, oncology and autoimmune disease, respiratory disease, and mother and child health). Overall it covers 2 million residents in the Netherlands and around 200 000 patients linked to GP patient records. Other data includes:
    • Community Pharmacy database (CPD)
    • Clinical Laboratory File (CLF)
    • General Practitioner database (GPD)
    • Dutch Pathology Registers (PALGA)
    • Hospital Pharmacy database (HPD)
    • Dutch mortality statistics (CBG)
    • National Dutch Hospital Registration (LMR)
    • Perinatal Registry (PRN)
    • Eindhoven Cancer registration (IKZ)
  • AGIS is an insurance database covering 1.2m insured persons

Health statistics

  • GIP database from the Health Care Insurance Board (free online), covering outpatient drug utilization for 85% of the population

Nordic countries

Health Records

  • Medical records or claims:
    • In Sweden:
      • Pygargus operates electronic medical records in Sweden and the other Scandinavian countries, including the ability to link with other resources (e.g., registries) using the unique patient identifiers commonly used in the Nordics
      • Most university centres that are also ethics centres (Gothenburg, Linköping, Lund, Stockholm, Umeå & Uppsala) have varying levels of record linkage capabilities
      • The CEBRxA database combines data from a health resource utilization database for inhabitants in the greater Gothenburg area, and is operated by IMS Health
    • In Denmark:
      • Odense Pharmacoepidemiological Database (OPED) covers catchment areas of approximately 1.2 million people since 2007 and various clinical information. Can be linked to other DBs in Denmark
      • Aarhus record linkage program has created a unique integrated dataset covering 1.8m lives
    • National Discharge Registers are commonly available, e.g., the National Institute for Welfare and Health (THL) in Finland or the Danish National Patient Register (NPR)
  • Nordic Registries. There are extensive registries in the Nordics that have been widely used for research. For example:
    • Pharmacoepidemiological Prescription Database in Northern Denmark (1.7 million patients).
    • Norway: NorPD - Norwegian Prescription Database covers all drugs that are dispensed by prescription in Norway, updated annually, allowing general statistics by user, value and dosage
    • Sweden: Prescribed Drug Register, covers the entire Swedish population and includes approximately 82% of all Defined Daily Doses (DDD) dispensed in Sweden (excludes OTC, does not fully cover hospital/vaccines)
  • A more extensive list of Swedish registries is produced by the Swedish association of local authorities
  • In Finland, the Bureau for Register Studies updates a list of all available registers

Health Statistics

  • Denmark: total sales of medicines in Denmark.
  • Finland: Kelasto Statistical database covering various social and healthcare indicators

Other European countries

  • Austria:
    • IMS Health operates a primary care panel similar to other countries in Europe.
    • Statistics Austria collects person-level health care data (e.g. on mortality) and is able to link hospital utilization data (hospitals maintain the “minimum basic data set” (MBDS) for inpatient stays, which includes patient information, diagnoses and treatments and provides the basis for reimbursement) and Hauptverband (main association of social insurance funds) data
  • Belgium:
    • IMS Health operates a hospital database in Belgium containing individual patient/admission-level data on diagnoses, procedures, and pharmaceutical products.
    • Cegedim LPD (Longitudinal Patient data) for primary care
  • Estonia:
    • Estonia has fairly sophisticated health information (EHR, ePrescription) and claims systems. The Health Insurance Fund (EHIF) administrative database can be accessed for research (covering 94% of the 1.3m population) with data since 2000
  • Hungary:
    • The National Health Insurance database of Hungary is increasingly used in research, with near total population coverage

Special datasets and major registry projects (EU focus)

  • EU-ADR is a project combining primary care databases (Health Search/CSD Patient Italy, Primary Care Information IPCI Netherlands, Pedianet Italy, and QRESEARCH UK) and multi-care setting databases (Aarhus University Hospital Database Denmark, PHARMO Network Netherlands, and the regional Italian EHR databases of Lombardy and Tuscany). 20-30 million active lives are covered.
  • SOS is a multi-country observational study that will be designed across primary care databases (Health Search/CSD Patient Italy, Primary Care Information IPCI Netherlands, Pedianet Italy, and QRESEARCH UK) and multi-care setting databases (Aarhus University Hospital Database Denmark, PHARMO Network Netherlands, the regional Italian EHR databases of Lombardy and BIPS Institute claims data). Up to 35 million active lives are covered.
  • EUROCARE. Monitoring cancer survival in Europe. 13 million cancer diagnoses from 1978 to 2002, with follow‐up for vital status ascertainment until the end of 2003, from 93 population-based cancer registries in 23 European countries. Being extended to 2008. RARECARE has similar goals for rarer cancers
  • EUBIROD is a European project to create a diabetes cohort of 500,000 patients from 22 different countries. It builds on the BIRO project, which created a common open source registry data extraction system used to coordinate registry extraction for diabetes across countries.
  • SHARE covers 45,000 individuals aged 50+ and explores very specific factors around ageing
  • ECHO (2010 – ongoing): international effort to bring together the hospital databases of several European countries and make the data available via an online summary/comparison tool
  • EuroReach is an EU-sponsored project to map in detail patient data resources in the EU. Results are due in 2013, though preliminary results are already online

There are many other efforts that are disease specific and extend beyond Europe (e.g., ISAAC for asthma, 463 801 children aged 13‐14 years in 155 collaborating centres in 56 countries)

International health statistics


  • WHO Statistical Information System (WHOSIS): Statistics for WHO's 193 Member States, typically updated to within the last 2-3 years, covering 70 core health statistics.
  • European Observatory on Health Systems and Policies: comprehensive and rigorous analysis of the dynamics of health care systems in Europe.
  • WHO Europe HFA-DB: Covers most European countries, with 600 indicators:
    • Demographic (Fertility, unemployment, labour force etc.)
    • Mortality (specifically highlighting diseases such as circulatory, heart, cerebrovascular, types of cancer, diabetes etc.)
    • Morbidity (Incidence rates by disease, hospital utilization and discharge rates, Absenteeism, risk factors, etc.)
    • Environment (Emissions etc.)
    • Health care resources (Health professionals by type etc.)
    • Health care utilization (Admissions, length of stay, bed occupancy, surgical procedures, expenditure by type etc.)
    • Maternal and child health (Live births etc.)
  • European detailed mortality database (DMDB): mortality data by ICD code back to 1990.
  • European hospital morbidity database (HMDB): comparison of morbidity and hospital activity patterns based on hospital-discharge data by detailed diagnosis, age and sex, since 1999.
  • Mortality indicator database (MDB): age/sex-specific analysis of mortality by broad disease groups, as well as disaggregated to 67 specific causes of death. Data reach back to 1980.


  • Eurostat follows a number of expenditure, resource-related (human, physical and technical resources) and output-related data (hospital patients, procedures). The website also has a good introduction to the data.


  • OECD. Updated annually with instant access for 11+ countries and the following indicators:
    • Health expenditure (Total, Public, Out-of-pocket, Pharmaceutical)
    • Health care resources (Physicians, Nurses, Beds, etc.)
    • Health care utilization (Consultations, Vaccination rates, MRI exams, discharge rates, length of stay etc.)
    • Mortality and risk factors (Smoking etc.)

Other sources

  • Henry J. Kaiser Family Foundation (KFF): global data on HIV/AIDS, malaria, TB and other key health and socio-economic indicators
  • SHARE: Survey of Health, Ageing and Retirement in Europe is a cross-national panel database of micro data on health, socio-economic status and social and family networks of more than 55,000 individuals from 20 European countries aged 50 or over.
  • National Health & Wellness Survey (NHWS): international survey of patient reported outcomes across 165 therapeutic areas covering France, Germany, Italy, Spain, and the United Kingdom
  • The TNS European Healthcare Access Panel has screened more than 170000 adults in six European countries, with information about more than 80 disease conditions, and captures patient-level information on their health status. The dataset is positioned more for market research
  • EUHealth contact data: names and addresses of GPs, practice managers and nurses in Europe (6 countries)

Links and resources

  • ENCePP Database of Research Resources, part of the European Network of Centres for Pharmacoepidemiology and Pharmacovigilance (ENCePP), profiles approximately 50 major European data sources
  • ISPOR digest of country databases with particular focus on Pharmacoeconomics and Outcomes Research; nearly 400 databases are profiled but most are highly specialist rather than population based
  • Bridge to data sells a list of databases with basic profiling information developed by the data source owner, covering 220 profiles world-wide
  • US National Library of Medicine selection of health data resources (selected for their quality, authority of authorship, uniqueness, and appropriateness)


Benjamin Hughes's Professional profile Benjamin Hughes's Social profile