What the Health Data Lab Is and How It Works
In recent years, Germany has taken important steps toward building a national infrastructure for health data that can support research, planning, and policy. One of the most ambitious initiatives in this direction is the Health Data Lab (Gesundheitsdatenlabor, HDL), managed by the Federal Institute for Drugs and Medical Devices (BfArM). The HDL acts as a secure environment where large sets of German health data can be accessed for well-defined, legally authorized purposes.
The HDL builds on a structure that first existed within the German Institute for Medical Documentation and Information (DIMDI). Since 2013, that unit had allowed the analysis of statutory health insurance claims data for research purposes. Following the dissolution of DIMDI in 2019, the Digital Healthcare Act prepared the ground for the HDL, which has assumed all functions of its predecessor under a clearer legal framework defined by the Data Transparency Regulation (DaTraV) and the Health Data Use Act (GDNG). Although the lab is institutionally based within BfArM, it operates independently, with its own governance and mandate.
At its core, the HDL collects and prepares data that originate mainly from the statutory health insurance system (Gesetzliche Krankenversicherung, GKV). This means that the data cover about 90% of the German population, offering an exceptionally broad view of healthcare in real life: outpatient and inpatient care, prescriptions, rehabilitation, and costs. Over time, the HDL is also expected to integrate additional sources such as the electronic patient record (ePA), expanding both the depth and the richness of information available for secondary use.
The HDL operates under strict legal and ethical frameworks. Every dataset is pseudonymized, and every request must go through a clearly defined procedure before data can be accessed.
Application for data usage
The process begins when a potential data user develops a research question that could be answered using the HDL data. At this early stage, applicants are encouraged to consult the HDL Data Catalog, an online interface that describes the available datasets, their structure, and variables. This helps determine whether the HDL actually holds the type of information required for the project.
Once the feasibility is confirmed, the applicant prepares a formal application through the HDL portal.
The application must include several key elements:
- A detailed description of the research question and the data required.
- The specific purpose of use (as defined in the Social Code Book V).
- The legal basis and justification for processing pseudonymized data.
- Information about the institution and responsible individuals.
Applications are submitted electronically, but this is only the first step. The HDL team at BfArM performs a formal review to verify completeness and coherence. If information is missing or unclear, the applicant is asked to revise the submission.
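The completeness check described above can be pictured with a small sketch. It is only an illustration: the field names, the `Application` class, and the `missing_elements` helper are hypothetical and do not reflect the actual HDL portal or its form fields.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Application:
    """Hypothetical container for the key elements listed above."""
    research_question: str = ""
    data_required: str = ""
    purpose_of_use: str = ""
    legal_basis: str = ""
    institution: str = ""
    responsible_persons: str = ""


def missing_elements(app: Application) -> List[str]:
    """Return the names of elements that are empty or whitespace-only."""
    return [f.name for f in fields(app) if not getattr(app, f.name).strip()]


# A draft that still lacks two elements (all values are made up).
draft = Application(
    research_question="Long-term outcomes after hip replacement",
    data_required="Inpatient and outpatient claims",
    purpose_of_use="Scientific research",
    institution="Example University (hypothetical)",
)
print(missing_elements(draft))  # ['legal_basis', 'responsible_persons']
```

Running a check of this kind before submission may help applicants avoid a round of revisions during the formal review.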
The Ten Purposes of Use
Every application to the HDL must be linked to one of ten purposes of use, defined in the Social Code Book V. These purposes determine what kinds of activities are legally permitted and reflect the broad vision behind the creation of the HDL: to enable data-driven improvements in healthcare, policy, and research while maintaining strict boundaries.
The ten purposes of use are the following:
- Fulfillment of steering responsibilities by collective agreement partners, the statutory health insurance funds and the Associations of Statutory Health Insurance Physicians.
- Improving the quality of care and safety standards. High-quality healthcare requires continuous evaluation. HDL data allow for assessments of treatment outcomes and help identify where quality assurance measures are most needed.
- Planning healthcare and long-term care resources, such as hospital capacity or nursing home availability. By analyzing hospital admission data, planners can anticipate demand and allocate resources more efficiently.
- Scientific research in health, nursing, and life sciences. Researchers can study chronic conditions over long periods, evaluate treatment effectiveness, and identify new therapeutic opportunities.
- Supporting political decision-making. Policymakers and parliamentary research services can use aggregated HDL data to inform health legislation.
- Analyses of cross-sectoral healthcare models, such as transitions from hospital to outpatient care. By tracking patient pathways, researchers can assess whether continuity of care is working as intended.
- Health reporting and official statistics. These are essential for monitoring public health trends and supporting evidence-based prevention strategies. HDL’s comprehensive data provide an excellent foundation for national or regional health reports.
- Public health and epidemiological monitoring, such as tracking infectious disease trends or vaccination coverage. This can inform timely interventions, public education, and outbreak control.
- Development and safety monitoring of medicines, devices, and digital health applications. HDL data can reveal potential side effects and support the evaluation of treatment safety.
- Benefit assessments of medicines, medical devices, and therapeutic products. Beyond initial approval, these analyses help determine whether new interventions truly offer added value in real-world conditions or could be repurposed for other indications.
Legal and ethical review
Once the formal requirements are met, the application is evaluated in substance. This review is central to the HDL process and ensures that every data use aligns with the law and with the public interest. The HDL staff first examine whether the chosen purpose of use matches one of the ten legally defined categories. They also assess whether the requested data subset is proportionate to the stated objectives; in other words, applicants should not ask for more data than they truly need.
After this internal assessment, the application may be forwarded to external committees when required by law, for instance, for additional data protection or ethical evaluation.
Contract and data provision
If the application is approved, the HDL prepares a data use agreement that outlines the conditions of access, data security measures, and responsibilities of the user. No raw data ever leave the HDL environment. Instead, authorized users receive access to a secure remote computing environment, where the HDL provides standardized analysis tools and statistical software, allowing researchers to run queries, generate descriptive statistics, or build models directly within the secure environment. Only aggregated, anonymized results can be exported, and these are checked before release.
This model ensures that data never circulate outside the controlled infrastructure, a fundamental principle of the HDL’s “data visiting” approach, where researchers come to the data rather than moving data around.
Access to the Secure Processing Environment and Conducting the Analysis
Within the “data visiting room,” a protected space created specifically for the approved project, researchers can log in remotely to perform their analyses.
This environment functions like a virtual computer, preloaded with the statistical tools and software required for analysis. Researchers develop and test their analysis instructions using small test datasets within the secure environment. These subsets allow them to fine-tune their code. Once these instructions are finalized, they are submitted to the HDL team, who then execute them on the full pseudonymized dataset. Researchers never access the complete dataset themselves. Instead, they receive only the aggregated statistical outputs generated through their own analysis instructions.
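One way to picture this workflow is to write the analysis instruction as a single function that accepts records and returns only aggregates. The sketch below is an assumption-laden illustration: the record fields (`pseudonym`, `age_group`, `cost_eur`), the function name, and the test values are invented, and the real HDL data model and submission format may differ.

```python
from collections import defaultdict
from statistics import mean


def analysis_instruction(records):
    """Aggregate per age group: number of records and mean cost in euros."""
    costs_by_group = defaultdict(list)
    for rec in records:
        costs_by_group[rec["age_group"]].append(rec["cost_eur"])
    return {
        group: {"n": len(costs), "mean_cost_eur": round(mean(costs), 2)}
        for group, costs in sorted(costs_by_group.items())
    }


# Small synthetic test dataset used to develop and debug the instruction.
test_records = [
    {"pseudonym": "a1", "age_group": "40-59", "cost_eur": 120.0},
    {"pseudonym": "b2", "age_group": "40-59", "cost_eur": 80.0},
    {"pseudonym": "c3", "age_group": "60-79", "cost_eur": 300.0},
]
print(analysis_instruction(test_records))
```

Because the function never returns row-level records, the same code could in principle be run unchanged on the full dataset, with only the aggregated output handed back.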
Review and Transmission of Results
Before releasing any results, the HDL team carefully reviews all output tables. Their goal is to ensure that the results are fully anonymized and that there is no risk of re-identifying individuals.
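A common safeguard in this kind of output review is to suppress table cells that are based on very few individuals. The sketch below illustrates that general idea only. The threshold of 5 and the function name are assumptions, not the HDL's documented rules, and real disclosure control usually involves further checks.

```python
MIN_CELL_SIZE = 5  # hypothetical threshold, not an official HDL value


def suppress_small_cells(table, min_size=MIN_CELL_SIZE):
    """Replace counts below min_size with None so they are not released."""
    return {
        label: (count if count >= min_size else None)
        for label, count in table.items()
    }


# Made-up counts per region; "region B" is too small to release.
output_table = {"region A": 1240, "region B": 3, "region C": 57}
print(suppress_small_cells(output_table))
# {'region A': 1240, 'region B': None, 'region C': 57}
```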
The approved results are then transmitted to the researchers, usually in the form of summary tables or statistical indicators. Should researchers wish to perform additional analyses, they must submit a new or amended application. This multi-step process ensures that every use of the data remains compliant with legal standards while still allowing for meaningful scientific discovery.
Timelines
From start to finish, the entire process can be estimated as follows:
- Pre-application stage (1–2 months): exploring the data catalog, defining the project scope, and possibly contacting the HDL helpdesk for methodological advice.
- Formal submission and review (1–3 months): checking completeness, revising missing details, and confirming eligibility under the law.
- Legal and ethical evaluation (2–4 months): internal and, if necessary, external committee reviews.
- Contracting and technical access (1–2 months): drafting the agreement, setting up the secure workspace, and testing the environment.
In total, the timeline may range from six months to a year, depending on complexity.
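As a quick check on this estimate, the stage ranges listed above can be added up. The sketch assumes the stages run strictly one after another and leaves out the time needed for the analysis itself.

```python
# (minimum, maximum) duration in months for each stage listed above
stages = {
    "Pre-application": (1, 2),
    "Formal submission and review": (1, 3),
    "Legal and ethical evaluation": (2, 4),
    "Contracting and technical access": (1, 2),
}

shortest = sum(low for low, _ in stages.values())
longest = sum(high for _, high in stages.values())
print(f"{shortest} to {longest} months")  # 5 to 11 months
```

The sum of 5 to 11 months is close to the stated range of six months to a year.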
Closing thoughts
The Health Data Lab represents a turning point in Germany’s approach to secondary use of health data. It combines technical innovation with a strong ethical and legal framework. The process for obtaining access is detailed and scrupulous, reflecting a commitment to maintaining trust, a key condition for the long-term success of any national data infrastructure.
As the HDL continues to expand, integrating richer data from electronic patient records and potentially from new digital health sources, it will play an increasingly central role in shaping evidence-based healthcare in Germany. For researchers, policymakers, and healthcare planners alike, it offers not just a data source, but a foundation for smarter, safer, and fairer decisions in the years to come.
References and further reading
Data Usage at HDL. https://www.healthdatalab.de/data-usage/
Purposes of Use for Data Utilization. https://www.healthdatalab.de/data/purposes-of-use/
Datensatzbeschreibung FDZ Gesundheit. https://fdz-gesundheit.github.io/datensatzbeschreibung_fdz_gesundheit/
Social Code Book V – Statutory health insurance. https://www.gesetze-im-internet.de/englisch_sgb_5/englisch_sgb_5.html
The electronic patient record (ePA). https://gesund.bund.de/en/the-electronic-patient-record