Do We Still Need Country-Specific Data?

In the European Union, data on racial or ethnic origin are classified as a “special category” under Article 9 of the GDPR, which generally prohibits their processing unless specific conditions are met, such as explicit consent or a substantial public interest backed by appropriate safeguards. As a result, several EU countries, including France, Germany, Denmark, and Sweden, do not routinely collect ethnicity information in health records or real-world data (RWD) sources. While the EU does not impose a total ban, national legal traditions and privacy sensitivities effectively limit the availability of such data, forcing researchers to rely on proxies like country of birth or nationality when analyzing health disparities.

 

Today, statistics on first- and second-generation immigrants paint a picture of deeply transformed populations across Europe, both in the EU’s largest countries and in the Nordic region. In Germany and Sweden, around one-fifth of the population is foreign-born; in France, Spain, the Netherlands, and Denmark the share is between 13 and 17 percent, and in Italy it is slightly lower, at roughly 11 percent (1, 2). When second-generation residents (those born in the country to at least one foreign-born parent) are included, the share of people with an immigrant background rises further, reaching a quarter or more of the total population in some countries (1, 2).

 

These figures illustrate that European countries can no longer be accurately described as ethnically homogeneous; the populations captured in national registries are increasingly diverse, reflecting a mix of cultural, genetic, and social backgrounds. This diversification has significant implications for research using RWD: the “average patient” recorded in EMRs or claims databases is now a composite of multiple ethnicities and migration histories, yet datasets rarely capture these dimensions explicitly. As a result, analyses based solely on country-specific populations risk overgeneralizing or misrepresenting outcomes, because they assume a uniformity that no longer exists, and the people actually studied may differ significantly in genetics, behavior, and health determinants from the nominal national population.

 

In much of real-world evidence (RWE) research, we rely on country-specific datasets under the implicit assumption that the patients included reflect the local population. Historically, this made sense: populations in many European countries were largely stable, and national registries captured fairly consistent patterns of disease, treatment, and outcomes. Today, however, steady immigration and demographic diversification have fundamentally changed this picture. In such contexts, claiming that a country-specific dataset reflects an “exclusive” national ethnicity is increasingly inaccurate.

 

The implications are significant. If we continue to analyze country-level data as if the population were homogeneous, we risk overgeneralizing results that actually reflect a subset of residents, typically those well integrated into the healthcare system and with complete records. Policy and treatment decisions may be biased, because subgroups with different responses or risk profiles are underrepresented or invisible. Cross-country comparisons become less meaningful, since heterogeneity within countries may now exceed differences between countries.
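
One way to make the last point concrete is to split the total variance of an outcome into a within-country part and a between-country part. The sketch below is only an illustration: the country labels, the outcome values, and the function name `variance_decomposition` are all hypothetical, and real RWD would also need adjustment for age, sex, and other confounders.

```python
from statistics import fmean

# Hypothetical outcome values (for example, a biomarker) per country.
# These numbers are invented for illustration only.
data = {
    "country_a": [4.0, 6.0, 8.0, 10.0],
    "country_b": [5.0, 7.0, 9.0, 11.0],
}

def variance_decomposition(groups):
    """Split population variance into (within, between) group components."""
    all_values = [x for values in groups.values() for x in values]
    n = len(all_values)
    grand_mean = fmean(all_values)
    means = {name: fmean(values) for name, values in groups.items()}
    # Between: spread of the country means around the overall mean.
    between = sum(
        len(values) * (means[name] - grand_mean) ** 2
        for name, values in groups.items()
    ) / n
    # Within: spread of individual patients around their own country mean.
    within = sum(
        (x - means[name]) ** 2
        for name, values in groups.items()
        for x in values
    ) / n
    return within, between

within, between = variance_decomposition(data)
print(f"within-country variance:  {within:.2f}")
print(f"between-country variance: {between:.2f}")
```

In this toy example almost all of the variation sits inside each country, so a comparison of the two national averages would say very little about individual patients.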

 

Does this mean we should stop considering country-specific data altogether? Not necessarily. Country-level datasets still have value for understanding healthcare utilization, system-level policies, infrastructure effects, and population health management. But we might need to reinterpret their meaning: they no longer describe a homogeneous ethnic or genetic population. Instead, they reflect the healthcare experiences of all residents, regardless of origin, with the limitation that some subpopulations may be partially captured or missing entirely.

 

In practical terms, RWD in Europe tends to overrepresent long-established, native-born populations with stable health records and to underrepresent transient or recently immigrated populations. Patients with irregular access to care, language barriers, or gaps in documentation may not appear at all in structured datasets. This raises the question: are we studying “Europeans in general” or primarily those whose data are most consistently captured? Answering it means stratifying by origin, migration status, or socioeconomic proxies where possible, rather than assuming a uniform population. We may also need to use federated or pooled datasets across countries to capture broader diversity and improve generalizability, recognize the limitations of “national” labels in describing health outcomes, and be transparent about the population actually observed.
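
As a minimal sketch of what stratification and transparency could look like in practice, the example below compares a crude event rate with rates stratified by an origin proxy, and reports the share of records where the proxy is missing. Everything here is an assumption made for illustration: the records are invented, `native_born` and `foreign_born` stand in for a country-of-birth proxy, and `stratified_rates` is a hypothetical helper, not a function from any library.

```python
from collections import defaultdict

# Hypothetical toy records: (origin_proxy, had_event).
# None means the origin proxy was not recorded for that patient.
records = [
    ("native_born", True), ("native_born", False),
    ("native_born", False), ("native_born", False),
    ("foreign_born", True), ("foreign_born", True),
    ("foreign_born", False),
    (None, False),
]

def stratified_rates(rows):
    """Return (crude_rate, {stratum: rate}, share_with_missing_origin)."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [events, patients]
    for origin, had_event in rows:
        stratum = origin if origin is not None else "unknown"
        counts[stratum][0] += int(had_event)
        counts[stratum][1] += 1
    total = len(rows)
    crude = sum(events for events, _ in counts.values()) / total
    by_stratum = {name: events / n for name, (events, n) in counts.items()}
    missing_share = counts.get("unknown", [0, 0])[1] / total
    return crude, by_stratum, missing_share

crude, by_stratum, missing_share = stratified_rates(records)
print(f"crude rate: {crude:.3f}")
for name, rate in by_stratum.items():
    print(f"  {name}: {rate:.3f}")
print(f"records with no origin proxy: {missing_share:.1%}")
```

Reporting the missing share alongside the stratified rates is one simple way to be explicit about who is actually observed in the dataset.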

 

In short, country-specific data remain useful, but they cannot be interpreted as ethnically or genetically “exclusive” anymore. National datasets now measure healthcare engagement, exposure to local health systems, and administrative capture, rather than a homogeneous population. Ignoring this change risks misleading conclusions and misapplied policy recommendations.

 

 

REFERENCES:

  1. Eurostat. Population of foreign-born residents in the EU, 1 January 2023. European Commission. Available at: https://ec.europa.eu/eurostat/web/products-eurostat-news/w/ddn-20240327-1
  2. Euronews. Born abroad, living in the EU: How migration shapes the EU’s population. 2025. Available at: https://www.euronews.com/my-europe/2025/02/19/born-abroad-living-in-the-eu-how-migration-shapes-the-eus-population

By Nadia Barozzi

Passionate about data-driven insights and the advancement of Real World Evidence research, drug safety and pharmacovigilance.