The Expanding Role of Epidemiology in Modern Real-World Data Research

When retrospective observational research began to be used systematically more than twenty years ago, the real-world data (RWD) landscape was relatively simple. Not necessarily easier, but simpler. A limited number of large administrative and clinical databases dominated the field. Claims databases, national registries and a handful of electronic medical record (EMR) sources shaped the strategy of most studies. The main challenges were access, governance and methodological rigor, but the data ecosystem itself was stable and well defined.


In that context, the work of an epidemiologist or pharmacoepidemiologist was largely self-contained. Once access to a database was secured, the professional could independently define the research question, assess feasibility, design the study, conduct the analysis and interpret the results. Technical considerations existed, of course, but they were mostly subordinate to methodological ones. Data structures were familiar, vendors were few, and expectations around speed and scalability were modest compared to today.


Over time, this landscape has changed significantly. The growth of electronic health records (EHRs), the digitalization of healthcare systems, and the increasing availability of non-traditional data sources have multiplied both the volume and the diversity of RWD. What was once a small set of well-known databases has become a fragmented ecosystem composed of hundreds of data vendors, each offering different types of data, levels of curation, geographic coverage and update frequency.


At the same time, data vendors have evolved from simple data providers into complex service organizations. Today, many vendors do not just sell data; they offer platforms, analytics environments, common data models (CDMs), natural language processing (NLP) solutions, artificial intelligence (AI) tools, and end-to-end study support. The boundary between data ownership, data processing and analysis has become increasingly blurred. This has created new opportunities, but also new layers of complexity.


One of the most significant consequences of this evolution is that feasibility and data assessment have become much more demanding. Twenty years ago, feasibility often meant verifying patient counts, basic variable availability and follow-up time within a single database. Today, feasibility involves navigating a crowded marketplace of vendors, understanding differences in data provenance and refresh cycles, evaluating linkage capabilities, assessing unstructured data components, and determining whether proprietary algorithms meaningfully add value or introduce bias.
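The basic checks from twenty years ago (counts, variable availability, follow-up) can still be sketched in a few lines. The snippet below is an illustrative example only, run against a tiny synthetic claims-style extract; the column names (`patient_id`, `index_date`, `end_date`, `exposure`) are assumptions for the sketch, not any real vendor's schema, and a real feasibility assessment would layer the provenance, linkage and refresh-cycle questions described above on top of it.

```python
# Minimal feasibility sketch on a hypothetical claims-style extract.
# All column names are illustrative assumptions, not a real vendor schema.
import pandas as pd

cohort = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "index_date": pd.to_datetime(["2020-01-01", "2020-03-15",
                                  "2021-06-01", "2020-11-20"]),
    "end_date":   pd.to_datetime(["2022-01-01", "2020-09-15",
                                  "2023-06-01", "2021-11-20"]),
    "exposure":   ["drug_a", "drug_a", None, "drug_b"],
})

# 1) Patient counts: how many distinct patients enter the cohort?
n_patients = cohort["patient_id"].nunique()

# 2) Variable availability: share of patients with a recorded exposure.
exposure_completeness = cohort["exposure"].notna().mean()

# 3) Follow-up time: years between index and end of observation.
followup_years = (cohort["end_date"] - cohort["index_date"]).dt.days / 365.25
median_followup = followup_years.median()

print(n_patients, round(exposure_completeness, 2), round(median_followup, 2))
# prints: 4 0.75 1.5
```

The point of the sketch is how little of modern feasibility it covers: the three numbers it produces were once sufficient, whereas today they are only the first of many questions.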


The promise of advanced technologies such as NLP and AI has further transformed expectations. There is now an assumption that unstructured data can be rapidly structured, that phenotyping can be automated, and that insights can be generated faster than ever. In practice, these technologies require careful validation, close collaboration between domain experts and technologists, and a clear understanding of their limitations. Without this, they risk becoming black boxes that obscure rather than clarify the underlying evidence.


As the data landscape has evolved, so too have the professional profiles required to work effectively within it. The epidemiologist of twenty years ago could afford to be methodologically excellent and technically self-sufficient within a relatively narrow domain. Today, methodological expertise remains essential, but it is no longer sufficient on its own. In many organizations, RWD research now sits at the intersection of epidemiology, data engineering, regulatory affairs, informatics and technology strategy.


Modern RWD professionals must be able to engage critically with data vendors and technology providers. They need to understand how platforms are built, what assumptions underpin automated pipelines, how data transformations occur and where quality risks are introduced. Of course, they do not need to code everything themselves, but they do need to ask the right questions and interpret technical answers in a scientifically meaningful way.


This shift has given rise to hybrid profiles that resemble data strategists as much as traditional epidemiologists. These professionals bridge scientific objectives and technical execution. They translate research questions into data requirements, evaluate whether a given architecture can support a study design, and anticipate operational limitations that could delay timelines or compromise results. Their value lies not only in analysis, but in decision-making upstream of analysis.


This evolution has not yet been fully reflected in how many organizations define roles and expertise. Corporate structures often remain anchored to outdated profiles, separating scientific, technical and regulatory functions too rigidly, while placing disproportionate emphasis on AI capabilities as a standalone solution. In practice, success in RWD research depends less on the presence of advanced algorithms than on the ability to integrate methodological judgment, data understanding and governance awareness. Without rethinking role definitions and expectations, organizations risk investing heavily in technology while underestimating the expertise needed to use it responsibly and effectively.


The change has also affected how studies are operationalized. With multiple vendors, platforms and stakeholders involved, coordination has become a central challenge. Data flows must be aligned, roles clearly defined and expectations managed from the outset. Without strong operational oversight, even well-designed studies can suffer from delays, rework and misalignment between scientific intent and technical implementation.


At the same time, the regulatory environment is becoming an increasingly central dimension of RWD work. The upcoming European Health Data Space (EHDS) implementation will fundamentally reshape how health data are accessed, linked and reused across Europe, introducing new governance models, access pathways and compliance requirements. As a result, RWD professionals will need to integrate regulatory literacy into their core skill set, not as a downstream constraint but as a design parameter from the outset of a study. Understanding data permits, secondary use conditions and cross-border rules will be as critical as understanding data models or analytic methods.


Importantly, this evolution has increased the need for rigor. The abundance of data and tools can create a false sense of confidence, where speed is prioritized over understanding. The risk is that methodological shortcuts are justified by technological sophistication. In reality, strong governance, transparent assumptions and careful feasibility work are more critical than ever.


Looking back, the transition from a small number of large databases to a complex, service-oriented data ecosystem reflects the maturation of the RWD field. What began as an extension of academic observational research has become an integral part of regulatory, commercial and clinical decision-making. This maturation requires new skills, new ways of working and a broader view of what it means to generate evidence from RWD.


For professionals entering or evolving within this space, the challenge is to understand how the role of epidemiology must expand. With AI implementation advancing, the future of RWD research might belong to those who can combine methodological depth with technical literacy and operational insight. In a landscape defined by constant change, this ability to navigate complexity is no longer optional; it is the core competency.

By Nadia Barozzi

Passionate about data-driven insights and the advancement of Real World Evidence research, drug safety and pharmacovigilance.