Can Real-World Data Redefine Clinical Trial Protocol Design?

A System Built Backwards?

Clinical trials are the gold standard for evaluating new treatments, but they are often built on idealized assumptions that poorly reflect the complexity of real-world clinical care. Protocols are crafted around rigid eligibility criteria, theoretical endpoints, and often over-optimized designs that filter out the majority of patients seen in actual clinical settings.

 

What if we reversed this process? What if, instead of beginning with abstract hypotheses and arbitrary cutoffs, we began with the data: the actual profiles, outcomes, and trajectories of patients already treated in the real world? By anchoring trial design to what is already happening in clinical practice, we may unlock more efficient, inclusive, and feasible clinical research.

 

The Recruitment Crisis

Recruitment remains one of the most persistent and costly challenges in clinical research. Trials are regularly delayed or fail outright due to slow enrollment, high screen failure rates, and overly narrow inclusion/exclusion (I/E) criteria.

 

This isn’t a new issue, but it is getting worse. With increasing therapeutic complexity and precision targeting, protocol designs are becoming narrower just as real-world populations grow more heterogeneous. The result? Trials that are scientifically sound on paper but operationally doomed in practice.

 

The disconnect between protocol assumptions and clinical reality leads to massive inefficiencies. But these inefficiencies aren’t inevitable; they are, in part, design failures.

 

RWD as a Design Tool, Not Just a Secondary Source

Real-world data (RWD), drawn from electronic health records (EHRs), insurance claims, registries, and even patient-reported data, is too often relegated to a supporting role: validation, safety monitoring, or post-marketing studies. But RWD's true value may lie upstream, where trials are conceived.

 

Using RWD for protocol optimization lets us define eligibility criteria based on actual patient populations rather than hypothetical ideal types. By analyzing which clinical phenotypes are most prevalent and where they cluster, trial designs can more accurately reflect reality. RWD also supports the selection of outcomes that truly matter in clinical practice: if a proposed endpoint rarely changes during routine care or is inconsistently documented, it may not be a meaningful measure. It helps ground event-rate estimates in actual disease progression and treatment response, guarding against unrealistic assumptions in trial planning. Finally, it informs site selection by revealing which locations and regions have sufficient patient density and treatment volumes to support timely enrollment.
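To make the event-rate point concrete, here is a minimal sketch of how a planning assumption flows through to required enrollment. It uses the standard Schoenfeld approximation for the number of events needed by a 1:1 randomized log-rank comparison. The hazard ratio and both event probabilities are invented for illustration and are not drawn from any real dataset, and the conversion from events to patients is deliberately crude: it ignores censoring, dropout, and staggered accrual.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.80):
    """Schoenfeld approximation: events needed for a two-sided log-rank
    test with 1:1 allocation to detect the given hazard ratio."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2)


def required_patients(events, event_probability):
    """Patients to enroll so that the expected event count reaches the target.
    event_probability is the chance a patient has the event during follow-up."""
    return math.ceil(events / event_probability)


# Hypothetical planning inputs (assumptions for illustration only).
TARGET_HR = 0.75
ASSUMED_EVENT_PROB = 0.30   # optimistic figure written into a draft protocol
OBSERVED_EVENT_PROB = 0.18  # figure a hypothetical RWD cohort might show

events = required_events(TARGET_HR)
n_assumed = required_patients(events, ASSUMED_EVENT_PROB)
n_observed = required_patients(events, OBSERVED_EVENT_PROB)

print(f"Events needed: {events}")
print(f"Enrollment under the protocol assumption: {n_assumed}")
print(f"Enrollment under the RWD-based estimate:  {n_observed}")
```

In this toy scenario the number of events needed is the same under both assumptions, but the lower event probability substantially raises the number of patients who must be enrolled to observe them. That is the kind of gap an RWD-based estimate can surface before the protocol is finalized.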

 

This isn’t an abstraction: companies and regulators are already beginning to use trial simulation platforms and federated RWD networks to test protocol feasibility before a trial even begins.

 

Key players have emerged in this space, shaping how RWD is integrated into trial design. TriNetX offers global access to de-identified patient data alongside trial simulation capabilities, allowing protocol designers to evaluate how eligibility criteria affect recruitment and event capture [1]. Their AI-powered feasibility tools provide real-time insights into protocol impact across diverse populations worldwide [1]. Flatiron Health, with a strong focus on oncology, supplies disease-specific datasets that include longitudinal outcomes and biomarker information, which support trial designers in aligning protocols with clinical reality [2]. Their datasets enable the creation of external control arms (ECAs) and the selection of pragmatic endpoints that better reflect patient outcomes [2]. Aetion specializes in producing regulatory-grade real-world evidence (RWE) and collaborates with pharmaceutical companies and regulators to assess feasibility and endpoint relevance [3]. Their Aetion Evidence Platform models clinical outcomes under varying protocol scenarios using RWD [3]. Additionally, Veradigm and OM1 use AI-driven insights from RWD to refine trial planning and identify the most appropriate patient cohorts [4,5]. OM1’s clinical datasets are used to simulate inclusion criteria and predict enrollment success, helping to optimize trial execution [5].
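The basic idea behind this kind of eligibility simulation can be illustrated with a toy sketch. It does not represent any vendor's platform or API. The patient records are randomly generated, and the field names, thresholds, and criteria are all hypothetical. The sketch applies a set of inclusion/exclusion rules to a cohort, reports how many patients survive each step, and then estimates how many additional patients would qualify if each criterion were dropped on its own.

```python
import random

random.seed(7)  # fixed seed so the synthetic cohort is reproducible


def make_patient():
    """Generate one synthetic patient record (all fields are hypothetical)."""
    return {
        "age": random.randint(30, 90),
        "egfr": random.uniform(15, 110),
        "hba1c": random.uniform(5.5, 11.0),
        "prior_mi": random.random() < 0.15,
    }


cohort = [make_patient() for _ in range(5000)]

# Hypothetical draft criteria: each maps a label to a rule on one record.
CRITERIA = {
    "age 18-75": lambda p: 18 <= p["age"] <= 75,
    "eGFR >= 60": lambda p: p["egfr"] >= 60,
    "HbA1c 7.0-10.0": lambda p: 7.0 <= p["hba1c"] <= 10.0,
    "no prior MI": lambda p: not p["prior_mi"],
}


def eligible(patients, criteria):
    """Patients who satisfy every criterion."""
    return [p for p in patients if all(rule(p) for rule in criteria.values())]


def funnel(patients, criteria):
    """Apply criteria in order; record how many patients remain after each."""
    steps = []
    remaining = patients
    for name, rule in criteria.items():
        remaining = [p for p in remaining if rule(p)]
        steps.append((name, len(remaining)))
    return steps


def relaxation_gain(patients, criteria):
    """Extra eligible patients if each criterion alone were removed."""
    base = len(eligible(patients, criteria))
    gains = {}
    for name in criteria:
        others = {k: v for k, v in criteria.items() if k != name}
        gains[name] = len(eligible(patients, others)) - base
    return gains


print(f"Starting cohort: {len(cohort)}")
for name, count in funnel(cohort, CRITERIA):
    print(f"  after '{name}': {count}")
for name, gain in relaxation_gain(cohort, CRITERIA).items():
    print(f"  dropping '{name}' would add {gain} patients")
```

On real data, the same two views (a stepwise funnel and a per-criterion relaxation gain) give protocol designers a quick sense of which criteria cost the most enrollment, so each can be weighed against its scientific justification.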

 

These platforms have moved beyond peripheral tools to become central in the modern conceptualization of clinical trials. At the core of their capabilities lies the use of machine learning (ML) and artificial intelligence (AI), which analyze complex, high-dimensional data to uncover patterns in heterogeneous patient populations and identify hidden clusters of disease phenotypes or treatment responses. These technologies also anticipate recruitment bottlenecks and suggest trade-offs in eligibility criteria, while ranking potential endpoints based on their availability, clinical relevance, and statistical power. Rather than replacing clinical expertise, AI and ML augment it by providing data-driven foresight previously unattainable. Techniques such as natural language processing (NLP) extract key eligibility markers from unstructured clinical notes, while predictive modeling simulates risks such as patient dropout and real-world medication adherence patterns, further enhancing the precision and applicability of trial designs.

 

Addressing the Bias Critique

Critics often argue that incorporating RWD into clinical trial design introduces bias, suggesting that reliance on observational data may compromise the scientific integrity of studies. However, this critique fundamentally misunderstands the role that RWD can and should play in the research process. Designing trials with RWD does not mean selecting outcomes or eligibility criteria simply because they appear convenient or favorable in existing datasets. Rather, it means grounding research questions and trial parameters in the realities of clinical practice, patient heterogeneity, and care delivery patterns. This alignment enhances the relevance and applicability of trial results to real patient populations without detracting from scientific rigor.

 

The real threat to the validity and usefulness of clinical research arises when trials are conceived in isolation from the patient populations and treatment contexts they intend to serve. Protocols based on narrow or idealized assumptions may yield scientifically sound data under controlled conditions but ultimately produce findings that lack generalizability or fail to address meaningful clinical questions. In other words, trials disconnected from real-world practice risk generating results that are valid statistically but irrelevant to everyday care.

 

It is crucial to distinguish between allowing RWD to guide the formation of hypotheses and letting it dictate the interpretation of results. RWD should be used primarily to inform study design decisions, such as eligibility criteria, endpoint selection, and feasibility assessments, helping researchers ask the right questions. Once a trial is underway, rigorous prospective methods, including randomization and controlled data collection, remain essential to ensure unbiased and reliable conclusions. In this way, RWD acts as a powerful complement to traditional trial methods, enhancing the relevance and efficiency of research without compromising scientific standards.

 

Why We’re Not Ready — Yet

Despite the clear promise of integrating RWD into clinical trial design, significant resistance remains, slowing widespread adoption. One major obstacle is regulatory conservatism. Regulatory agencies, tasked with safeguarding patient safety and ensuring scientific rigor, tend to be cautious when considering shifts in established standards. Even when the potential for improved efficiency and trial relevance is evident, regulators often require extensive validation and evidence before embracing new approaches, creating a lengthy and uncertain pathway for change.

 

In addition, organizational silos within the pharmaceutical and clinical research ecosystem present formidable challenges. Clinical operations, data science teams, and medical affairs departments frequently function in isolation, each with its own priorities, language, and workflows. This fragmentation slows the collaboration needed to integrate complex real-world datasets into trial protocols and decision-making processes. Without a concerted effort to foster cross-functional alignment, the full potential of RWD-driven trial design remains unrealized.

 

Compounding these issues are persistent problems with data quality and interoperability. RWD originates from diverse sources such as electronic health records, claims databases, and registries, each with varying formats, completeness, and reliability. The data is often noisy, incomplete, or biased, necessitating extensive cleaning, standardization, and transformation before it can be meaningfully analyzed. Moreover, the lack of interoperability between systems and data silos further complicates data aggregation, limiting the scope and accuracy of insights.

 

Finally, cultural inertia acts as a powerful barrier. Many stakeholders in clinical research remain deeply attached to traditional trial paradigms, which have been the cornerstone of drug development for decades. Even as the limitations of these conventional methods become increasingly apparent, such as slow recruitment, low generalizability, and operational inefficiencies, there is reluctance to abandon familiar frameworks. Change implies risk and uncertainty, which can be uncomfortable for organizations accustomed to established processes and regulatory expectations.

 

While these barriers are significant and deeply rooted, they are not insurmountable. Momentum is building as regulatory bodies gradually update guidance to incorporate RWD, multidisciplinary teams are forming to bridge functional gaps, and advances in data technology improve quality and integration. Cultural shifts are slowly occurring as early adopters demonstrate the value of data-driven trial design, setting precedents that encourage broader acceptance. The path forward is challenging but increasingly navigable, heralding a future where real-world data plays a foundational role in clinical research.

 

A New Paradigm for a New Era: Let Reality Lead

We do not need to sacrifice scientific rigor to achieve greater relevance in clinical research. The objective is not to replace traditional clinical trials but to enhance and refine them, making them more efficient, inclusive, and truly reflective of the populations they aim to serve. By integrating RWD at the earliest stages of trial design, we can develop protocols that accurately represent real-world patient populations rather than idealized subsets. This approach enables us to anticipate recruitment challenges before they arise, allowing for proactive adjustments that improve enrollment and reduce delays. Moreover, selecting endpoints grounded in outcomes that matter most to patients and clinicians ensures that trials measure meaningful benefits, enhancing the applicability of results to everyday clinical practice.

 

Incorporating RWD also facilitates greater diversity within study populations, improving the generalizability of findings across demographic and clinical spectra. This, in turn, accelerates the pace of research by reducing inefficiencies and costly trial failures. Essentially, the shift moves clinical research from an aspirational ideal, where protocols are rigid and often disconnected from reality, to an achievable model that embraces complexity and variability.

 

Clinical research must evolve to reflect the realities of current clinical practice. By reversing the traditional paradigm, starting with RWE rather than solely theoretical constructs, we can build smarter, faster, and more inclusive trials that serve both science and patients more effectively. This transformation is needed. It is time to abandon the outdated notion that patients must conform to protocols and instead design protocols that fit the diverse realities of patients’ lives and clinical care.

 

 

References:

1. TriNetX: Leverage Real-World Data to Optimize Clinical Trial Design, Site Selection, and Patient Identification. https://trinetx.com/clinical-trial-design-optimization/#s_0

2. Flatiron Health: Emulating control arms for cancer clinical trials using RWD. https://resources.flatiron.com/publications/control-arms-rwd

3. Aetion: https://aetion.com/services/rwe-study-design-and-execution

4. Veradigm: https://veradigm.com/evalytica-real-world-evidence-analytics/

5. OM1: Polaris by OM1: Clinical Trial Recruitment for Faster, Cost-Effective Trials https://www.om1.com/resource/polaris-by-om1-clinical-trial-recruitment-for-faster-cost-effective-trials/

By Nadia Barozzi

Passionate about data-driven insights and the advancement of Real World Evidence research, drug safety and pharmacovigilance.