The increasing reliance on Real-World Data (RWD) to generate Real-World Evidence (RWE) is transforming clinical research, regulatory decision-making, and treatment approval pathways. However, the utility of RWD varies by therapeutic area. While some fields, such as cardiovascular diseases and diabetes, benefit from structured and routinely collected data, other areas such as oncology, neurology, psychiatry, dermatology, and rare diseases often face challenges due to fragmented, incomplete, or inconsistently captured information.
Additionally, even within a well-documented therapeutic area, the feasibility of using RWD depends on whether the research question aligns with the strengths and limitations of the available data. For example, studies assessing treatment adherence or healthcare utilization may be well supported by claims data, whereas those requiring detailed clinical endpoints, biomarker information, or disease progression metrics may struggle with data gaps. Similarly, oncology has a wealth of structured data for common cancers, yet the availability of key variables such as tumor staging, biomarker results, and treatment sequencing varies widely depending on the specific study objectives and data source. These disparities have significant implications for RWE research, leading to unequal benefits from RWD across therapeutic areas. To maximize the potential of RWD, researchers must carefully align study design with data availability, consider complementary data linkage strategies, and acknowledge the inherent limitations that may affect the robustness of findings in different therapeutic contexts.
Let’s explore in more detail some examples of how this may impact different disease areas:
Cardiovascular and Metabolic Diseases (e.g., Diabetes, Hypertension, Dyslipidemia)
In the cardiovascular and metabolic therapeutic area, many aspects of patient care are well documented, while others remain largely unrecorded. Routine monitoring through lab tests, imaging, and prescriptions is consistently captured in electronic health records (EHRs) and claims data, providing valuable insights into disease management and treatment patterns. Conditions such as hypertension and diabetes are closely tracked in primary care, making them well suited for observational studies. Furthermore, large-scale registries, such as those for diabetes and stroke, may offer comprehensive datasets that enhance research efforts. However, despite these strengths, critical gaps persist. Laboratory values are not always captured in secondary data sources (e.g., claims data). Lifestyle factors that play a significant role in cardiovascular and metabolic diseases, including diet, physical activity, and smoking habits, are often missing or inconsistently recorded. This lack of structured lifestyle data limits the ability to fully understand disease onset, progression, and response to treatment in real-world settings.
Oncology
The availability of RWD in oncology is highly uneven, with some aspects of cancer care being systematically documented while others remain incomplete or inconsistently recorded. Routine monitoring through lab tests, imaging, and prescriptions is well captured in EHRs, providing insights into treatment patterns and disease progression. Additionally, large-scale cancer registries exist, offering structured data on tumor characteristics, staging, and survival outcomes. However, significant challenges persist in the completeness and granularity of oncology RWD. While targeted therapies and immunotherapies have transformed cancer treatment, key clinical variables such as biomarker status, tumor histology, and treatment sequencing are often missing or inconsistently reported outside of clinical trials. Real-world performance status, disease progression markers, and adverse events are frequently unstructured, making it difficult to assess patient outcomes comprehensively. Furthermore, lifestyle factors that influence cancer risk and treatment response, such as smoking history, diet, and environmental exposures, are rarely captured in routine healthcare data.
Psychiatry (e.g., Schizophrenia, Major Depressive Disorder)
In psychiatry, too, the availability of RWD is highly uneven, with certain aspects of mental health care well documented while others remain incomplete or inconsistently captured. EHRs and claims data provide structured information on prescriptions, psychiatric diagnoses, and hospitalizations, making them valuable sources for studying treatment patterns and healthcare utilization. Additionally, some large-scale registries exist for specific conditions such as schizophrenia and major depressive disorder, enabling longitudinal research. However, significant gaps remain in the granularity and completeness of psychiatric RWD. Unlike other therapeutic areas, psychiatry relies heavily on subjective assessments, and key clinical variables such as symptom severity, functional impairment, and disease progression are often recorded in unstructured clinical notes rather than standardized fields. Furthermore, real-world treatment response and adverse effects may be underreported, as medication adherence and side effects are not consistently documented outside of controlled settings. Another major limitation is the lack of comprehensive lifestyle and social determinants of health data, including stress levels, support systems, and socioeconomic factors, which play a critical role in mental health outcomes.
Rare Diseases
The availability of RWD in rare diseases is highly fragmented, with substantial gaps that limit its use for research and clinical decision-making. EHRs and claims data can capture routine monitoring, diagnostic procedures, and treatment patterns, but their utility is often constrained by small patient populations and variability in disease presentation. While some rare diseases benefit from dedicated patient registries, these are often limited in scale, geographic coverage, or long-term follow-up. A major challenge in rare disease RWD is delayed or missed diagnosis, as these conditions are often underrecognized and may require genetic testing or specialized assessments that are not systematically recorded. Treatment data are also incomplete, particularly for off-label therapies, compassionate-use treatments, or interventions received in specialized centers that may not share standardized data with broader healthcare systems. Additionally, key clinical outcomes such as disease progression, functional status, and quality of life are often absent or inconsistently captured, making it difficult to assess long-term treatment effectiveness.
Dermatology
In dermatology, many aspects of skin disease management are well documented, while others remain incomplete or inconsistently recorded. EHRs and claims data capture routine prescriptions, dermatologic procedures, and diagnoses, making them useful for studying treatment patterns and healthcare utilization. Additionally, certain large-scale registries exist for chronic conditions such as psoriasis, atopic dermatitis, and melanoma, providing longitudinal insights into disease progression and therapeutic outcomes. However, significant gaps exist in dermatology RWD. Objective clinical measures such as lesion size, severity scores (e.g., PASI for psoriasis), and physician-assessed disease progression are often recorded as free text rather than structured data, limiting their use in large-scale analyses. Unlike other therapeutic areas, dermatology relies heavily on visual assessments, yet high-quality, standardized image data are not consistently integrated into EHRs or claims databases. Furthermore, patient-reported symptoms such as itching and pain, along with quality of life, are critical factors in dermatologic conditions but are inconsistently captured. Many dermatology products are also purchased over the counter and therefore do not appear in secondary data sources. Lifestyle and environmental factors, including sun exposure, skincare routines, and occupational exposures, also remain largely absent from routine healthcare data.
How to Close the RWD Gap?
Closing the gap in data inequality across therapeutic areas requires a multifaceted approach that enhances the completeness, consistency, and representativeness of RWD. One key strategy is the integration of diverse data sources, such as linking EHRs with disease registries, imaging databases, and patient-reported outcome measures. This approach allows for a more comprehensive view of patient journeys, capturing both clinical and lifestyle factors that influence disease progression and treatment outcomes.
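To make the linkage idea concrete, here is a minimal Python sketch of joining an EHR extract to a registry extract and checking how complete a key variable is afterwards. Everything in it is illustrative: the record layouts, the field names (`diagnosis`, `prescription`, `pasi_score`), the patient IDs, and the assumption that both sources share a clean patient identifier are hypothetical, and real-world linkage typically has to deal with mismatched identifiers and privacy constraints that this sketch ignores.

```python
# Hypothetical EHR extract, keyed by an assumed shared patient identifier.
ehr_records = {
    "P001": {"diagnosis": "psoriasis", "prescription": "methotrexate"},
    "P002": {"diagnosis": "psoriasis", "prescription": "adalimumab"},
    "P003": {"diagnosis": "psoriasis", "prescription": "topical steroid"},
}

# Hypothetical registry extract holding a severity score the EHR lacks.
registry_records = {
    "P001": {"pasi_score": 12.4},
    "P003": {"pasi_score": 4.1},
}


def link_sources(ehr, registry, field):
    """Left-join one registry field onto every EHR record by patient ID."""
    linked = {}
    for patient_id, ehr_row in ehr.items():
        row = dict(ehr_row)  # copy so the source extract is not modified
        row[field] = registry.get(patient_id, {}).get(field)
        linked[patient_id] = row
    return linked


def completeness(linked, field):
    """Share of linked records in which the field is populated."""
    if not linked:
        return 0.0
    filled = sum(1 for row in linked.values() if row.get(field) is not None)
    return filled / len(linked)


linked = link_sources(ehr_records, registry_records, "pasi_score")
pasi_completeness = completeness(linked, "pasi_score")
print(f"Severity score completeness after linkage: {pasi_completeness:.0%}")
```

Reporting completeness per variable after linkage is one simple way to judge whether the combined dataset can support a given research question before a study is designed around it.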
Standardization of data collection is another critical factor. Encouraging the routine documentation of disease-specific metrics, such as tumor staging in oncology or severity scores in dermatology, can improve the comparability of RWD across studies. The adoption of structured fields in EHRs, rather than reliance on unstructured physician notes, can ensure that essential clinical details are consistently recorded and accessible for analysis. In parallel, advancements in natural language processing (NLP) and artificial intelligence (AI) can help extract meaningful insights from existing free-text data, bridging gaps where structured information is missing.
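As a rough illustration of pulling a structured value out of free text, the Python sketch below uses a simple regular expression to find a PASI score in a clinical note. It is a deliberately simplified, rule-based stand-in for the NLP approaches mentioned above, not a production method: the pattern, the function name `extract_pasi`, and the example notes are all assumptions made for this example.

```python
import re
from typing import Optional

# Assumed pattern: "PASI", an optional "score", an optional connector
# ("of", "is", "was", ":" or "="), then a number.
PASI_PATTERN = re.compile(
    r"\bPASI\b\s*(?:score)?\s*(?:of|is|was|[:=])?\s*(\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def extract_pasi(note: str) -> Optional[float]:
    """Return the first plausible PASI score found in a note, else None."""
    match = PASI_PATTERN.search(note)
    if match is None:
        return None
    value = float(match.group(1))
    # PASI ranges from 0 to 72; values outside that range are rejected.
    return value if 0.0 <= value <= 72.0 else None


notes = [
    "Plaque psoriasis, PASI 12.4 at baseline.",
    "PASI score: 8",
    "Patient reports itching; no severity score recorded.",
]
for note in notes:
    print(extract_pasi(note), "<-", note)
```

A rule like this is brittle. For instance, it would misread a response threshold such as "PASI 50" as a score of 50, so any extracted values would need validation against manually reviewed notes before being used in analysis.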
Beyond clinical documentation, patient-reported outcomes and lifestyle data must be more systematically integrated into RWD sources. Chronic conditions such as cardiovascular and metabolic diseases are influenced by diet, exercise, and stress levels, yet these variables are often absent from traditional healthcare datasets. Digital health tools, including wearable devices and mobile health applications, offer an opportunity to capture real-time patient behaviors, adding a new dimension to RWD. Encouraging patient engagement in data sharing through user-friendly platforms and ensuring data privacy protections can enhance both participation and data quality.
Regulatory and policy-driven efforts also play a crucial role in addressing data disparities. Mandating the inclusion of key clinical and patient-reported variables in EHR systems, along with fostering interoperability between databases, can improve data completeness and accessibility. Additionally, research initiatives aimed at improving data collection in underrepresented populations can help mitigate biases and ensure that RWD is reflective of diverse patient groups.
Ultimately, bridging the data inequality gap requires collaboration among healthcare providers, researchers, technology developers, and policymakers. By leveraging emerging technologies, refining data collection standards, and prioritizing patient-centered data strategies, the reliability and applicability of RWD can be significantly enhanced, leading to more informed clinical and regulatory decision-making.
In conclusion, the current imbalance in RWD availability poses significant challenges for research equity across therapeutic areas. Addressing this issue requires a multifaceted approach, including improving data collection infrastructure, leveraging new technologies, and adopting hybrid study designs. By actively closing these gaps, we can ensure that all patients, regardless of their condition, benefit from the advancements in RWE generation.