In the evolving landscape of evidence generation, Real-World Data (RWD) has emerged as a vital source of insights across the product lifecycle. In pediatric populations, however, generating Real-World Evidence (RWE) poses unique challenges, both scientific and structural. Children are not just "small adults"; they represent a highly heterogeneous group with distinct physiological, developmental, ethical, and data-related considerations that demand a purpose-built approach to research and data governance.
Why Pediatric RWD Matters
The pediatric population is historically underrepresented in clinical trials, regulatory submissions, and post-marketing research. Ethical concerns, smaller sample sizes, difficulty in recruitment, and developmental variability often hinder traditional randomized controlled trials (RCTs) in children. As a result, RWD offers a potentially powerful complement, enabling insight generation from routine care, capturing long-term safety, and informing label expansion or dosage optimization in real-world settings. However, these opportunities come with a set of methodological and operational complications.
Key Challenges in Pediatric RWD
1. Lack of Individual Insurance Records
One of the most critical challenges in pediatric RWD arises from healthcare coverage dynamics. In many countries, children are covered under a parent's insurance policy (typically the mother's). As a result, children may not have distinct patient identifiers in administrative claims data. Medical events, prescriptions, and services may appear as family-level entries, making attribution to the individual child difficult. In databases where a dependent’s age or sex is not disaggregated, longitudinal tracking and cohort building become severely limited. This introduces misclassification bias and threatens the validity of patient-level inference, especially when studying rare conditions, birth outcomes, or longitudinal safety.
Even in health systems that collect mother-baby linkage (such as Nordic registries), the linkage may be strong only for specific datasets and may not cover more granular or behavioral health variables. Moreover, father-baby linkage is even more limited or entirely absent in most datasets. This poses a significant challenge when exploring paternal genetic predispositions, environmental exposures, or socioeconomic influences tied to the father that could be relevant in studies of neurodevelopmental disorders, congenital anomalies, or mental health trajectories. The absence of this linkage can mean missed insights, especially when examining the impact of paternal age, substance use, or familial clustering of some conditions (e.g., autism spectrum disorder (ASD) or attention-deficit/hyperactivity disorder (ADHD)).
2. Rapid Physiological Development and Heterogeneity
Children undergo rapid physiological and developmental changes across infancy, childhood, and adolescence. As a result, pharmacokinetics and pharmacodynamics differ across age groups; dosing, efficacy, and adverse events vary accordingly; and analyses must be stratified by fine-grained age bands (e.g., neonates, infants, toddlers, school-age children, adolescents). In short, time-varying covariates matter much more in pediatric RWD, but many datasets lack the granularity to support them.
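To illustrate why age must be treated as time-varying, here is a minimal sketch that assigns an age band at the date of each event rather than once per child. The cut-points and the `age_band` helper are illustrative assumptions, not definitions from any specific guideline; a real study would take its bands from the protocol.

```python
from datetime import date

# Illustrative upper bounds in days (365-day years); these cut-points are
# assumptions for the sketch and should be replaced by protocol definitions.
AGE_BANDS = [
    (28, "neonate"),            # 0 to 27 days
    (365, "infant"),            # 28 days to <1 year
    (3 * 365, "toddler"),       # 1 to <3 years
    (12 * 365, "school-age"),   # 3 to <12 years (lumps preschool in)
    (18 * 365, "adolescent"),   # 12 to <18 years
]

def age_band(birth_date: date, event_date: date) -> str:
    """Return the age band of a child at the date of a given event."""
    age_days = (event_date - birth_date).days
    if age_days < 0:
        raise ValueError("event date precedes birth date")
    for upper_bound, label in AGE_BANDS:
        if age_days < upper_bound:
            return label
    return "adult"

# The same child falls into different bands at different events.
birth = date(2020, 1, 1)
print(age_band(birth, date(2020, 1, 15)))  # neonate
print(age_band(birth, date(2022, 1, 1)))   # toddler
```

Computing the band per event keeps exposures and outcomes aligned with the child's developmental stage at the time they occurred.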
3. Data Fragmentation and Loss to Follow-up
Children often transition between providers and insurance plans, especially during key life events (e.g., birth, schooling, adolescence). This weakens the longitudinal continuity of care data, particularly in commercially insured populations. In addition, linkage across care settings (e.g., hospital, primary care, emergency) may be incomplete, and the transition from pediatric to adult care systems (e.g., around age 18) creates data discontinuities.
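One common way to quantify this problem is to look for gaps in a child's coverage history before building a cohort. The sketch below is a simplified illustration: the `enrollment_gaps` function, the tuple layout, and the 30-day tolerance are all assumptions, not a standard from any particular database.

```python
from datetime import date, timedelta

def enrollment_gaps(spans, max_gap_days=30):
    """Return (gap_start, gap_end) pairs where coverage lapses for more than
    max_gap_days. `spans` is a list of (start_date, end_date) tuples with
    inclusive end dates; overlapping or out-of-order spans are handled."""
    gaps = []
    ordered = sorted(spans)
    if not ordered:
        return gaps
    covered_until = ordered[0][1]
    for start, end in ordered[1:]:
        # Number of uncovered days strictly between the two spans
        gap_days = (start - covered_until).days - 1
        if gap_days > max_gap_days:
            gaps.append((covered_until + timedelta(days=1),
                         start - timedelta(days=1)))
        covered_until = max(covered_until, end)
    return gaps

# Hypothetical coverage history for one child
history = [
    (date(2020, 1, 1), date(2020, 6, 30)),
    (date(2020, 7, 1), date(2020, 12, 31)),   # contiguous with the first span
    (date(2021, 6, 1), date(2021, 12, 31)),   # re-enrolls after a long lapse
]
print(enrollment_gaps(history))
```

Events that occur during a flagged gap are invisible to the database, so follow-up is often censored at the start of the first gap.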
4. Ethical and Privacy Barriers
Conducting RWD studies in children raises sensitive ethical considerations. Parental or guardian consent is required in many contexts, even for retrospective data use. Regulatory frameworks (e.g., GDPR, HIPAA) impose stricter rules for pediatric data, limiting sharing and linkage. In addition, the use of data for secondary purposes (e.g., research) must account for assent from minors as they age. These constraints can reduce access to large-scale or multicenter pediatric datasets, especially in cross-border studies.
5. Lack of Pediatric-Specific Coding Standards
Diagnosis codes (ICD), procedure codes (CPT), and drug dictionaries (e.g., ATC) are often not pediatric-specific. A diagnosis like "asthma" or "epilepsy" may manifest differently in children but is coded identically. Off-label drug use is common, especially in neonates, but is poorly captured. Growth metrics and developmental milestones, which are critical pediatric outcomes, are rarely recorded in structured data fields. Thus, phenotyping in pediatrics is particularly complex, often requiring manual chart review or natural language processing (NLP) of unstructured clinical notes.
Strategies to Overcome Pediatric RWD Barriers
1. Use of Mother-Child Linkage Datasets: where available, mother-child linked data (e.g., from birth registries, perinatal records, or integrated health systems) can help:
- Link medication exposure during pregnancy with birth and neonatal outcomes.
- Track early-life health trajectories, including congenital anomalies and developmental delays.
- Map temporal associations between maternal health (e.g., diabetes, infections) and child outcomes.
Countries like Sweden, Norway, and Denmark offer robust registry-based linkage models that are widely regarded as gold standards for pediatric RWE. However, these datasets still often lack father-child linkage, which remains a gap in understanding paternal contributions to early health determinants. Including paternal data in future database designs would enable a fuller understanding of inherited risks and familial context.
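As a rough illustration of how a mother-child link supports the first use case above, the sketch below flags drugs dispensed to the mother during an estimated pregnancy window. The record layouts, field names, and the `prenatal_exposures` function are hypothetical. The window is approximated from gestational age at birth, which is conventionally counted from the last menstrual period, and a dispensing record does not prove the drug was taken.

```python
from datetime import date, timedelta

def prenatal_exposures(births, dispensings):
    """Map each child to the drugs dispensed to the linked mother during
    the estimated pregnancy window.

    births: dicts with child_id, mother_id, birth_date, gestational_weeks
    dispensings: dicts with mother_id, drug, dispense_date
    """
    # Index maternal dispensings by mother for quick lookup
    by_mother = {}
    for record in dispensings:
        by_mother.setdefault(record["mother_id"], []).append(record)

    exposed = {}
    for birth in births:
        # Approximate pregnancy start from gestational age at birth
        start = birth["birth_date"] - timedelta(weeks=birth["gestational_weeks"])
        drugs = {
            rec["drug"]
            for rec in by_mother.get(birth["mother_id"], [])
            if start <= rec["dispense_date"] < birth["birth_date"]
        }
        exposed[birth["child_id"]] = sorted(drugs)
    return exposed

# Hypothetical linked records
births = [{"child_id": "C1", "mother_id": "M1",
           "birth_date": date(2021, 10, 1), "gestational_weeks": 40}]
dispensings = [
    {"mother_id": "M1", "drug": "drugA", "dispense_date": date(2021, 3, 1)},
    {"mother_id": "M1", "drug": "drugB", "dispense_date": date(2020, 6, 1)},
    {"mother_id": "M1", "drug": "drugC", "dispense_date": date(2021, 11, 1)},
]
print(prenatal_exposures(births, dispensings))  # {'C1': ['drugA']}
```

The same join cannot be performed for fathers when no father-child identifier exists, which is the gap described above.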
2. Selecting Fit-for-Purpose Data Sources: Certain databases are better suited for pediatric research:
- Pediatric-focused EHR networks (e.g., Pedianet in Italy, PEDSnet in the U.S.)
- Large integrated health systems with family-level coverage models
- National health registries with universal coverage and continuity of care
Additionally, school health records, immunization registries, and home visit programs may offer complementary data streams.
3. Developing Pediatric-Specific Algorithms: tailoring phenotyping and analytic algorithms is critical. This includes:
- Age-specific comorbidity indices and severity scores
- Pediatric growth and development measures (e.g., z-scores, BMI percentiles)
- Algorithms to detect off-label use or contraindications in age groups
Machine learning approaches can aid in inferring outcomes or exposures not directly recorded in the data.
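As one concrete example of a pediatric-specific measure, growth z-scores are commonly computed with the LMS method, in which an age- and sex-specific reference supplies a skewness parameter L, a median M, and a coefficient of variation S. The sketch below implements that formula only; the `lms_zscore` name is an assumption, and the parameter values in the usage lines are made up for illustration and must be replaced with values from a published growth reference.

```python
import math

def lms_zscore(value: float, L: float, M: float, S: float) -> float:
    """Convert a measurement to a z-score with the LMS method.

    z = ((value / M) ** L - 1) / (L * S)   when L != 0
    z = ln(value / M) / S                  when L == 0
    """
    if value <= 0 or M <= 0 or S <= 0:
        raise ValueError("value, M, and S must be positive")
    if L == 0:
        return math.log(value / M) / S
    return ((value / M) ** L - 1) / (L * S)

# Hypothetical reference parameters, for illustration only
print(lms_zscore(10.0, L=1.0, M=10.0, S=0.1))  # 0.0, measurement equals the median
print(round(lms_zscore(11.0, L=1.0, M=10.0, S=0.1), 6))  # 1.0
```

Because L, M, and S change with age and sex, the z-score should be computed with the reference row that matches the child's age at the time of measurement.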
4. Combining Data Modalities: linking structured data (e.g., claims, prescriptions) with unstructured sources (e.g., clinical notes, registries) improves:
- Accuracy of condition identification
- Capture of lab values, vitals, and growth metrics
- Documentation of social determinants (e.g., family structure, income)
Looking Ahead: The Future of Pediatric RWE
Global regulatory agencies are increasingly recognizing the importance of pediatric data. In the U.S., the Pediatric Research Equity Act (PREA) requires pediatric study plans, and in the EU, the Paediatric Regulation requires paediatric investigation plans (PIPs); both could be supplemented by RWD studies.
New models, such as digital health platforms, wearables for children, and parent-reported outcomes, may further enhance pediatric data collection in real-world settings. Combined with privacy-preserving linkage technologies and AI-powered phenotyping, these innovations offer hope for unlocking the full value of RWD in child health.