CDE Harmonization Documentation
Welcome to the CDE Harmonization project documentation. This project supports the harmonization of Common Data Elements (CDEs) to enable interoperability across clinical research studies.
Overview
CDEs are standardized data fields and survey questions with defined permissible values that facilitate consistent data collection. Despite their widespread use, CDEs remain fragmented across multiple repositories, lack semantic bindings to ontologies, and are often incompatible with one another, which limits cross-study integration and AI-readiness.
Our goal is to transform CDEs and study-specific variables into computable, semantically rich, and interoperable data assets using the LinkML framework, AI-assisted curation, and ontology-based standardization.
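To make the idea of a CDE with permissible values and an ontology binding concrete, here is a minimal Python sketch. It is an illustration only: the class names, the `EXAMPLE_CDE:0001` identifier, and the question wording are hypothetical and do not come from this project's schemas, which are written in LinkML. The only real identifier is the HPO term HP:0001250 (Seizure).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class PermissibleValue:
    """One allowed answer for a CDE (hypothetical structure)."""
    code: str
    label: str


@dataclass
class CommonDataElement:
    """A CDE: a question, its allowed answers, and an optional ontology binding."""
    identifier: str
    question: str
    concept: Optional[str] = None  # ontology CURIE the CDE is about
    permissible_values: List[PermissibleValue] = field(default_factory=list)

    def is_valid(self, answer: str) -> bool:
        """Return True if the answer matches one of the permissible value codes."""
        return any(pv.code == answer for pv in self.permissible_values)


# Hypothetical CDE bound to the HPO term for Seizure.
seizure_history = CommonDataElement(
    identifier="EXAMPLE_CDE:0001",  # made-up identifier for illustration
    question="Has the participant ever had a seizure?",
    concept="HP:0001250",
    permissible_values=[
        PermissibleValue("yes", "Yes"),
        PermissibleValue("no", "No"),
        PermissibleValue("unknown", "Unknown"),
    ],
)
```

With this structure, `seizure_history.is_valid("yes")` accepts a listed code and `seizure_history.is_valid("maybe")` rejects an unlisted one, which is the kind of check a computable CDE makes possible.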
Documentation
Key Technologies
LinkML: Schema language for defining computable, semantically rich data models
SSSOM: Standard for representing and sharing ontology and CDE mappings with provenance
Ontologies: HPO (phenotypes), Mondo (diseases), LOINC (lab tests), OBA (biological attributes)
AI/LLM tools: CurateGPT, AI4Curation, and semantic embeddings for automated mapping and curation
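As a rough illustration of how a CDE-to-ontology mapping with provenance might look in SSSOM's tabular form, the sketch below writes one mapping row as TSV using only the Python standard library. The column names, the `skos:exactMatch` predicate, and the `semapv:ManualMappingCuration` justification come from the SSSOM standard; the `EXAMPLE_CDE:0001` subject, its label, and the helper name `to_sssom_tsv` are hypothetical. A real SSSOM file would also carry a metadata header (for example a CURIE map), which is omitted here.

```python
import csv
import io

# A small subset of SSSOM columns; real mapping sets often include more.
SSSOM_COLUMNS = [
    "subject_id",
    "subject_label",
    "predicate_id",
    "object_id",
    "object_label",
    "mapping_justification",
]


def to_sssom_tsv(rows):
    """Serialize mapping dicts to a tab-separated string with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=SSSOM_COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


mappings = [
    {
        "subject_id": "EXAMPLE_CDE:0001",  # made-up CDE identifier
        "subject_label": "History of seizure",  # made-up label
        "predicate_id": "skos:exactMatch",
        "object_id": "HP:0001250",
        "object_label": "Seizure",
        "mapping_justification": "semapv:ManualMappingCuration",
    }
]

tsv_text = to_sssom_tsv(mappings)
print(tsv_text)
```

The `mapping_justification` column is where provenance lives: it records how the mapping was produced, so manually curated and automatically suggested mappings can be told apart downstream.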