Looking for OMOP-based rare disease datasets for research

Hello everyone,

I’m looking for information on existing OMOP-CDM rare disease datasets, or institutions maintaining such datasets, to support ongoing research.

In France, our team is developing an algorithm designed to reduce diagnostic delay for certain rare diseases by leveraging patient phenotypes extracted from medical narratives (clinical reports).
The algorithm is currently being evaluated on local French data, but we would like to test its performance on international datasets mapped to the OMOP CDM.

We are particularly interested in:

  • OMOP-CDM datasets enriched with rare disease cohorts (any disease area),
  • Databases incorporating HPO phenotypes, narrative text, or detailed clinical features,
  • Institutions or research groups willing to explore a collaboration around diagnostic support tools for rare diseases.

If you know of any relevant datasets, initiatives, or previous OHDSI work in this area, I would greatly appreciate your guidance.

Many thanks for your help!

Hi @MathildeFruchart,

I’m not sure whether it is relevant, but I have previously developed and validated a phenotyping algorithm for rare endocrine diseases within the OMOP-CDM:

Digital Phenotyping of Rare Endocrine Diseases Across International Data Networks and the Effect of Granularity of Original Vocabulary, YMJ, 2025, https://doi.org/10.3349/ymj.2023.0628