Realizing the dream of evidence at scale

Patrick_Ryan · April 20, 2026, 12:22pm

Friends:

Today at the OHDSI Europe Symposium, @RenskeLos and I shared a dream for our community: to produce reliable evidence at scale, for all medical interventions, for all health outcomes, across all databases around the world. I asserted that now is the time for us to collaborate to turn this dream into a reality, and I asked the audience who shares this dream with us and how everyone can contribute. I look forward to hearing your perspectives and working together on this journey.

sween · April 20, 2026, 2:02pm

This is my second attempt at engaging with this community. I approached it wrong before, and now I am stepping into the European Symposium with a clearer head. My approach reflects how I would like to go “global”: by going small.

The first thing we are going to do is learn OHDSI with a fairly straightforward workload, free of the noise of political boundaries, economic constraints, etc. (basically something we can control soup to nuts), that still checks all the boxes, at least technically: connecting a few Whoop devices, creating a daily visit job from passive personal health information, populating a CDM, and going through the whole journey without duplicating ANY existing tooling along the way. I am sure there are existing workgroups on this subject, and engaging them all the way is part of the deal. We do have a tangible condition in mind around the menopausal transition and hot flashes, but give us a break: we just restarted the journey last Friday.

But

Thinking even smaller, and in the spirit of going global, I thought it would be great to create a mock “study” of zero clinical relevance and take it global with the 5,000+ “company” that we are. The point would be less the clinical relevance of the evidence and more the global collaboration around the “evidence”, with a global outcome. Even if the mock study comes from a children’s book, it would exercise all the plumbing to the left and right of the CDM.

The other thing I think would be great to augment the “forum” is a place where people can post “articles”. I am a member of a few communities, including one run by the company I contract with (example: vibing Synthea modules for OMOP). Such a space gives people a community voice beyond a forum (which is sometimes perceived as Q&A): somewhere they can bring somewhat polished content, under an individual and organizational identity, year-round and outside of the collaborative in-person events. Then, when we get together, we work on the hard stuff. I mean, you said 5 years (now 4); how many collaborator meetings is that? 8? This one is easier said than done, but I personally gain a lot from contributing to such a space and from referencing what others do outside of GitHub repos and Confluence posts.

I am probably underthinking things; I am new to this battle and admittedly an 80% person. I see that as a superpower sometimes, but you definitely don’t want me drywalling your living room… This is my reply. There are many others like it, but this one is mine.

anthonysena · April 21, 2026, 9:30am

To realize Patrick’s dream, I proposed that as a community we define what it means to be ready to run federated network studies at scale. From a technical perspective, HADES provides a powerful set of tools for generating reliable evidence at scale. I proposed defining a community “readiness study” that each site can run to confirm, at a technical level, that it can run HADES. We can then determine what gaps exist in HADES or what other technical barriers exist at each site. @sween I support your idea of a synthetic data set for this purpose.

At the OHDSI Europe Symposium, collaborators expressed concerns about constraints other than technical ones: access to data, governance, and infrastructure sizing, to name a few. These are important topics as well, and I’m sure there are lessons learned in this community that we can share (and it’s likely these lessons have already been shared and that I’ve missed them).

RenskeLos · April 24, 2026, 7:39am

I absolutely share this dream and I’m deeply committed to helping turn it into reality.

I don’t come to this primarily as a technologist, but as someone who believes that scalable, trustworthy evidence only becomes truly meaningful when it connects to people, practice, and purpose. As co‑lead of OHDSI NL and OHDSI Europe, my role has been to bring communities together, align priorities across countries and disciplines, and make sure that what we build collectively responds to real questions in healthcare.

One area I’m particularly passionate about is expanding the data types we work with, including patient‑reported outcome measures (PROMs), so we can better capture impact for multiple stakeholders: patients, clinicians, providers, and health systems, not just researchers and regulators. If our evidence does not reflect what matters to patients or informs day‑to‑day clinical decisions, we are only solving part of the problem.

For me, contributing to this dream means actively bringing OHDSI closer to clinical practice: engaging clinicians early, listening to patients, and co‑creating evidence that is usable, trusted, and actionable in real care settings. It also means helping translate complex methods and results into shared understanding, so that more people can participate meaningfully in this ecosystem.

Reliable evidence at global scale will only happen if we build not just strong tools, but strong bridges, between data and care, between science and practice, and between all those who have a stake in better health outcomes. I’m very excited to continue building those bridges together.

@Patrick_Ryan, thank you for sharing this dream with us; now let’s move it forward!

Sanjay_Udoshi · April 27, 2026, 1:02am

Patrick, Renske —

Count me in on the dream. I’ve been thinking about this same question from a slightly different angle: what would it take for a new site — an academic medical center, a registry, a national health system — to go from “interested in OHDSI” to “contributing reliable evidence to a network study” in days rather than quarters?

A few observations from working on the problem:

The tooling surface area is the activation energy. Atlas, WebAPI, Achilles, DQD, ARES, ATHENA, HADES, Strategus — each is excellent on its own, and the community’s methodological work on PS diagnostics, negative controls, and empirical calibration is the gold standard. But the seams between tools (separate auth, separate deployments, separate UIs, separate mental models) impose a real cost on reproducibility, on onboarding, and on time-from-question-to-evidence. I’ve been building Parthenon as a bet that consolidating that surface — one application, one auth model, one data plane on OMOP CDM v5.4, with cohort building, characterization, HADES analytics, AI-assisted concept mapping, and Strategus orchestration in one place — meaningfully lowers that activation energy. We’re keeping it aligned with the OHDSI standards stack and one-command-installable for exactly that reason.

Evidence at scale needs a supply chain, not just a factory. At the volumes you’re describing, the provenance of every result — CDM version, vocabulary release, phenotype definition, analysis specification, DQD profile, source-mapping decisions — has to travel with the result automatically, at every site, for every study. The methods are settled; the operational discipline to ensure those checks ride along reliably across hundreds of heterogeneous sites is, I think, where the next phase of work lives.
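To make the idea of provenance “traveling with the result” concrete, here is a minimal Python sketch, assuming each site can express these inputs as strings (or hashes of them). The `Provenance` class, its field names, and `attach_provenance` are hypothetical illustrations, not part of HADES, Strategus, or Parthenon. Hashing a canonical JSON form gives every result a fingerprint that changes whenever any one of its inputs changes.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Provenance:
    """Hypothetical provenance record; field names are illustrative, not an OHDSI schema."""
    cdm_version: str             # e.g. "5.4"
    vocabulary_release: str      # the site's vocabulary release identifier
    phenotype_definition: str    # cohort definition JSON, or a hash of it
    analysis_specification: str  # analysis specification JSON, or a hash of it
    dqd_profile: str             # summary or hash of the site's DQD results
    source_mapping_version: str  # identifier of the site's source-to-concept mapping release

    def fingerprint(self) -> str:
        """Deterministic SHA-256 over the canonical (sorted-key) JSON form of the record."""
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def attach_provenance(result: dict, provenance: Provenance) -> dict:
    """Return a copy of `result` that carries its full provenance and its fingerprint."""
    return {
        **result,
        "provenance": asdict(provenance),
        "provenance_fingerprint": provenance.fingerprint(),
    }


if __name__ == "__main__":
    # Placeholder values only; none of these refer to a real release or study.
    prov = Provenance("5.4", "vocab-release-A", "cohort-json", "spec-json", "dqd-summary", "map-v1")
    print(attach_provenance({"estimate": 1.2}, prov)["provenance_fingerprint"])
```

Because the fingerprint is computed from sorted keys, two sites with identical inputs produce identical fingerprints, which makes mismatches across a network easy to spot.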

The long tail matters disproportionately. If the dream is “all databases around the world,” the marginal cost of a new site joining has to drop by an order of magnitude. That’s largely an installer-and-defaults problem — and exactly the kind of work that benefits from being done once, in the open, by the community rather than re-solved by every new participant.

Happy to demo what we have, contribute upstream to HADES/Strategus where it’s useful, or align on whatever the community sees as the highest-leverage next step. Thank you both for the call to action — it landed.

Sanjay M. Udoshi, MD
Founder, Acumenus.io

LiesbetPE · April 28, 2026, 2:51pm

Hi all,

Patrick’s dream is also my dream.

What I will be doing in the upcoming weeks and months to support this is:

continue to co-lead the OHDSI Belgian chapter: OHDSI Europe - Belgium
organise a tutorial during the OHDSI Global Symposium to help community members become champions on behalf of the community (learn more here: OHDSI2026 Tutorials – OHDSI)
coordinate end-to-end real-world evidence studies
support the following consortia in developing and implementing their data strategies with the OHDSI principles in mind (Banking the Brain - federated network in Belgium for neuroscience data, Data4PHM - population health management, Health Campus Limburg, …)
connect with policymakers in Flanders and Belgium to push towards the shared mission
etc.

Here to help.

MaximMoinat · May 1, 2026, 2:19pm

To realize Patrick’s dream, I proposed that we create a framework to assess the fitness for purpose of a particular dataset for a particular study. This would help us (a) speed up feasibility assessment, so that we choose the right datasets from our large data network for each study, and (b) avoid delays caused by discovering mid-study that a dataset is not fit.

As a community we have become very good at assessing fitness for use; we have several tools that help to characterise datasets and identify data quality issues at the dataset level (Achilles, DQD, CdmOnboarding). However, before adding a dataset to a study, we want to know whether its issues will affect our study and whether the data contain the population we need. For this, we need a system that takes as input, e.g., the study protocol, study design, DQ results, characterisation, ETL document, governance procedures, etc. The output would be a fitness-for-purpose assessment, with a recommendation on whether this dataset is expected to produce reliable (and timely) results for this study.
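As a rough illustration of the input/output shape described above, here is a minimal rule-based sketch in Python. It deliberately reduces the proposed inputs to three checks (target cohort size from characterisation, populated CDM domains, and study-critical DQ check failures); the protocol, ETL document, and governance inputs are not modelled. All class names, function names, and DQ check identifiers are hypothetical, not part of DQD or CdmOnboarding.

```python
from dataclasses import dataclass, field


@dataclass
class StudyRequirements:
    """Hypothetical study-side inputs, distilled from the protocol and study design."""
    min_target_count: int        # smallest target cohort size that is useful for the study
    required_domains: set[str]   # CDM tables the study needs populated, e.g. {"drug_exposure"}
    critical_dq_checks: set[str] # DQ checks whose failure would bias this particular study


@dataclass
class DatasetEvidence:
    """Hypothetical site-side inputs, e.g. taken from characterisation and DQ results."""
    target_count: int
    populated_domains: set[str]
    failed_dq_checks: set[str]


@dataclass
class Assessment:
    fit: bool
    reasons: list[str] = field(default_factory=list)


def assess_fitness(req: StudyRequirements, data: DatasetEvidence) -> Assessment:
    """Return a fit / not-fit recommendation together with every reason it is not fit."""
    reasons = []
    if data.target_count < req.min_target_count:
        reasons.append(f"target cohort too small: {data.target_count} < {req.min_target_count}")
    missing = req.required_domains - data.populated_domains
    if missing:
        reasons.append(f"missing domains: {sorted(missing)}")
    # Only DQ failures that matter for *this* study block it; other failures are ignored.
    blocking = req.critical_dq_checks & data.failed_dq_checks
    if blocking:
        reasons.append(f"critical DQ failures: {sorted(blocking)}")
    return Assessment(fit=not reasons, reasons=reasons)
```

The key design choice is that DQ failures are intersected with the study's own list of critical checks, which is one simple way to express the difference between fitness for use (all issues) and fitness for purpose (issues that affect this study).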

Ilse-Vermeulen · May 5, 2026, 2:27pm

Reading through this thread, what strikes me is how aligned we already are: not just on the ambition, but on the urgency behind it.

At the Europe Symposium, @Patrick_Ryan’s “dream” was not abstract. It was framed through very real situations where reliable evidence simply wasn’t available when it mattered most: as a caregiver, a patient, or a clinician. That’s what makes this feel less like a vision and more like an obligation.

What also stood out is that the gap is no longer primarily scientific or technical. As a community, we already have the methods, tools, and foundations. The real question is whether we can organize ourselves to apply them systematically and at scale.

At the same time, I think we can (and should) be honest with ourselves. This thread was started during the Symposium, and two weeks later, engagement is still relatively limited for a topic of this importance. I say that fully including myself: I intended to contribute earlier and didn’t, even while being very closely involved through the OHDSI Europe Coordinating Center. That alone is probably a signal.

If even highly engaged people respond late (or never), what does that say about how easy it is to participate?

Even small things, like struggling to log into the forum (which I also experienced), can break momentum. These may seem minor, but at scale, they matter.

Before suggesting anything new, I do want to acknowledge the amount of work already happening. The many Working Groups, and the people supporting coordination, infrastructure, and communication behind the scenes, are what make this community function. A lot of that work is incremental, often invisible, and sustained over time. People like @Craig_Sachson and Elisse Katzman (and many others) play a crucial role in keeping things moving.

From my perspective, the main friction is not in building components, because we’re strong there! It lies in connecting them in a way that is usable, sustainable, and easy to engage with across contexts.

If we want to realize evidence at scale, we should probably think just as deliberately about enabling contribution at scale.

Building on this, I’ve been thinking about a few concrete, low-effort ways we could move forward… Happy to share them with you!

Andy_Kanter · May 5, 2026, 8:49pm

I guess I would say that at a time when the world seems to be ripping itself apart with countries trying to one-up each other and define their health, security or general wellbeing in opposition to other countries… the OHDSI model of shared methods, tooling and federated, but shared, data is a welcome relief. I think understanding health as an emergent property of an inherently interrelated and interdependent system is exactly the sort of tonic we need today.

I think it is not just the “what” we are doing, or even only the “how” we are doing it, but it is the “who” we need also to consider. The people, the ones who are here now, and the multitude of those still to come (and participate)… that may be the lasting contribution of the OHDSI community.

I cannot say that I have any special insight into how to make our community more impactful. I have tried to focus on expanding the collaboration to include those in LMICs who might be otherwise overlooked by the existing power structures (in research, care, AI, everything). I know we are making some really good progress in that area, so perhaps some of the use cases which drive our impact will arise from there.

I think we are in challenging times, so I would not be too disheartened by people being distracted. At least for me, it is participation with my colleagues in OHDSI that continues to give me hope for the future, and I think it is almost inevitable that its impact will be felt… because something great has to grow from something that has the right motivation and really smart people from so many different places.

schuemie · May 6, 2026, 7:11am

We have already come a long way to achieve Patrick’s dream. Over the years we’ve achieved:

Standardization of data through the OMOP CDM and Vocabularies
Standardization of analytics, implemented in HADES, including
- Automated approaches to confounder adjustment
- Formal pre-specified study diagnostics (including negative controls)

This allows us to generate evidence both of very high quality and at a very large scale, as demonstrated, for example, in our LEGEND studies, but also at a smaller scale in our J&J COVID vaccine safety surveillance.

Patrick’s proposal takes scale to the next level. To realize this dream, I see two main remaining challenges:

OHDSI data network preparedness. This includes data quality, as @MaximMoinat mentioned, but also better mapping, and ensuring everyone has an adequate compute infrastructure.
Phenotyping at scale. Here, generative AI can really help, both in developing and in evaluating phenotypes.

I’m looking forward to working with everyone to meet these challenges!

Hitesh_Kumar · May 6, 2026, 12:16pm

I tried to run Achilles, but it is not giving results in an interactive dashboard as it should. Could you please help me get the desired results from Achilles?

MaximMoinat · May 6, 2026, 7:09pm

Hi @Hitesh_Kumar, please raise your issue on the Achilles Github, or open a new Forum post where you can give more details on what you have tried and what you are trying to achieve.

Cynthia_Sung · May 12, 2026, 9:38pm

In my experience, the ETL process and setting up an environment for the OHDSI analytical tools are major hurdles when trying to introduce a new data partner to the OHDSI community. Most of the people I’ve interacted with do not have the budget to hire an outside company. When this takes more than a year for a newcomer, and they are on a timeline to demonstrate analytical outcomes within a year, they get frustrated and go back to coding directly against source data. How can the time and effort needed for the ETL be reduced?

Would it be possible to merge clinically reviewed mappings of source vocabularies to standard concept IDs from 4000+ collaborators to build a knowledge base for a better semantic mapping tool?
With AI, is it possible to automate Rabbit-in-a-Hat mapping (or some other way to visually represent where source elements go in the OMOP CDM) and then check for conformity to THEMIS conventions?
With AI, would it be possible to use those structural mapping instructions to produce annotated SQL code for an OMOP CDM instance?
I don’t have the technical skills to build these mapping “engines”, but happy to be a beta-tester!
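The first question above, pooling clinically reviewed mappings into a shared knowledge base, can be sketched as a simple vote across sites. This is a minimal Python illustration under stated assumptions: the `Mapping` record and `build_knowledge_base` function are hypothetical, the "most sites agree" rule is just one possible pooling strategy, ties are not handled specially, and a real effort would also need governance around sharing source codes.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """One site's mapping of a source code to a standard concept (hypothetical record)."""
    site: str
    source_vocabulary: str
    source_code: str
    target_concept_id: int
    clinically_reviewed: bool


def build_knowledge_base(mappings):
    """Pool reviewed mappings. For each (source_vocabulary, source_code), count how many
    distinct sites chose each target concept, and return the most common target together
    with the share of votes it received."""
    votes = defaultdict(Counter)
    seen = set()
    for m in mappings:
        if not m.clinically_reviewed:
            continue  # only pool mappings that a clinician has reviewed
        key = (m.source_vocabulary, m.source_code)
        vote = (key, m.site, m.target_concept_id)
        if vote in seen:
            continue  # a site counts at most once for a given target concept
        seen.add(vote)
        votes[key][m.target_concept_id] += 1
    knowledge_base = {}
    for key, counter in votes.items():
        concept_id, n = counter.most_common(1)[0]
        knowledge_base[key] = (concept_id, n / sum(counter.values()))
    return knowledge_base
```

The agreement share is what would make such a knowledge base useful as a suggestion engine: a high share can be offered as a default, while a low share flags a code that deserves human review.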

Patrick’s interview resonated with me because of a similar experience I am having. A family member looks up all the side effects of any prescribed drug and becomes overwhelmed, almost paralyzed, about whether to take it. He often takes half the dose at a lower frequency than prescribed, then tells the doctor the drug isn’t working and that he needs to try another one. It would be great to build visuals using RWD that put those adverse events in a more balanced context against the benefits.

floriankatsch · May 15, 2026, 8:37am

I’m approaching this from the perspective of someone who is still relatively new to the OHDSI community and to secondary use of health data in general. There are three areas in particular where I would like to raise awareness and contribute myself:

Deployment needs to become easier.
Setting up the full OHDSI tool stack can be challenging, especially within the constraints and requirements imposed by healthcare institutions. Clearer deployment guides, best practices, and practical checklists could help many organizations adopt and maintain the OHDSI tool ecosystem more effectively.
Vocabulary and mapping development need better tooling.
Too often, concept mapping is still managed through spreadsheets and Athena alone. This makes it difficult to properly document mapping decisions and their underlying rationale, leading us to revisit and question previous work months later. A dedicated platform for collaborative mapping, with defined workflows, versioning, documentation, and review capabilities, could make the process more transparent, efficient and enjoyable.
Exploit the current (European) window of opportunity.
As the European Health Data Space (EHDS) continues to take shape, and with many implementation details of EHDS II still evolving, this is an important moment to actively position the OMOP CDM as a standard for the secondary use of EHR data. We should also continue to include additional data modalities (e.g. patient-contributed health data, sensor data, survey data, …), transition from RxNorm toward ISO IDMP (once available), and promote OHDSI tools as reliable solutions for analysis within the Health Data Access Bodies (HDABs). Building on existing capabilities such as CDM inspection and DQD, we can offer value to data holders and the HDABs, e.g. through automatic generation of metadata (including QUANTUM’s health data quality labels).

I’m really looking forward to the coming years.

katy-sadowski · June 6, 2026, 4:18pm

To turn this dream into a reality, I believe that we need to improve the reliability and transparency of the OMOP ETL process. Flawed ETLs can lead to systematic bias in OHDSI studies, but we currently lack the tools to holistically assess ETL quality and prove that data integrity has been maintained throughout the ETL process.

ETL developers deserve a toolkit that brings HADES-level rigor and transparency to the data transformation process. Let’s straighten out this squiggly line we see in many OHDSI presentations!

I strongly believe that ETL should not be different for everyone. We need open standards for data quality evaluation throughout the ETL process, not just at the very end. We can and should leverage AI technology to build and evaluate ETL pipelines against these standards.

In order to drive investment in such solutions, I also believe that we need a framework linking pipeline design decisions and quality metrics to evidence reliability outcomes. Such a framework could be constructed by leveraging the amazingly rich knowledge base we’ve built up across the hundreds of institutions that have implemented OMOP around the world.

I am optimistic that all of the above is achievable, and I look forward to collaborating with the community to make it happen. I’m excited to see several posts in this thread that share this motivation to improve OMOP ETL.

PS, I also agree with @floriankatsch that there’s an encouraging tailwind right now, especially in Europe, with EHDS, QUANTUM, etc. We should not shy away from collaborating outside of OHDSI on these challenges, which apply across any sort of secondary health data use, not just OMOP.