
Reasons to adopt OHDSI's OMOP CDM?


(Erica Voss) #1

Recently I was asked what the “selling points” or “value proposition” of moving to the OMOP CDM are. Here are some things I thought about, but I am interested in what others think!

  1. Ability to pursue cross-institutional collaborations (everyone has the same format, so one code set can run for everyone).

  2. If you have multiple DBs you can write one program to run on multiple data assets instead of a program for each DB. This is important for performing studies but even more important if you want to design/implement standardized methods for handling the data (e.g. patient level prediction).

  3. Accelerated research due to easy access to de-identified data mapped to standard vocabularies, and increased analytic capability due to access to free tools.

  4. The use of the OMOP Vocabularies has greatly increased our ability to find relevant codes for research (e.g. if you needed to find a drug by NDC, in the old days you’d find a handful of codes from the Internet; now you can easily find thousands just by searching in ATLAS or ATHENA).

  5. This paper gives some reasons:
    Voss EA, Makadia R, Matcho A, Ma Q, Knoll C, Schuemie M, DeFalco FJ, Londhe A, Zhu V, Ryan PB. Feasibility and utility of applications of the common data model to multiple, disparate observational health databases. J Am Med Inform Assoc. 2015 May;22(3):553-64. doi: 10.1093/jamia/ocu023. Epub 2015 Feb 10. PubMed PMID: 25670757; PubMed Central PMCID: PMC4457111.

  6. You truly know your data if you convert it to the CDM. You literally have to touch and process every aspect of your data, whereas if you’ve only done studies you’ll only have interacted with portions of it.

  7. As you learn about your data, you can correct issues with it. For example, if you know that for a period of time there was some issue with your data that needs to be handled in a certain way, someone who is knowledgeable about that issue can help design a standardized approach for dealing with it and implement it in the ETL process.

  8. Standardization of analysis using the Population Level Estimation and Patient Level Prediction tools in ATLAS. This allows you to quickly and consistently design studies using best practices. It also saves time by automatically generating protocols and final results documents.

  9. You have the whole community to lean on for performing research. OHDSI is open source and as a group we are interested in working together to help get the answers that patients and doctors need to make the best decisions in patient care.
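To make point 2 concrete, here is a minimal sketch of the idea, using toy in-memory SQLite databases to stand in for two institutions' CDM instances. The table contents are invented for illustration; 8507 and 8532 are the standard OMOP gender concept ids for male and female. Because every OMOP CDM instance shares the same table and column names, the identical query runs unchanged everywhere:

```python
import sqlite3

# The same analysis code runs against any OMOP CDM instance, because the
# table and column names are standardized. Two in-memory databases stand
# in for two institutions' data assets (rows are made up).
QUERY = "SELECT gender_concept_id, COUNT(*) FROM person GROUP BY gender_concept_id"

def make_toy_cdm(rows):
    """Build an in-memory database with a minimal CDM-shaped person table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE person (person_id INTEGER, gender_concept_id INTEGER)")
    db.executemany("INSERT INTO person VALUES (?, ?)", rows)
    return db

# 8507 = male, 8532 = female in the standard OMOP vocabulary.
site_a = make_toy_cdm([(1, 8507), (2, 8532), (3, 8532)])
site_b = make_toy_cdm([(10, 8507)])

# One program, multiple data assets: run the identical query everywhere.
for name, db in [("site A", site_a), ("site B", site_b)]:
    print(name, dict(db.execute(QUERY).fetchall()))
```

In real life the per-site differences live in the ETL that produced each CDM instance, not in the analysis code, which is exactly what makes network studies feasible.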


(Patrick Ryan) #2

Thanks @ericaVoss for starting this post. This seems to be a regular discussion for newcomers in the community, or those who buy into the OHDSI premise but don’t know how to convince their management to gain support to implement a solution.

I agree with all of your points, well stated. My additional two cents:

If your organization has a vested interest in learning from observational data by generating and using evidence to inform decision-making processes, then you want that evidence generation and dissemination process to be:

  • higher quality
  • more consistent
  • faster speed
  • lower cost
  • greater throughput
  • larger impact

The upfront investment to adopt the OMOP common data model as your approach to standardize the data, and to learn the OHDSI toolkit as your approach to standardize analytics, can enable all of these characteristics.

Quality: the process of data standardization to the OMOP CDM forces greater understanding of the source data, and explicit decision-making about how to reconcile poor-quality data. By imposing a data quality process upfront in the management of the data, you reduce the burden on the analyst and reduce the risk that the ‘known’ data quality issues are overlooked. As @jon_duke presented at the EMA’s meeting on ‘A common data model in Europe? Why? Which? How?’ last year, OHDSI is trying to approach quality holistically, supporting four dimensions: data validation, software validation, methods validation, and clinical validation. Developing a quality system requires that you first recognize that you need a system - with defined inputs and defined outputs - and then you can develop practices and processes to ensure that the system achieves the desired objectives. Our objective is generating reliable evidence that improves health by promoting better health decisions and better care, and the OMOP CDM and OHDSI tools aim to facilitate that objective.

Consistency: A major challenge facing observational research is that the journey from source data to evidence is an arduous one, with many steps along the way that can make it challenging to retrace your path. Data standardization, using the OMOP common data model, helps to provide a consistent data structure (tables, fields, data types), consistent conventions governing how the data are represented (which keep getting better thanks to our THEMIS workgroup), and consistent content through the use of standardized vocabularies. Standardized analytics allow for consistency in cohort definition, covariate construction, analysis design, and results reporting. Essentially, using the OMOP CDM and OHDSI tools allows observational research to be applied by many people following a systematic process that codifies agreed scientific best practice, rather than research being the task of one individual who devises their own ad hoc, one-off, bespoke approach.

Speed: As @krfeeney showed in her Lightning Talk at the OHDSI Symposium, many organizations benchmark the time it takes to go from original question to final answer in many months or even years. But as @jennareps and the team fully demonstrated at the same Symposium, if you use the OMOP CDM and the OHDSI tools, it is possible to take an important clinical question in the morning and produce a meaningful insight (externally validated around the world) by the same afternoon. Often, more time may be needed to make sure you’re asking the right question or to ensure your exposure/outcome definitions are what you want. But if you adopt the OHDSI framework, then there is a clear path to produce a fully-specified design for an analysis, whether it be clinical characterization, population-level effect estimation, or patient-level prediction. And with a fully-specified design in hand, the technical work of implementing and executing the analysis against your data (and extending it across a distributed network of disparate observational databases) is well-defined and has been highly optimized for efficiency. If your feedback loop for asking and receiving quality evidence were measured in minutes instead of months, just think about how that might transform the way you integrate data into your decision-making processes.

Cost: It is indeed true that adopting the OMOP CDM and OHDSI tools requires a significant upfront investment. But the reality is that, if your organization seeks to generate evidence from observational data on a regular basis, then ANY strategy to do that well will require a significant investment, in terms of financial resources for data licensing and technical infrastructure as well as personnel to design and implement studies. Adopting OHDSI isn’t necessarily an incremental resource above and beyond your existing observational research capability; rather, it’s a strategy for how to utilize the existing resources that you’ve got to support observational research. There is a resource tradeoff to consider when designing your observational research infrastructure: given that any effort will require resources for data management and resources for data analysis, different strategies differ in which resources they require most. A ‘common protocol’ approach requires far fewer resources for data management, but substantially greater resources for data analysis. The ‘common data model’ approach tries to balance resources between management and analysis, and relative to other CDM-based communities, OHDSI tends to lean toward imposing greater effort in data standardization to enable greater ease-of-use and greater scalability at analysis time. The other cost consideration is ‘buy vs. build’, and here, I think the argument for OHDSI is quite compelling: I don’t believe any one organization has the required technical and scientific capability to build itself an analytics infrastructure that covers the depth and breadth of features available across the OHDSI ecosystem. And I know that none of the vendors providing commercial offerings can compete with the cost of OHDSI’s open-source solution, because you can’t beat ‘free’.

Throughput: One specific focus in the OHDSI ecosystem has been to support ‘evidence at scale’. One dimension of scale is facilitating analysis across networks of databases, which is directly enabled by the use of the OMOP common data model. Analysis code written for one database can be re-used for another data source when both databases adhere to the same CDM standard. The application of one analysis procedure against multiple databases is useful whether you are working within your own local institution that holds multiple databases, or collaborating across multiple institutions as part of an OHDSI network study. The other dimension of scale is designing analysis code that allows for multiple instances of the same type of question to be executed concurrently. If you want to estimate the incidence rate of one outcome within one target population, you can do that. But following the same framework, you can simultaneously estimate incidence rates for many outcomes across multiple target populations. If you want to perform a population-level effect estimation study using a propensity score-adjusted new user cohort design, you can do that for one target-comparator-outcome, but you can also use the same tool to allow for empirical calibration for the target-comparator pair using a large sample of negative control outcomes…or you can study multiple outcomes of interest…or make multiple comparisons using different target cohorts and different comparator cohorts. A key principle within LEGEND is the consistent application of a best practice approach to all clinically relevant questions of interest, and the OHDSI tools enable this behavior in a computationally-efficient manner.
A similar paradigm was followed in the development of the Patient-Level Prediction package, allowing multiple machine learning algorithms to be applied to multiple target populations and multiple outcomes, and allowing learned models to be externally validated across multiple data sources. ‘Evidence at scale’, when applied across the set of questions that matter to your organization, represents a tremendous opportunity to drastically increase throughput over the sequential, one-at-a-time approach that is generally followed without such tools.
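As a toy sketch of this 'many questions at once' idea (this is not the actual LEGEND or Patient-Level Prediction code; the cohorts and the simplistic incidence measure are invented for illustration), the same analysis routine can be applied to every target/outcome combination in one pass:

```python
from itertools import product

# Sketch of 'evidence at scale': one analysis routine applied to every
# combination of target cohort and outcome, instead of one-at-a-time.
# Cohorts are toy sets of person ids; 'incidence' here is simply the
# fraction of the target that experienced the outcome (a deliberate
# simplification of a real incidence-rate calculation).
targets = {"new users of drug A": {1, 2, 3, 4},
           "new users of drug B": {5, 6, 7, 8}}
outcomes = {"outcome X": {1, 5, 6},
            "outcome Y": {2, 3, 9}}

def incidence(target_ids, outcome_ids):
    return len(target_ids & outcome_ids) / len(target_ids)

# Every target/outcome pair is analyzed with the identical procedure.
results = {
    (t_name, o_name): incidence(t_ids, o_ids)
    for (t_name, t_ids), (o_name, o_ids) in product(targets.items(), outcomes.items())
}

for key, value in sorted(results.items()):
    print(key, value)
```

The point is the shape of the computation, not the arithmetic: once the procedure is fixed, adding another target, comparator, or outcome is one more row in a grid, not one more bespoke study.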

Impact: Historically, observational research has been received with great skepticism, and with some good reason: the traditional paradigm of one researcher applying one analysis method to one database to address one question about one effect of one exposure on one outcome to (if the results seemed noteworthy enough) publish one finding has resulted in a corpus of observational literature that demonstrates substantial publication bias and p-hacking, and considerable susceptibility to systematic error due to confounding, measurement error and selection bias. Adopting the OMOP common data model and OHDSI toolkit offers the opportunity to chart a different course: to apply empirically-demonstrated scientific best practices to generate evidence and simultaneously produce an empirical evaluation of the quality of the evidence you’ve generated, to prove to yourself and to others the degree of reliability and utility for meaningfully informing decision-making. Adoption also enables being part of a collective effort across an international network of researchers and data partners to improve health by empowering a community to collaboratively generate the evidence that promotes better health decisions and better care.

Perhaps the more difficult question to answer is: why WOULDN’T you join the journey in adopting the OMOP CDM and OHDSI analytics?


(Keesvanbochove) #3

Impressive pitch for adopting OMOP CDM, @ericaVoss and @Patrick_Ryan. Very well stated. I would add that beyond the value for organizations to adopt OMOP CDM, there is also significant societal value in the work that is done in the OHDSI community. This goes both for the fundamental research done in OHDSI on large-scale evidence generation and the appropriate statistical approaches for this, as well as for the standardization of representing medical history from many different medical data sources.

I know this is a strong claim, but I’ve observed that both for Precision Medicine and for Value-Based Health Care, topics in which significant societal investments are being made in OECD countries (see e.g. the WEF reports on both topics), a key challenge that is consistently cited is evidence generation and the availability of standardized clinical data for that purpose. Even though some of the current data sources we have on health outcomes, such as billing data, are far from ideal for this purpose, they are the only data we currently have at scale and longitudinally for most countries in the world. And OHDSI is by far the most comprehensive approach I’ve come across to address both the statistical and the data integration challenges that come with large-scale evidence generation from observational data. For Value-Based Healthcare in particular, using historical data as context for defining new patient-centric outcomes is key (and I look forward to working on that further together with ICHOM in the upcoming IMI EHDEN project!).

Lastly, the societal relevance I care most about personally, is the contribution of OHDSI to open science. I think there have been few demonstrations of a global open science community in action as powerful as the evidence generation carried out during the last OHDSI Symposium in Washington. @jennareps, reporters should be all over you to write about that!

To cite from the recent National Academies of Sciences report “Open Science By Design: Realizing a Vision for 21st Century Research” (https://doi.org/10.17226/25116), the benefits of open science are:

  • Rigor and reliability
  • Ability to address new questions
  • Faster and more inclusive dissemination of knowledge
  • Broader participation in research
  • Effective use of resources
  • Improved performance of research tasks
  • Open publication for public benefit

Think about it. Through the OHDSI open source tools and algorithms, the many community meetings and TCs, the wiki with the network studies and research protocols, interactive publications such as howoften.org, (and http://data.ohdsi.org/LegendMedCentral/ :slight_smile: etc.), and enabled by the OMOP CDM, the OHDSI community realizes literally ALL of those benefits!


(Ewout Steyerberg) #4

Great points and very well stated. As a classically trained methodologist I agree with many points but still struggle with some issues.

What type of questions can be addressed well, and which can be addressed less well with observational data, even if handled as carefully and delicately as in the OHDSI context? Specifically, what questions related to Value-Based Health Care (VBHC, as rightly brought up by @keesvanbochove) can be addressed with OHDSI?

I like point 6 by ericaVoss: “You truly know your data if you convert it to the CDM.” I guess this may work positively, but it also has a downside: if some data have not been recorded, they are simply not there. For example, if VBHC focuses on patient experience recorded with a specific questionnaire, that questionnaire may simply not be available. That would hamper a historical comparison as proposed by @keesvanbochove: “For Value Based Healthcare in particular, using historical data as context for defining new patient-centric outcomes is key.”

I also like the points made by @Patrick_Ryan. He restates point 6 as: “Quality: the process of data standardization to the OMOP CDM forces greater understanding of the source data, and explicit decision-making about how to reconcile poor quality data.” Poor quality data then remains an underlying problem I fear. And yes: “A major challenge facing observational research is that the journey from source data to evidence is an arduous one”. Indeed there is hope: “if you adopt the OHDSI framework, then there is a clear path to produce a fully-specified design for an analysis, whether it be clinical characterization, population-level effect estimation, or patient-level prediction.”
Agree, although these 3 types of questions might be ordered differently, since there is increasing difficulty in claiming to have evidence for

  1. descriptive questions (relatively easy)
  2. prediction (a bit harder)
  3. causal effects (very hard)

@Patrick_Ryan makes another sharp observation: “Historically, observational research has been received with great skepticism, and with some good reason: the traditional paradigm of one researcher applying one analysis method to one database to address one question about one effect of one exposure on one outcome to (if the results seemed noteworthy enough) publish one finding has resulted in a corpus of observational literature that demonstrates substantial publication bias and p-hacking, and considerable susceptibility to systematic error due to confounding, measurement error and selection bias.” This may all be true, but leaves open what fundamental limitations remain even with the open science culture and other positive aspects of the OHDSI initiative. We may all agree on a laudable aim such as “improve health by empowering a community to collaboratively generate the evidence that promotes better health decisions and better care.”

The key question to me is: what counts as evidence? This obviously depends on the type of research question, where descriptive questions can more readily be answered than causal questions.

The final question by @Patrick_Ryan is again food for thought: “why WOULDN’T you join the journey in adopting the OMOP CDM and OHDSI analytics?” The answer may be that you have a research agenda that cannot be reliably addressed in this framework; perhaps simply because the data are not there, or are of poor quality. Or the relevant comparison cannot be made. So, my tentative conclusion is that the fundamental limitations are not in the analytics, where classical statistics as well as machine learning may have a role, but in the underlying data. The available data relate to the study design, which may often not be explicit in routinely collected data. This is in contrast to data collected specifically in a research context, with the explicit goal of generating specific evidence, e.g. on treatment effectiveness.

These limitations in available data relating to an explicit study design will also limit us in addressing some of the many potential VBHC questions. I look forward to seeing more and more examples of research projects in the OHDSI context, so that we can learn where the limits are, and where the greatest benefits lie. Keep up the good work!


(Evan Sholle) #5

Others here have made the case more eloquently than I can. That said, I have a slide deck I routinely use to explain the CDM to students and others, and the two slides that usually make the best case (or at least, the ones that tend to make people nod along in an “I get it” manner - whether or not they actually do) are the ones where I show how to ask and answer a simple question (how many people were recorded in 2016 as having diabetes) - first against the source data, then against the CDM.

In the source data it’s a massive SQL query joining ambulatory encounter diagnoses, problem lists, billing diagnoses, jumping to another patient identifier the ambulatory EHR shares with the inpatient EHR and unioning to a set of patients with inpatient billing diagnoses (and losing folks in the process), etc. etc.

Then I change slides to show how the same query against the CDM takes 6 lines of code - or, if you don’t want to write any code, how you could get something similar out of Achilles, with some nifty visualizations as well.
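For a flavor of what those few lines against the CDM might look like (a hypothetical sketch, not the actual slide; 201820, 201826, and 201254 are the standard OMOP concept ids for diabetes mellitus and its type 2 and type 1 descendants, but the rows are toy data), the vocabulary's concept_ancestor table lets one ancestor concept pick up every descendant diabetes diagnosis:

```python
import sqlite3

# Toy in-memory stand-in for a real CDM: a condition_occurrence table plus
# the vocabulary's concept_ancestor table (contents invented for this demo).
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE condition_occurrence (
  person_id INTEGER, condition_concept_id INTEGER, condition_start_date TEXT);
CREATE TABLE concept_ancestor (
  ancestor_concept_id INTEGER, descendant_concept_id INTEGER);
INSERT INTO concept_ancestor VALUES
  (201820, 201820), (201820, 201826), (201820, 201254);
INSERT INTO condition_occurrence VALUES
  (1, 201826, '2016-03-01'),   -- type 2 diabetes, recorded in 2016
  (2, 201254, '2016-07-15'),   -- type 1 diabetes, recorded in 2016
  (3, 201826, '2015-11-30');   -- type 2 diabetes, but not in 2016
""")

# The 'six lines': people with any diabetes diagnosis recorded in 2016.
COUNT_DIABETES_2016 = """
SELECT COUNT(DISTINCT person_id)
FROM condition_occurrence
WHERE condition_concept_id IN
      (SELECT descendant_concept_id FROM concept_ancestor
       WHERE ancestor_concept_id = 201820)
  AND condition_start_date BETWEEN '2016-01-01' AND '2016-12-31'
"""

(count,) = db.execute(COUNT_DIABETES_2016).fetchone()
print(count)  # 2
```

All the joining, mapping, and identifier reconciliation that made the source-data version a monster has already been paid for once, in the ETL.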

In my experience this usually either makes the case effectively or leaves people dazed and shocked to the extent where they aren’t willing to admit that they don’t understand the value add. My hope is that it’s the former.


(Suchi Saria) #6

Evan, would you be willing to share your slides? I’m curious to see how you explain this. Thanks!

