EHR data to OMOP CDM Work Group


(George Hripcsak) #62

(Welcome, @hersh.)

(Melanie Philofsky) #63

Welcome, @hersh!

You are in the correct place! OHDSI = standardization. The EHR WG discusses the ETL process, the ambiguous and impossible data we uncover during the ETL process, and ideas on how to handle the unique situations we encounter. Quite a few of us have Epic data, so we know the trials and tribulations you will most likely encounter. OHDSI also has many other workgroups for all things OHDSI, and we will point you in the correct direction. Or you can always post to the forum; people are very friendly here :slight_smile:

Come to our next meeting. It’s Friday, July 12th @ 7am PST. That is early for those on the west coast, but this time received the most votes from interested participants. Send me the email addresses of those who are interested and I will add them to the meeting invite.

Free labor! Everyone who will work with the ETL process or the OMOP CDM should start by watching the tutorial videos on the OMOP CDM & Vocabulary and CDM ETL. Understanding the CDM & Vocabularies is essential. This is not the typical ETL where one source field goes directly to one target field. The ETL is VERY complex and time-consuming for EHR data. And the converted data is amazing :slight_smile: - standardized semantic representation, standardized format, standardized software tools, ability to leverage the vocabularies for queries, gigantic community of data partners who will run the study on their data, etc.

Please come to the Symposium September 15, 2019 where you will meet and collaborate with others. It’s an amazing and very diverse group of people all working towards making lives better by utilizing data, technology and research. Here’s the link for the tutorials held on Sept. 14th & 16th. In addition to the tutorials on the CDM & Vocabularies and the ETL process, there are also tutorials for Cohort Definition/Phenotyping, Data Quality, Patient Level Prediction, and Population Level Estimation. All the courses are taught by very knowledgeable faculty who are experts in their respective fields.

(Ella) #64

Is there a sub-group of this group for sharing Cerner to OMOP mappings? Regardless, please at least add me to this group’s email distribution list - ella.young@phsa.ca - thx!

(Kristin Kostka, MPH) #65

Great question @ella! @Daniella_Meeker created a Cerner to OMOP group back in 2018 (https://www.ohdsi.org/web/wiki/doku.php?id=projects:workgroups:cerner_to_omop), which pre-dates this EHR group. Not sure how active that group is anymore, but I know @mvanzandt’s team (@QI_omop @MichaelWichers) has also done Cerner conversions. Will follow up with you offline to connect more dots across teams.

(Bill Hersh) #66

Thank you for the detailed reply, Melanie. My interns and I are getting up to speed on OHDSI, and are trying to figure out how to manage our data.

One issue concerns terms from our EHR, such as lab test names or drug names. One challenge is that the data comes to us two steps removed from its source. That is, our institution has a research data warehouse, which itself is derived from Clarity, Epic’s data warehouse, whose data is derived from the original Epic records. Our source only gets the string of the data source, and not the coded identifier.

I know that OHDSI is very oriented to controlled terminology, and as an informatician I understand the value of such terminology. But I don’t see us easily getting there from the data we have, and I would instead like us to be able to capture the strings in the proper fields. Has anyone else wanted to do this and developed solutions for it?

Since our informatics research work is focused on text processing, we would also like to be able to preserve our complete notes.

Any advice on these issues would be greatly appreciated!

Bill Hersh

(George Hripcsak) #67

Hi, @hersh. The OHDSI CDM has a single NOTE table to store all notes, with fields for specifying meta-information about the notes.

Then there is a NOTE_NLP table where we put parsed notes, with the idea that in the future, if you trust the parse enough, you can put the output into the corresponding domain table (condition, measurement, etc.).
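To make the relationship between the two tables concrete, here is a minimal Python sketch. It assumes a small subset of the NOTE and NOTE_NLP columns as I understand them from CDM v5.x, and `toy_parse` is a hypothetical stand-in for a real NLP system (it only does case-insensitive term lookup). The full note text is preserved in NOTE, and each parsed mention becomes one NOTE_NLP row pointing back to it.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Note:
    """Subset of NOTE columns; the complete note text is kept as-is."""
    note_id: int
    person_id: int
    note_date: date
    note_title: str
    note_text: str


@dataclass
class NoteNlp:
    """Subset of NOTE_NLP columns; one row per parsed mention."""
    note_nlp_id: int
    note_id: int               # links back to the NOTE row
    lexical_variant: str       # the text as it appeared in the note
    offset: str                # character offset, kept as text in this sketch
    note_nlp_concept_id: int   # 0 until the mention is mapped to a concept


def toy_parse(note: Note, terms: List[str]) -> List[NoteNlp]:
    """Hypothetical parser: find each term in the note, ignoring case."""
    hits: List[NoteNlp] = []
    for term in terms:
        for m in re.finditer(re.escape(term), note.note_text, flags=re.IGNORECASE):
            hits.append(NoteNlp(len(hits) + 1, note.note_id, m.group(0), str(m.start()), 0))
    return hits


note = Note(1, 42, date(2019, 7, 1), "Progress note",
            "Potassium low. Started potassium chloride.")
hits = toy_parse(note, ["potassium"])
```

Nothing here writes to a domain table; promoting a trusted parse into CONDITION_OCCURRENCE or MEASUREMENT would be a separate, deliberate step.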

Lab data go into the MEASUREMENT table, and the source code (which is a string in your case) goes into the measurement_source_value column of that table. Hopefully you can map source names into codes. The OMOP code for the LOINC term that they map to goes into the measurement_concept_id column. That’s the column that OHDSI studies use to identify which thing is being measured. If you don’t map at all, then you would put 0 in that column, which means that the OHDSI tools won’t easily pull those data (e.g., potassium over 3.4) and your users will need to search on things like K and potassium etc. and then figure out which ones are blood or urine or CSF.
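The convention described above (string into measurement_source_value, mapped concept into measurement_concept_id, 0 when unmapped) could be sketched as follows. This is only an illustration: the lookup table `LAB_NAME_TO_CONCEPT_ID`, the helper `build_measurement`, and the concept ids in it are hypothetical placeholders, not real Athena concept ids, and a real mapping would also need to distinguish specimen (blood, urine, CSF).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical local mapping from normalized lab-name strings to concept ids.
# The ids are placeholders for illustration only, NOT real OMOP concept ids.
LAB_NAME_TO_CONCEPT_ID: Dict[str, int] = {
    "potassium": 1001,
    "k": 1001,
    "sodium": 1002,
}

UNMAPPED_CONCEPT_ID = 0  # convention for "no standard concept found"


@dataclass
class MeasurementRow:
    """Subset of MEASUREMENT columns relevant to the mapping question."""
    person_id: int
    measurement_concept_id: int
    measurement_source_value: str
    value_as_number: Optional[float]


def build_measurement(person_id: int, source_name: str, value: Optional[float]) -> MeasurementRow:
    """Keep the original string verbatim; look up a concept id, else use 0."""
    key = source_name.strip().lower()
    concept_id = LAB_NAME_TO_CONCEPT_ID.get(key, UNMAPPED_CONCEPT_ID)
    return MeasurementRow(person_id, concept_id, source_name, value)


rows: List[MeasurementRow] = [
    build_measurement(1, "Potassium", 3.9),
    build_measurement(1, "K ", 4.1),
    build_measurement(2, "POC GLUCOSE (OLD)", 110.0),
]

# Strings that still need a manual mapping before standard tools can find them.
unmapped = [r.measurement_source_value for r in rows
            if r.measurement_concept_id == UNMAPPED_CONCEPT_ID]
```

Rows that land on concept id 0 are still stored, but anyone querying them has to search the source strings by hand.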

We are going to be mapping from Epic, but I suspect each institution’s lab codes are local. We had already mapped our ancillary labs to LOINC, so we will get a direct feed from the ancillary rather than getting it from Epic and Clarity. Actually, I think we are using Cerner lab, but there, too, I think each institution has come up with its own coding scheme.

(Christian Reich) #68


You may not realize it, but this experience is pretty universal: Folks are faced with the duality of the original data (Clarity in your case) and a CDW (with often unclear business rules), and a myriad of different sources of data, and a long list of non-codified content. Welcome to the club. But the community is here to help.

Don’t become weak! :slight_smile: But you don’t have to:

  • The strings go into source_value of the record
  • Their mapping to standard concepts is a manual job at the moment. If somebody were to come up with an NLP solution and provide it to the OHDSI community - I’m all for it.
  • We are working on an online mapping tool that remembers what everybody else did. It’s not ready yet.
  • You can ask the vocabulary team to map your strings. But they have a long list of things. If you have a little money you could buy that service.

As for longer text - what @hripcsa said.

(Melanie Philofsky) #69

The EHR folks all get varying amounts of uncoded data. I’d lobby your institution to provide the code along with the string. Lobby hard because everything is much easier with some percentage of standard codes! :slight_smile:

But if you aren’t able to obtain the codes along with the strings, there are a few different options:

  1. OHDSI provides the Usagi tool to help with mapping. This is great if you have fewer than a few thousand terms; otherwise, it is very time-consuming and, honestly, quite tedious. I highly suggest that someone with medical terminology knowledge create the mappings because it’s not always straightforward. I use Usagi for mapping Colorado’s text strings. I also use Athena when I have 200 or fewer terms. Athena is not a mapping tool, but its easy UI allows me to explore the term connections and hierarchy for a given string. Parents & children are important when mapping.
  2. Only put the text string in the *_source_value field of the appropriate domain. You won’t be able to use any of the standardized tools or participate in network studies. And I don’t see any benefit to using the OMOP model if you don’t convert to standard concept_ids. But it is an option.
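Before starting manual mapping, it can help to know how many distinct strings there are and which ones cover most of the records. The sketch below counts distinct source strings, writes them as a CSV that a mapping tool could import, and reports what share of records a given set of mapped terms would cover. The column names `source_name` and `frequency` and the helper names are assumptions for illustration, not a format required by Usagi.

```python
import csv
import io
from collections import Counter
from typing import Iterable, List, Set, Tuple


def term_frequencies(source_values: Iterable[str]) -> List[Tuple[str, int]]:
    """Count distinct, whitespace-trimmed strings, most frequent first."""
    counts = Counter(v.strip() for v in source_values if v and v.strip())
    return counts.most_common()


def to_csv(freqs: List[Tuple[str, int]]) -> str:
    """Serialize the frequency list with assumed column names."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["source_name", "frequency"])
    writer.writerows(freqs)
    return buf.getvalue()


def coverage(freqs: List[Tuple[str, int]], mapped_terms: Set[str]) -> float:
    """Fraction of records whose source string already has a mapping."""
    total = sum(n for _, n in freqs)
    mapped = sum(n for term, n in freqs if term in mapped_terms)
    return mapped / total if total else 0.0


values = ["Potassium", "Potassium", "Sodium", " Potassium ", "", "Hgb A1c"]
freqs = term_frequencies(values)
csv_text = to_csv(freqs)
share = coverage(freqs, {"Potassium"})
```

Sorting by frequency means the first few hundred mappings usually cover far more records than the long tail, which helps decide where manual effort pays off.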

Per Christian:

George is correct.

Again, George is correct :slight_smile:

(Kristin Kostka, MPH) #70

@MauraBeaton – is it possible this wonderful EHR workgroup could be added to the OHDSI wiki page (https://www.ohdsi.org/web/wiki/doku.php?id=projects:overview)?

(Melanie Philofsky) #71

Thanks, @krfeeney! I didn’t know the WG wiki existed :slight_smile:

(Melanie Philofsky) #72

Hello Friends!

I am cancelling the EHR WG meeting on July 26th. Our next meeting will be Friday, August 9th.


(Melanie Philofsky) #73

Hello all!

Our next WG meeting is this Friday, August 23rd at 10am EST. We will be discussing the trials and tribulations of mapping Epic’s encounter data to the CDM Visit table.


  • The definition of an encounter is different than the definition of a Visit.

  • One Visit may contain multiple encounters.

Please come with real world data examples, so we can dig in and discuss this in detail!
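As a starting point for that discussion, here is a minimal Python sketch of one possible way to roll several encounters up into a single Visit: merge encounters for the same person whose time windows overlap or fall within a chosen gap. The rule, the `max_gap` parameter, and the class and field names are assumptions for illustration, not a convention agreed by the work group.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Encounter:
    encounter_id: str
    person_id: int
    start: datetime
    end: datetime


@dataclass
class Visit:
    person_id: int
    start: datetime
    end: datetime
    encounter_ids: List[str]  # source encounters rolled up into this visit


def merge_encounters(encounters: List[Encounter],
                     max_gap: timedelta = timedelta(0)) -> List[Visit]:
    """Merge same-person encounters that overlap or start within max_gap."""
    visits: List[Visit] = []
    for enc in sorted(encounters, key=lambda e: (e.person_id, e.start)):
        last = visits[-1] if visits else None
        if (last is not None and last.person_id == enc.person_id
                and enc.start <= last.end + max_gap):
            # Extend the current visit to cover this encounter.
            last.end = max(last.end, enc.end)
            last.encounter_ids.append(enc.encounter_id)
        else:
            visits.append(Visit(enc.person_id, enc.start, enc.end, [enc.encounter_id]))
    return visits


encounters = [
    Encounter("A", 1, datetime(2019, 8, 1, 8), datetime(2019, 8, 1, 12)),
    Encounter("B", 1, datetime(2019, 8, 1, 10), datetime(2019, 8, 1, 14)),
    Encounter("C", 1, datetime(2019, 8, 2, 9), datetime(2019, 8, 2, 10)),
    Encounter("D", 2, datetime(2019, 8, 1, 9), datetime(2019, 8, 1, 10)),
]
visits = merge_encounters(encounters)
```

Real data will need more than time windows (encounter type, department, admit and discharge events), so this only shows the shape of the one-to-many relationship.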


Hi @MPhilofsky. Are the minutes from today’s meeting being posted? Specifically the powerpoint that had the standards for visits? Thanks.

(Tarun Shah) #75

Please add me, Tarun Shah - tmshah@ismnet.com.

(Melanie Philofsky) #76

Meeting minutes are located here.

@Robert_Winter presented the powerpoint.

(Melanie Philofsky) #77

You’ve been added!

(Tarun Shah) #78

I’m trying to implement EHR data to OMOP. Can anyone help me understand what condition_status_concept_id refers to? Also, how can we find these concept values in http://athena.ohdsi.org?


(Anna Karenina) #79

Hello @TMS,

Convention note #10 in the Condition Occurrence description in the wiki will probably be helpful.

(Tarun Shah) #80

@rookie_crewkie Thank you, that’s helpful !!:slightly_smiling_face:

(Tarun Shah) #81

Am I correct in reading the following from http://athena.ohdsi.org?

Lab -> Domain=Measurement, Class=Lab Test
Vitals -> Domain=Measurement, Class=Observable Entity
Radiology -> Domain=Measurement, Class=Clinical Observations

Also, what else can we map into the Measurement table?