You are in the correct place! OHDSI = standardization. The EHR WG discusses the ETL process, the ambiguous and impossible data we uncover along the way, and ideas on how to handle the unique situations we encounter. Quite a few of us have Epic data, so we know the trials and tribulations you will most likely encounter. OHDSI also has many other workgroups for all things OHDSI, and we will point you in the correct direction. Or you can always post to the forum; people are very friendly here.
Come to our next meeting. It’s Friday, July 12th @ 7am PST. Early for those on the west coast, but this time received the most votes from interested participants. Send me the email addresses of those who are interested and I will add them to the meeting invite.
Free labor! Everyone who will work with the ETL process or the OMOP CDM should start by watching the tutorial videos on the OMOP CDM & Vocabulary and CDM ETL. Understanding the CDM & Vocabularies is very necessary. This is not the typical ETL where one source field goes directly to the target field. The ETL is VERY complex and time consuming for EHR data. But the converted data is amazing: standardized semantic representation, standardized format, standardized software tools, the ability to leverage the vocabularies for queries, a gigantic community of data partners who will run studies on their data, etc.
Please come to the Symposium on September 15, 2019, where you will meet and collaborate with others. It’s an amazing and very diverse group of people all working towards making lives better by utilizing data, technology and research. Here’s the link for the tutorials held on Sept. 14th & 16th. In addition to the tutorials on the CDM & Vocabularies and the ETL process, there are also tutorials for Cohort Definition/Phenotyping, Data Quality, Patient Level Prediction, and Population Level Estimation. All the courses are taught by very knowledgeable faculty who are experts in their respective fields.
Is there a sub-group of this group for sharing Cerner-to-OMOP mappings? Regardless, please at least add me to this group’s email distribution list - firstname.lastname@example.org - thanks!
Great question @ella! @Daniella_Meeker had created a Cerner to OMOP group back in 2018 (https://www.ohdsi.org/web/wiki/doku.php?id=projects:workgroups:cerner_to_omop) which pre-dates this EHR group. Not sure how active this group is anymore but I know @mvanzandt’s team (@QI_omop @MichaelWichers) also have done Cerner conversions. Will follow-up with you offline to connect more dots across teams.
Thank you for detailed reply, Melanie. My interns and I are getting up to speed on OHDSI, and are trying to figure out how to manage our data.
One issue concerns terms from our EHR, such as lab test names or drug names. One challenge is that the data comes to us two steps removed from its source. That is, our institution has a research data warehouse, which itself is derived from Clarity, Epic’s data warehouse, whose data is derived from the original Epic records. Our warehouse receives only the display string from the source, not the coded identifier.
I know that OHDSI is very oriented to controlled terminology, and as an informatician I understand the value of such terminology. But I don’t see us easily able to get there from the data we have, and I instead would like for us to be able to capture the strings in the proper fields. Has anyone else wanted to do this and developed solutions for it?
Since our informatics research work is focused on text processing, we would also like to be able to preserve our complete notes.
Any advice on these issues would be greatly appreciated!
Hi, @hersh. The OHDSI CDM has a single NOTE table to store all notes, with fields for specifying meta-information about the notes.
Then there is a NOTE_NLP table where we put parsed notes. The idea is that in the future, if you trust the parse enough, you can put the output into the corresponding domain table (condition, measurement, etc.).
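To make the NOTE / NOTE_NLP relationship concrete, here is a minimal sketch using sqlite3. The tables are cut down to just the relevant columns (the real CDM DDL has many more), and the concept_id shown is illustrative, not authoritative:

```python
import sqlite3

# Minimal sketch of the CDM NOTE and NOTE_NLP tables (real DDL has more columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE note (
    note_id   INTEGER PRIMARY KEY,
    person_id INTEGER,
    note_text TEXT
);
CREATE TABLE note_nlp (
    note_nlp_id         INTEGER PRIMARY KEY,
    note_id             INTEGER REFERENCES note(note_id),
    lexical_variant     TEXT,     -- the raw string the NLP engine found
    note_nlp_concept_id INTEGER   -- standard concept assigned by the parser (illustrative id below)
);
""")

# Store the full note text, then the parsed mentions pointing back at it.
conn.execute("INSERT INTO note VALUES (1, 42, 'Pt c/o chest pain, denies SOB.')")
conn.execute("INSERT INTO note_nlp VALUES (1, 1, 'chest pain', 77670)")

row = conn.execute("""
    SELECT n.note_text, p.lexical_variant
    FROM note_nlp p JOIN note n ON n.note_id = p.note_id
""").fetchone()
print(row)  # ('Pt c/o chest pain, denies SOB.', 'chest pain')
```

The point is that the complete note survives intact in NOTE, while each trusted parse result lives in NOTE_NLP with a pointer back to its source note.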
Lab data go into the MEASUREMENT table, and the source code (which is a string in your case) goes into the measurement_source_value column of that table. Hopefully you can map source names into codes. The OMOP code for the LOINC term that they map to goes into the measurement_concept_id column. That’s the column that OHDSI studies use to identify which thing is being measured. If you don’t map at all, then you would put 0 in that column, which means that the OHDSI tools won’t easily pull those data (e.g., potassium over 3.4) and your users will need to search on things like K and potassium etc. and then figure out which ones are blood or urine or CSF.
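In ETL pseudocode, the logic above amounts to the following sketch. The lookup table here is hypothetical (e.g., built from a manual mapping of your site's lab names to LOINC), and the concept_id is illustrative; the column names are the real CDM ones:

```python
# Hypothetical local-string -> OMOP concept_id lookup, e.g. the product of a
# manual mapping of this site's lab names to LOINC.
SOURCE_TO_CONCEPT = {
    "POTASSIUM, SERUM": 3023103,  # illustrative concept_id, not authoritative
    "K": 3023103,
}

def to_measurement_row(source_name, value):
    """Build the mapping-relevant slice of a MEASUREMENT row."""
    return {
        # Always keep the original string so nothing is lost.
        "measurement_source_value": source_name,
        # 0 = unmapped; OHDSI tools key off measurement_concept_id.
        "measurement_concept_id": SOURCE_TO_CONCEPT.get(source_name, 0),
        "value_as_number": value,
    }

print(to_measurement_row("K", 3.4)["measurement_concept_id"])            # 3023103
print(to_measurement_row("MYSTERY LAB", 1.0)["measurement_concept_id"])  # 0
```

Whatever you cannot map falls through to concept_id 0, which is exactly the situation described above: the string is preserved, but standardized tools cannot find it.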
We are going to be mapping from Epic, but I suspect each institution’s lab codes are local. We had already mapped our ancillary labs to LOINC, so we will get a direct feed from the ancillary rather than getting it from Epic and Clarity. Actually, I think we are using Cerner lab, but there, too, I think each institution has come up with its own coding scheme.
You may not realize it, but this experience is pretty universal: Folks are faced with the duality of the original data (Clarity in your case) and a CDW (with often unclear business rules), and a myriad of different sources of data, and a long list of non-codified content. Welcome to the club. But the community is here to help.
Don’t give up on the standard concepts! And you don’t have to:
- The strings go into source_value of the record
- Their mapping to standard concepts is a manual job at the moment. If somebody were to come up with an NLP solution and provide it to the OHDSI community, I’m all for it.
- We are working on an online mapping tool that remembers what everybody else did. It’s not ready yet.
- You can ask the vocabulary team to map your strings, but they have a long backlog. If you have a little money, you could buy that service.
As for longer text - what @hripcsa said.
The EHR folks all get varying amounts of uncoded data. I’d lobby your institution to provide the code along with the string. Lobby hard, because everything is much easier with some percentage of standard codes!
But if you aren’t able to obtain the codes along with the strings, there are a few different options:
- OHDSI provides the Usagi tool to help with mapping. This is great if you have fewer than a few thousand terms; otherwise, it is very time consuming and, honestly, quite tedious. I highly suggest someone with medical terminology knowledge create the mappings because it’s not always straightforward. I use Usagi for mapping Colorado’s text strings. I also use Athena when I have 200 or fewer terms. Athena is not a mapping tool, but its easy UI allows me to explore the term connections and hierarchy for a given string. Parents & children are important when mapping.
- Only put the text string in the *_source_value field of the appropriate domain. You won’t be able to use any of the standardized tools or participate in network studies, and I don’t see any benefit to using the OMOP model if you don’t convert to standard concept_ids. But it is an option.
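If you want to triage strings before reviewing them by hand, a crude similarity pass can pre-sort the easy hits from the hard ones. This is not Usagi's actual algorithm, just a sketch with stdlib fuzzy matching against a hypothetical slice of target concepts (ids and names are illustrative):

```python
from difflib import SequenceMatcher

# Hypothetical target vocabulary slice: (concept_id, concept_name).
# Both the ids and the names here are illustrative, not real OMOP content.
CANDIDATES = [
    (3023103, "Potassium serum/plasma"),
    (3004501, "Glucose serum/plasma"),
    (3016723, "Creatinine serum/plasma"),
]

def suggest(source_string, threshold=0.6):
    """Rank candidate concepts by rough string similarity to the source term."""
    scored = [
        (SequenceMatcher(None, source_string.lower(), name.lower()).ratio(), cid, name)
        for cid, name in CANDIDATES
    ]
    scored.sort(reverse=True)
    # Anything under the threshold still needs a human with terminology knowledge.
    return [(cid, name, round(score, 2)) for score, cid, name in scored if score >= threshold]

print(suggest("POTASSIUM SERUM"))  # → [(3023103, 'Potassium serum/plasma', 0.81)]
print(suggest("zzz"))              # → []
```

High-scoring matches can go straight to a quick human confirmation, while the empty results are the ones worth a terminologist's time, which is where the medical terminology knowledge mentioned above really pays off.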
George is correct.
Again, George is correct.