[covid19] ETL help for converting your data into OMOP

(Vojtech Huser) #1

Just some notes when I tried to get the concepts for results of COVID19 RNA test.

The definition is on the official server. You must first log in (see the upper right corner), and even then you see only the text view of the definition.

See here:

To see it in the usual way, I used the JSON export from atlas server and imported it into atlas-demo.
Then you see it with the codes

I wanted to see the actual coded values (there is "detected" via SNOMED and "detected" via LOINC).
This can be seen here in the JSON from here

and via picture

To pre-coordinate or not to pre-coordinate in MEASUREMENT table
(Vojtech Huser) #2

If you are a site with COVID data, consider representing the test result properly. For the covid-tested-characterization study and for the prediction study, correctly detecting patients who tested negative is important. In the measurement table, you may have this test represented


The guidance is to put into value_as_concept_id the following SNOMED CT codes (for positive and negative)

in OHDSI concept id world that means
http://athena.ohdsi.org/search-terms/terms/9191 for positive
http://athena.ohdsi.org/search-terms/terms/9189 for negative
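A minimal sketch of how an ETL step might apply this guidance, assuming the source system reports results as free-text strings. The 9191 and 9189 concept IDs come from the links above. The source strings in `SOURCE_TO_CONCEPT`, the choice to treat "detected" as positive, and the function name are all assumptions for illustration; unmapped values fall back to 0, the usual OMOP convention for "no matching concept".

```python
# Standard concept IDs taken from the Athena links above.
POSITIVE_CONCEPT_ID = 9191
NEGATIVE_CONCEPT_ID = 9189
UNMAPPED_CONCEPT_ID = 0  # OMOP convention for "no matching concept"

# Hypothetical source strings; replace with the values your lab system emits.
SOURCE_TO_CONCEPT = {
    "positive": POSITIVE_CONCEPT_ID,
    "detected": POSITIVE_CONCEPT_ID,
    "negative": NEGATIVE_CONCEPT_ID,
    "not detected": NEGATIVE_CONCEPT_ID,
}


def to_value_as_concept_id(source_result):
    """Map a raw result string to value_as_concept_id, or 0 if unknown."""
    if source_result is None:
        return UNMAPPED_CONCEPT_ID
    # Normalize case and collapse runs of whitespace before the lookup.
    key = " ".join(source_result.strip().lower().split())
    return SOURCE_TO_CONCEPT.get(key, UNMAPPED_CONCEPT_ID)
```

Keeping the mapping in one dictionary makes it easy to review which local strings end up as positive or negative.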

If you think this guidance is wrong, please post an opposing view and justification.

Note that the current definition (PhenoID 30) works with MANY positive codes:
see a list here (in italic and in picture)
On the final server, PhenoID 30 is https://atlas.ohdsi.org/#/cohortdefinition/139
_To see all values better, use this corresponding definition on the covid19 dev server._

(Anna Ostropolets) #3

Vojtech, this is a good point. LOINC says how it should be, but it cannot force users to use these values in their ETL. So what we did was create a comprehensive list of positive results. It will catch whatever was used in an ETL and will not hurt anybody.
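The "comprehensive list" approach can be sketched as a set-membership check at analysis time. This is an illustration only, not the actual cohort definition: just 9191 and 9189 come from this thread, and a real list would add every other "positive" or "negative" concept found in the data. The row layout and the function name are assumptions.

```python
from collections import Counter


def classify_results(rows, positive_ids, negative_ids):
    """Count measurement rows as positive, negative, or other.

    rows: iterable of dicts with a 'value_as_concept_id' key.
    positive_ids / negative_ids: sets of concept IDs treated as equivalent.
    """
    counts = Counter()
    for row in rows:
        concept_id = row.get("value_as_concept_id")
        if concept_id in positive_ids:
            counts["positive"] += 1
        elif concept_id in negative_ids:
            counts["negative"] += 1
        else:
            counts["other"] += 1
    return counts


# Only these two IDs are from the thread; extend the sets with the other
# codes your ETLs actually produced.
positive_ids = {9191}
negative_ids = {9189}

rows = [
    {"value_as_concept_id": 9191},
    {"value_as_concept_id": 9189},
    {"value_as_concept_id": 0},
]
counts = classify_results(rows, positive_ids, negative_ids)
```

Watching the "other" bucket is a cheap way to spot result codes that the list does not cover yet.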

(Vojtech Huser) #4

Perhaps we can use a scenario (or patient story) and a preferred way to represent it in OMOP. Let me offer one scenario

John Doe had dry cough starting on March 1. (at home)
He had fever 37.9C on March 2nd. (at home)
He had outpatient visit on March 4th.
On March 4th, he got tested using RNA test (nasal swab) (date of specimen collection)
On March 5th, the result came back and it was positive.
To facilitate his return to work, he was tested again (same approach, RNA test, nasal swab) on March 16th and the result was negative.

His EHR record had covid19 added to problem list as a result of positive test on March 5th.

Some notes on OMOP (open for debate)


March 4, condition_concept_id http://athena.ohdsi.org/search-terms/terms/37311061


March 4
Measurement_concept_id : https://loinc.org/94500-6/

Value_as_concept_id = http://athena.ohdsi.org/search-terms/terms/9191 or for March 16 http://athena.ohdsi.org/search-terms/terms/9189
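One way to write the scenario down as rows, as a sketch under assumptions: the story gives no year, so 2020 is assumed; `person_id` 1 is made up; and `measurement_concept_id` is left as `None` because the OMOP concept for LOINC 94500-6 has to be looked up in Athena. The concept IDs 37311061, 9191, and 9189 are the ones linked above, and only a few columns of each table are shown.

```python
from datetime import date

YEAR = 2020      # assumption: the scenario does not state a year
PERSON_ID = 1    # hypothetical identifier for John Doe

condition_occurrence = [
    {
        "person_id": PERSON_ID,
        "condition_concept_id": 37311061,
        "condition_start_date": date(YEAR, 3, 4),
    },
]


def rna_test(specimen_date, value_as_concept_id):
    """Build one MEASUREMENT row for the nasal-swab RNA test."""
    return {
        "person_id": PERSON_ID,
        "measurement_concept_id": None,  # resolve from LOINC 94500-6
        "measurement_source_value": "94500-6",
        "measurement_date": specimen_date,  # date of specimen collection
        "value_as_concept_id": value_as_concept_id,
    }


measurement = [
    rna_test(date(YEAR, 3, 4), 9191),   # first test, positive
    rna_test(date(YEAR, 3, 16), 9189),  # return-to-work test, negative
]
```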


Similar approach for Lauren with endometriosis

(Vojtech Huser) #5

Thinking about this further - in Condition_occurrence table, it would be good to distinguish covid19 asymptomatic patient from a severe covid19. One trial used a covid severity scale. I also found this mdcalc site (https://www.mdcalc.com/brescia-covid-respiratory-severity-scale-bcrss-algorithm#evidence)


Per snomed page https://confluence.ihtsdotools.org/display/snomed/SNOMED%2BCT%2BCoronavirus%2BContent

There are pre-release codes (not found in Athena) for covid-pneumonia and covid-ARDS, to be released in July 2020. Maybe we should embrace those. It looks like SNOMED has now embraced the pre-release notion, like LOINC. (Yay!)

However, still no luck with asymptomatic covid19: a patient with a positive PCR test but super mild or asymptomatic disease.

One option is to “record an observation about covid” and use OBSERVATION table https://github.com/OHDSI/CommonDataModel/wiki/OBSERVATION
that offers

And term like

How are sites doing NLP (in the OMOP model) dealing with a note indicating a fever of 39C three days prior to the outpatient visit start date? Do you populate the measurement table, with some special value in measurement_type_concept_id indicating “inferred from NLP” and a measurement_date of [visitDate-3days]?
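For concreteness, the option described in the question could look like the sketch below. It is not an established convention: the type concept for "inferred from NLP" is left as a caller-supplied parameter because this thread does not name one, and the fields shown are a simplified subset of the MEASUREMENT table. The person ID and visit date are made up.

```python
from datetime import date, timedelta


def nlp_fever_measurement(person_id, visit_start_date, days_before,
                          temperature_c, nlp_type_concept_id):
    """Build a MEASUREMENT-like row for a fever mentioned in a note."""
    return {
        "person_id": person_id,
        # Back-date the measurement relative to the visit start.
        "measurement_date": visit_start_date - timedelta(days=days_before),
        "value_as_number": temperature_c,
        "unit_source_value": "C",
        # Caller-supplied: whichever type concept marks NLP-derived rows.
        "measurement_type_concept_id": nlp_type_concept_id,
    }


row = nlp_fever_measurement(
    person_id=1,                         # hypothetical
    visit_start_date=date(2020, 3, 4),   # hypothetical visit date
    days_before=3,
    temperature_c=39.0,
    nlp_type_concept_id=0,               # placeholder, not a real type concept
)
```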

(Alexander Davydov) #6

Currently, we have a rich hierarchy of COVID forms (under this concept). It includes pneumonia, ARDS and asymptomatic. I’d not add SNOMED pre-release concepts since their identifiers may change. Once SNOMED releases them, we will remap the temporary OMOP Extension concepts to SNOMED. Sounds good?

SNOMED UK added the following. Should that be enough?

| concept_code | concept_name | semantic_tag |
| --- | --- | --- |
| 1300671000000104 | COVID-19 severity scale | assessment scale |
| 1300631000000101 | COVID-19 severity score | observable entity |
| 1300681000000102 | Assessment using COVID-19 severity scale | procedure |
| 1300591000000101 | Low risk category for developing complication from COVID-19 infection | finding |
| 1300571000000100 | Moderate risk category for developing complication from COVID-19 infection | finding |
| 1300561000000107 | High risk category for developing complication from COVID-19 infection | finding |

(Vojtech Huser) #7

There is a good question about PROCEDURE vs DEVICE_EXPOSURE for representing many of the steps in the care for covid19. Korea's notes are on GitHub. If there are US sites that have adopted a certain approach, posting your design choices here would help other sites.
This initiative will use OMOP (among others): https://covid.cd2h.org/N3C

(Kristin Kostka, MPH) #8

@Vojtech_Huser, yes! We know this initiative well. @hripcsa @cukarthik @Christian_Reich @clairblacketer @Andrew myself and others are all part of WGs on this initiative to make sure we take advantage of the community’s guidance and best practices.

(Lisa Schilling) #9

@krfeeney @cukarthik @andrew @clairblacketer @Christian_Reich Have you listed out common data elements for N3C? It doesn’t seem to be OHDSI’s typical working style, but it seems to be one of the requirements for this project. Thanks! Lisa

(Vojtech Huser) #10

In case we want to analyze use of antibody tests - I looked at the temp LOINC codes for that


Also - there is now SARS-COV2 Viral Load (just like HIV viral load).
See SARS coronavirus 2 RNA [Log #/volume] (viral load) in Unspecified specimen

(Alexander Davydov) #11

We keep adjusting to the way COVID-related facts are described in the data. We’re happy to announce the COVID-19 v2.0 Vocabulary Release, which is already in Athena. It addresses such things as suspected vs real Conditions, Emergency codes, Timing context, Lab tests hierarchy, pre-coordinations and many others.

If you are involved in ETL or data analysis, please make sure to take a look at the changes. After the first version we updated some rules. It’s highly recommended to re-run ETL with the recent version since the interim vocabulary versions are not supported by current or initial instructions.

Instructions are available here

(Alexander Davydov) #12

For COVID antibody tests please use 37310258 Measurement of 2019 novel coronavirus antibody with the descendants.
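"With the descendants" usually means joining through the CONCEPT_ANCESTOR table. The sketch below shows that join against an in-memory SQLite database. Only 37310258 comes from this post; the descendant IDs 900001 and 900002, the unrelated ID 900003, and the toy rows are made up so that the example runs on its own.

```python
import sqlite3

ANTIBODY_ANCESTOR = 37310258  # from the post above

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concept_ancestor (
        ancestor_concept_id INTEGER,
        descendant_concept_id INTEGER
    );
    CREATE TABLE measurement (
        measurement_id INTEGER,
        measurement_concept_id INTEGER
    );
""")
conn.executemany(
    "INSERT INTO concept_ancestor VALUES (?, ?)",
    [
        (ANTIBODY_ANCESTOR, ANTIBODY_ANCESTOR),  # a concept is its own ancestor
        (ANTIBODY_ANCESTOR, 900001),             # made-up descendant
        (ANTIBODY_ANCESTOR, 900002),             # made-up descendant
    ],
)
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?)",
    [(1, 900001), (2, 900003), (3, ANTIBODY_ANCESTOR)],
)

# Keep only measurements whose concept sits under the antibody ancestor.
antibody_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT m.measurement_id
        FROM measurement m
        JOIN concept_ancestor ca
          ON ca.descendant_concept_id = m.measurement_concept_id
        WHERE ca.ancestor_concept_id = ?
        ORDER BY m.measurement_id
        """,
        (ANTIBODY_ANCESTOR,),
    )
]
```

Measurement 2 is excluded because its made-up concept is not listed under the ancestor.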

(Vojtech Huser) #13

CDC is recommending a nice (and complete, with result codes!) representation of covid19 tests

see excel file at

tab LOINC mapping has:

(Vojtech Huser) #14

Good SNOMED guide for this is here


one subpart

another https://confluence.ihtsdotools.org/display/DOCCV19/2.2+Patient+Demographics

(Chris Knoll) #15

Sure we can, just provide one concept to represent ‘positive’ and don’t give them any other choices.

I have to disagree with this: it leaves us to do what Vojtech is showing: we have to put in every possible code that represents ‘present’. Why not just have ‘Positive’ and ‘Detected’ map over to ‘Present’? Or any other combo where there’s one ‘standard’ way to represent something is present.

Don’t think we can’t force people to do things, that’s what ‘use the standard’ means. But if we give them multiple standards to choose from, we miss the whole point of standardization.

(Vojtech Huser) #16

From this paper https://jamanetwork.com/journals/jamapediatrics/fullarticle/10.1001/jamapediatrics.2020.5052

a good guidance on OMOP-PEDSnet representation for COVID is here

The chief complaint concept is interesting.
Identification of healthcare workers is also very interesting https://github.com/PEDSnet/Data_Models_Public/blob/master/PEDSnet/docs/COVID-19%20Cohort.md#healthcare-workers

The PEDSnet view of type concepts seems to diverge from OHDSI's.