
Mapping Microbiology Susceptibility into OMOP CDM4 Observations

I have an ETL problem, and would like to know if anyone has addressed this issue.

We are trying to map microbiology susceptibility. I have an isolate (tissue sample), an antibiotic and a result. My question is how to fit this into the OMOP v4 Observation table.

For the observation type, we typically put in a generic term such as ‘Observation from EMR’. However, it might be possible to find a ‘non standard’ concept that would indicate this is a ‘Microbiology susceptibility’ observation.

The problem is that there are still three items to map: the tissue sample, the antibiotic and the result. But we only have the observation_concept_id and value_as_concept_id fields. It seems that the result of the test should go into value_as_concept_id (positive/negative …). So where do I put the tissue sample and the antibiotic?

If anyone has addressed this problem, I would appreciate your input.


I’ll wait for @Christian_Reich and others to chime in, but I suspect in v4 it’s going to be tough to break these out as you’ve described, for exactly the reason you’ve described. You could store the text report for the C&S in v4, but to have the tissue sample, the antibiotic references, and the value (e.g., negative, <=32, etc.), I think you’ll need the relationships in v5.

Christian, other strategies you can suggest?


It is an interesting question. We have a similar issue with cancer histology, where there are multiple pieces that need to be linked together. I will be curious as to what others might say. My initial reaction is to create a visit and then add a procedure for the susceptibility testing on the same date. Then you can put the 3 observations in the observation table and link them back to the unique visit occurrence id and the procedure.


Yes, this is the reason we created the fact_relationship table. Of course, that only exists in V5. So, upgrade!!!


I would go with a modification of Mark’s approach: 2 observations on the same day:

  1. Antibiogram in the isolate mentioned
  2. Antibiotic resistance yes/no

Let me know if you need help with finding the concepts.



Here is how I am going to handle mapping microbiology susceptibility in CDMv4.

As discussed there is the isolate (tissue sample) and one or more antibiotics with a result for each antibiotic. For every set of tests, I am creating a single procedure record. The procedure type will be ‘Culture Sensitivity’ concept id 4170475. The procedure concept id will refer to the sample being tested, for example ‘Staphylococcus aureus’ (4149419). Then in the observation table create a record for each antibiotic where the Observation Type is the sample being tested, the Observation is the antibiotic and the result is either Resistant or Sensitive.

Then to find the observations for sensitivity of any isolate to antibiotics for the patient in visit 123:
SELECT obs.*
FROM procedure_occurrence proc
JOIN observation obs
  ON obs.observation_type_concept_id = proc.procedure_concept_id
 AND obs.visit_occurrence_id = proc.visit_occurrence_id
WHERE proc.procedure_type_concept_id = 4170475 -- 'Culture Sensitivity'
  AND proc.visit_occurrence_id = 123
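
To make the join concrete, here is a minimal sketch using Python's built-in sqlite3 and cut-down versions of the two tables. The tables are reduced to the columns the query touches, and the antibiotic and result concept ids (900001, 900002, 900101, 900102) are made-up placeholders, not real vocabulary concepts. Only 4170475 and 4149419 come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE procedure_occurrence (
    procedure_occurrence_id   INTEGER PRIMARY KEY,
    procedure_concept_id      INTEGER,  -- isolate being tested
    procedure_type_concept_id INTEGER,  -- 4170475 = 'Culture Sensitivity'
    visit_occurrence_id       INTEGER
);
CREATE TABLE observation (
    observation_id              INTEGER PRIMARY KEY,
    observation_concept_id      INTEGER,  -- antibiotic
    observation_type_concept_id INTEGER,  -- isolate, mirrors procedure_concept_id
    value_as_concept_id         INTEGER,  -- Resistant / Sensitive
    visit_occurrence_id         INTEGER
);
""")

CULTURE_SENSITIVITY = 4170475                # from the post
STAPH_AUREUS = 4149419                       # from the post
ANTIBIOTIC_A, ANTIBIOTIC_B = 900001, 900002  # placeholder ids (hypothetical)
RESISTANT, SENSITIVE = 900101, 900102        # placeholder ids (hypothetical)

# One procedure record for the set of tests in visit 123.
conn.execute(
    "INSERT INTO procedure_occurrence VALUES (1, ?, ?, 123)",
    (STAPH_AUREUS, CULTURE_SENSITIVITY),
)
# One observation per antibiotic; the last row belongs to another visit.
conn.executemany("INSERT INTO observation VALUES (?, ?, ?, ?, ?)", [
    (10, ANTIBIOTIC_A, STAPH_AUREUS, RESISTANT, 123),
    (11, ANTIBIOTIC_B, STAPH_AUREUS, SENSITIVE, 123),
    (12, ANTIBIOTIC_A, STAPH_AUREUS, SENSITIVE, 456),
])

rows = conn.execute("""
    SELECT obs.observation_id, obs.observation_concept_id, obs.value_as_concept_id
    FROM procedure_occurrence proc
    JOIN observation obs
      ON obs.observation_type_concept_id = proc.procedure_concept_id
     AND obs.visit_occurrence_id = proc.visit_occurrence_id
    WHERE proc.procedure_type_concept_id = ?
      AND proc.visit_occurrence_id = ?
    ORDER BY obs.observation_id
""", (CULTURE_SENSITIVITY, 123)).fetchall()

for row in rows:
    print(row)
```

One consequence of joining on isolate concept plus visit: two separate cultures of the same organism within one visit cannot be told apart with this scheme.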


Someone asked me how to store microbiology susceptibility in Korean for OMOP CDM 5 or CDM 6 (https://github.com/ohdsi-korea/ThemisKorea/issues/1)
Do we have any convention for this?

I’m not sure about the convention, but here’s how we did it in some previous projects:

Let’s take an example:
Source: stool specimen, Rahnella aquatilis, grow = “+++”, Gentamicin Susceptible.
1st row: MEASUREMENT.measurement_concept_id = 3025941 (Bacteria identified in Stool by Culture)
2nd row: MEASUREMENT.measurement_concept_id = 3035007 (Gentamicin [Susceptibility]), value_as_concept_id = 4038110 (Susceptible)
3rd row: OBSERVATION.observation_concept_id = 4001253 (Rahnella aquatilis), value_as_concept_id = 4125547 (+++)
4th row: SPECIMEN.specimen_concept_id = 4002879 (Stool specimen)

These are connected via FACT_RELATIONSHIP: the OBSERVATION record is connected to the other entries mentioned.
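
A rough sketch of that linking, using Python's sqlite3 and a cut-down FACT_RELATIONSHIP table. Assumptions to check against your own vocabulary: the domain concept ids 21 (Measurement), 27 (Observation) and 36 (Specimen) are what I believe the standard Domain concepts to be; relationship_concept_id is left at 0 because no relationship concept is named here; the record ids (1, 2, 100, 200) are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE fact_relationship (
    domain_concept_id_1     INTEGER,
    fact_id_1               INTEGER,
    domain_concept_id_2     INTEGER,
    fact_id_2               INTEGER,
    relationship_concept_id INTEGER
)""")

MEASUREMENT, OBSERVATION, SPECIMEN = 21, 27, 36  # assumed domain concept ids
UNKNOWN_RELATIONSHIP = 0                         # placeholder, none chosen yet

# The organism OBSERVATION acts as the hub record.
organism_observation = (OBSERVATION, 100)  # Rahnella aquatilis, '+++'
related = [
    (MEASUREMENT, 1),  # Bacteria identified in Stool by Culture
    (MEASUREMENT, 2),  # Gentamicin [Susceptibility] = Susceptible
    (SPECIMEN, 200),   # Stool specimen
]

conn.executemany(
    "INSERT INTO fact_relationship VALUES (?, ?, ?, ?, ?)",
    [organism_observation + other + (UNKNOWN_RELATIONSHIP,) for other in related],
)

# Everything linked to the organism observation.
linked = conn.execute("""
    SELECT domain_concept_id_2, fact_id_2
    FROM fact_relationship
    WHERE domain_concept_id_1 = ? AND fact_id_1 = ?
    ORDER BY domain_concept_id_2, fact_id_2
""", organism_observation).fetchall()
print(linked)
```

This stores one direction only; whether to also store the reverse rows is a separate decision.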

Right, we link these rows in the fact_relationship table. But I am not sure which record we chose as the main one. It was either the OBSERVATION_ID, as you mentioned, or the MEASUREMENT_ID from the 1st row.

By the way, I now have another example of microbiology data mapping, from a source with a slightly different structure.
specimen_type (Urine Specimen),
body_site (Urinary system structure),
lab_procedure (Urine culture),
organism (Escherichia coli),
antibiotic_test (Cefazolin [Susceptibility]),
sensitivity (4 ug/mL),
susceptibility (Susceptible)
1st row: MEASUREMENT measurement_concept_id (lab_procedure - Urine culture)
         value_as_concept_id (organism - Escherichia coli)
2nd row: MEASUREMENT measurement_concept_id (antibiotic_test - Cefazolin [Susceptibility])
         value_as_number (sensitivity - 4)
         unit_concept_id (ug/mL)
         value_as_concept_id (susceptibility - Susceptible)
3rd row: SPECIMEN specimen_concept_id (specimen_type - Urine Specimen)
         anatomic_site_concept_id (body_site - Urinary system structure)

All these rows are linked in the fact_relationship table. But here we realized that there are no suitable relationship concepts for this.
1-3 - 4023266 (Has Specimen) - can probably be used
2-3 - 4023266 (Has Specimen) - can probably be used
1-2 - ???
2-1 - ???
3-1 - ???
3-2 - ???
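
As a rough illustration of the row split, this Python sketch turns one flat source record into the two MEASUREMENT rows and the SPECIMEN row. Concept names stand in for concept ids, which a real ETL would look up in the vocabulary. The function name split_micro_record and the "number unit" format of the sensitivity field are assumptions made for the example.

```python
def split_micro_record(src):
    """Split one flat microbiology source record into CDM-style rows."""
    # Assumes sensitivity looks like '<number> <unit>', e.g. '4 ug/mL'.
    number, unit = src["sensitivity"].split(maxsplit=1)
    culture = {
        "table": "MEASUREMENT",
        "measurement_concept": src["lab_procedure"],
        "value_as_concept": src["organism"],
    }
    susceptibility = {
        "table": "MEASUREMENT",
        "measurement_concept": src["antibiotic_test"],
        "value_as_number": float(number),
        "unit_concept": unit,
        "value_as_concept": src["susceptibility"],
    }
    specimen = {
        "table": "SPECIMEN",
        "specimen_concept": src["specimen_type"],
        "anatomic_site_concept": src["body_site"],
    }
    return [culture, susceptibility, specimen]


source = {
    "specimen_type": "Urine Specimen",
    "body_site": "Urinary system structure",
    "lab_procedure": "Urine culture",
    "organism": "Escherichia coli",
    "antibiotic_test": "Cefazolin [Susceptibility]",
    "sensitivity": "4 ug/mL",
    "susceptibility": "Susceptible",
}

rows = split_micro_record(source)
for row in rows:
    print(row)
```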


@nzvyagina I really like your approach.

Storing the organism in value_as_concept_id breaks the rule that MEASUREMENT.value_as_concept_id must belong to the ‘Meas Value’ domain.
Personally, I don’t like this rule. @Christian_Reich, why do we even have this restriction?

You’re talking about FACT_RELATIONSHIP.relationship_concept_id, right?
In this case we can use only Standard concepts with vocabulary_id = ‘Relationship’.

We can’t just pick a standard concept with the corresponding meaning (like 4023266 (Has Specimen), which is a SNOMED Observation).
@nzvyagina, please make a list of concept names for the new relationships, so we can discuss them and then add them to the vocabulary.

Thank you for your reply and correction. Then it would be great to have the following concepts for relationship_concept_id:
1-3 - Lab test performed on specimen
2-3 - Lab test performed on specimen
1-2 - Related lab test
2-1 - Related lab test
3-1 - Specimen used for lab test
3-2 - Specimen used for lab test

The names can be rephrased if needed.
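
For reference, the six directed links and their proposed names can be enumerated mechanically. This Python sketch only restates the list above; the names are proposals from this thread, not existing vocabulary concepts, and the row labels are shorthand for the three example rows.

```python
from itertools import permutations

# Row numbers follow the three-row example earlier in the thread.
ROW_KIND = {1: "lab test", 2: "lab test", 3: "specimen"}


def proposed_relationship(src, dst):
    """Return the proposed relationship name for a directed link src -> dst."""
    kinds = (ROW_KIND[src], ROW_KIND[dst])
    if kinds == ("lab test", "specimen"):
        return "Lab test performed on specimen"
    if kinds == ("specimen", "lab test"):
        return "Specimen used for lab test"
    return "Related lab test"  # lab test -> lab test


# All ordered pairs of distinct rows: 1-2, 1-3, 2-1, 2-3, 3-1, 3-2.
links = {(a, b): proposed_relationship(a, b) for a, b in permutations(ROW_KIND, 2)}

for (a, b), name in sorted(links.items()):
    print(f"{a}-{b} - {name}")
```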

Let’s hear from the other people before adding these concepts.

Because we want to make it clear what values can be in what domain.

Don’t think so. Where are the parents of those concepts?

Yes, please. And then give it to THEMIS!

Do we have them in SNOMED? Can you look?

A colleague and I just created a proposal to handle microbiology results. We are hoping this proposal will also handle other types of clinical results that are not fully represented in the measurement table, without the dreaded fact_relationship table :)

The proposal links out to a Google document to get the community’s input before the CDM working group call next week.

Would love to get input from @bailey @mgkahn @Andrew @Christian_Reich


Do you mean parent of 4023266 (Has Specimen)?
it’s a “Linkage concept”, I suppose it means the linkage between SNOMED concepts.
So, if you say so, we can reuse “Linkage concepts” from SNOMED, right?

or don’t use fact_relationship at all if we agree with

@cukarthik I like your proposal! :)


I have to give credit to @philipzach who helped and really pushed to get this proposal out to the community. Thanks @philipzach!

Sorry but I am not tracking the proposal. Maybe a graphical ERD would help me figure out how the links work:

  • MEASUREMENT_ATTRIBUTE.measurement_id: “A foreign key that refers to a survey concept identifier in the standardized vocabularies.” Three issues: (1) if it really points to a concept, it needs to have “concept_id” somewhere in the field name; (2) why limit it to a survey concept identifier?; (3) based on your example, it is a foreign key to MEASUREMENT.measurement_id, not to a concept_id.

  • MEASUREMENT.measurement_attribute_id points to some record in the new MEASUREMENT_ATTRIBUTE table. Your example isn’t a FK to your example row in MEASUREMENT_ATTRIBUTE – the IDs do not line up. Should they for your example?

  • MEASUREMENT_ATTRIBUTE.measurement_id is an FK back to the MEASUREMENT table. So, if I have this right, you have a bidirectional PK-FK pair of relationships between MEASUREMENT and MEASUREMENT_ATTRIBUTE thru measurement_id and measurement_attribute_id. I do not see how this works or sets up the 1:M that I think you want

  • Sorry to be so dense but I do not understand who is the parent record and who are the children records in this model. Again, maybe an ERD would help me

  • What if the result is simply “Gram Positive Cocci” with value = present? Wouldn’t this be a simple MEASUREMENT, with no need to invoke a MEASUREMENT_ATTRIBUTE tuple? If so, we now have heterogeneity in how micro results are represented based on the type of result.

  • You should include an example where sensitivities are reported out as numeric MICs rather than text. Pretty sure I see how you would do this (value_as_number) but best to show this.

  • Since I have already admitted I do not understand the intended hierarchical relationship, I should stop here. But, fool that I am, I’ll continue: It seems you are setting up an alternative way to set up a hierarchy (that you admit could also be leveraged to represent panels/components). This is fine with me but we need to be explicit that there are now TWO ways one can capture a hierarchy in the data tables – for measurements, use these tables and for everything else use fact_relationship. If I have this right, just want to make sure we do this with our eyes wide open. We already use specialized linked tables to capture hierarchies for concept hierarchies and I think also over in location-care sites (I haven’t been following all of the teeth gnashing with location/care sites) so I think this is OK.

  • I am a shameless “leverage other folks tested work” person. Some published microbiology results models: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3540481/ (Figure 1); PCORnet GPCnetwork (uses i2b2 model); https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=17&cad=rja&uact=8&ved=2ahUKEwi-wbPG6c3iAhXN854KHZZOCR04ChAWMAZ6BAgFEAI&url=https%3A%2F%2Farchive-ouverte.unige.ch%2Funige%3A23962%2FATTACHMENT01&usg=AOvVaw04T5fz1NNeSGAeFMMWix1X (just a trivial Google search). Not endorsing anything but want to make sure you’ve looked at these other models as examples

Sorry if I have completely missed how the model works.

@Christian_Reich, I have just taken a look. There is nothing in the Relationship domain in SNOMED, and almost nothing in other domains (if it is allowed to use other domains for this field). There is a concept 4023266 ‘Has specimen’, but it is from the Observation domain and not of the desired accuracy.

**edited because copy & paste didn’t copy all from the Word Doc this originated

I have a microbiology use case, too :) Some of this data is duplicative of more granular data contained in the same source table. However, the less granular data is useful when the data with finer granularity is NULL.

Easy ones first:

  • specimen source/specimen_type is represented as a SNOMED code. Examples of string terms for the code = Structure of urinary tract proper, Lower respiratory tract structure, Catheter. This will map to SPECIMEN.specimen_concept_id & associated columns
  • origin of specimen code/body or device site is represented as a SNOMED code. Examples of string terms for the code = Urine specimen obtained via indwelling urinary catheter, Sputum specimen obtained by aspiration. This will map to SPECIMEN.anatomic_site_concept_id & associated columns

Now I have multiple attributes for one *_concept_id. How do I represent it? Do I duplicate the *_concept_id to add in the attributes?

  • drug susceptibility test/antibiotic_test is represented as a LOINC code. Examples of string terms for the code = Bacteria identified in Isolate by Culture, or Bacteria identified in Unspecified specimen by Culture. This will map to MEASUREMENT.measurement_concept_id
  • type of bacteria is represented as a SNOMED code. Examples of string terms for the code = E. coli, Shigella, no growth. This should/would “seem” to logically map to MEASUREMENT.value_as_concept_id for the above test, but the domain_id for the concept = Observation. Same issue @Dymshyts pointed out above

How do I represent the attributes of the bacteria? And are these attributes even necessary? Isn’t morphology a defining characteristic of a type of bacteria? Who’s the microbiology expert in OMOP? I want to create a 2nd field mapping for the type of bacteria (SNOMED code above) to OBSERVATION.observation_concept_id, but the following attributes won’t fit into one row of the OBSERVATION table:

  • “status” of the bacteria. The 2 values are detected & not detected.
  • morphology of the bacteria. Examples of string terms (no associated code) = branching, chains, mucoid. Is this necessary?
  • oxidase status – test to help identify bacteria. Only 2 string terms = positive or negative. Is this necessary?

Then I have the susceptibility data for the above bacteria:

  • drug susceptibility test/susceptibility represented as a LOINC code. Examples of string terms for the code = Aztreonam [Susceptibility], Ampicillin+Sulbactam [Susceptibility], Vancomycin [Susceptibility]. This will map to the MEASUREMENT.measurement_concept_id
  • RxNorm drug code- Duplicative of the above line, there are RxNorm codes for the drug being tested for bacterial susceptibility. Since the above column is populated, we don’t need to include these RxNorm codes, but it would be nice to think of a solution on how to represent the drug codes as NOT Drug Exposures, but in relation to the bacterial susceptibility when this is the only susceptibility data provided by the source
  • susceptibility data. The two values = Susceptible & Resistant. This will map to MEASUREMENT.value_as_concept_id

Then there are data that I am not sure about. Are they useful? Are they duplicative of other data enumerated above? Do they even fit in the CDM without some ugly use of the Observation table?:

  • colony operator for the below fields. Values include <, >, and ~

  • colony count. This is a numeric value and is populated when the colony operator = < or >

  • colony count high and colony count low. These are numeric values and are populated when the colony operator = ~

  • beta-lactamase test susceptibility is represented as a LOINC code

  • beta-lactamase test susceptibility result is represented as “positive” or “negative”

All the records will need to be linked via the Fact Relationship table using standard relationship_concept_id concepts. However, I need to correctly field map the above before petitioning the Themis & vocabulary team to add more standard concepts :)

I’ll take a look at @cukarthik’s proposal, too. But this is needed now.

Thoughts? @nzvyagina @Christian_Reich @Dymshyts


It would be very helpful if there was an additional column on your example tables with the text string for each concept_id. The background & synopsis is in ‘English’ and the table example is in ‘OMOP’.
