
Mapping Microbiology Susceptibility into OMOP CDM4 Observations


Yes, this is the reason we created the fact_relationship table. Of course, that only exists in V5. So, upgrade!!!


I would go with a modification of Mark’s approach: 2 observations at the same day:

  1. Antibiogram in the isolate mentioned
  2. Antibiotic resistance yes/no

Let me know if you need help with finding the concepts.



Here is how I am going to handle mapping microbiology susceptibility in CDMv4.

As discussed there is the isolate (tissue sample) and one or more antibiotics with a result for each antibiotic. For every set of tests, I am creating a single procedure record. The procedure type will be ‘Culture Sensitivity’ concept id 4170475. The procedure concept id will refer to the sample being tested, for example ‘Staphylococcus aureus’ (4149419). Then in the observation table create a record for each antibiotic where the Observation Type is the sample being tested, the Observation is the antibiotic and the result is either Resistant or Sensitive.

Then, to find the observations for sensitivity of any isolate to antibiotics for the patient in visit 123:
SELECT obs.*
FROM procedure_occurrence proc
JOIN observation obs
ON obs.observation_type_concept_id = proc.procedure_concept_id
AND obs.visit_occurrence_id = proc.visit_occurrence_id
WHERE proc.procedure_type_concept_id = 4170475 -- 'Culture Sensitivity'
AND proc.visit_occurrence_id = 123;
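To check that the join above behaves as intended, here is a minimal sketch that runs it against an in-memory SQLite database. The two tables are cut down to just the columns the query touches, and the antibiotic concept ids (900001, 900002) are hypothetical placeholders, not real vocabulary ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE procedure_occurrence (
    procedure_occurrence_id   INTEGER PRIMARY KEY,
    visit_occurrence_id       INTEGER,
    procedure_concept_id      INTEGER,  -- the isolate, e.g. 4149419
    procedure_type_concept_id INTEGER   -- 4170475 = 'Culture Sensitivity'
);
CREATE TABLE observation (
    observation_id              INTEGER PRIMARY KEY,
    visit_occurrence_id         INTEGER,
    observation_type_concept_id INTEGER, -- the isolate again
    observation_concept_id      INTEGER, -- the antibiotic (placeholder ids)
    value_as_string             TEXT     -- 'Resistant' or 'Sensitive'
);
""")

# One culture-sensitivity procedure for visit 123.
conn.execute("INSERT INTO procedure_occurrence VALUES (1, 123, 4149419, 4170475)")

# Two antibiotic results in visit 123, plus one from another visit
# that the query should not return.
conn.executemany(
    "INSERT INTO observation VALUES (?, ?, ?, ?, ?)",
    [
        (1, 123, 4149419, 900001, "Sensitive"),
        (2, 123, 4149419, 900002, "Resistant"),
        (3, 456, 4149419, 900001, "Sensitive"),
    ],
)

results = conn.execute("""
    SELECT obs.observation_concept_id, obs.value_as_string
    FROM procedure_occurrence proc
    JOIN observation obs
      ON obs.observation_type_concept_id = proc.procedure_concept_id
     AND obs.visit_occurrence_id = proc.visit_occurrence_id
    WHERE proc.procedure_type_concept_id = 4170475
      AND proc.visit_occurrence_id = 123
    ORDER BY obs.observation_id
""").fetchall()

print(results)
```

Note that the join keys are only the isolate concept and the visit, so two cultures of the same organism in one visit would not be distinguishable with this approach.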


Someone asked me how to store microbiology susceptibility in Korean for OMOP CDM 5 or CDM 6 (https://github.com/ohdsi-korea/ThemisKorea/issues/1)
Do we have any convention for this?

I’m not sure about the convention, but here’s how we did it in some previous projects:

Let’s take an example:
Source: stool specimen, Rahnella aquatilis, grow = “+++”, Gentamicin Susceptible.
1st row: MEASUREMENT: measurement_concept_id = 3025941 (Bacteria identified in Stool by Culture)
2nd row: MEASUREMENT: measurement_concept_id = 3035007 (Gentamicin [Susceptibility]), value_as_concept_id = 4038110 (Susceptible)
3rd row: OBSERVATION: observation_concept_id = 4001253 (Rahnella aquatilis), value_as_concept_id = 4125547 (+++)
4th row: SPECIMEN: specimen_concept_id = 4002879 (Stool specimen)

These are connected via FACT_RELATIONSHIP: the OBSERVATION row is linked to each of the other entries mentioned.
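A minimal sketch of that linking pattern, with the organism OBSERVATION as the hub. The concept ids come from the example above; the `Fact` and `Link` classes, the fact ids, and the relationship label are hypothetical stand-ins (a real FACT_RELATIONSHIP row would carry domain and relationship concept ids instead of strings).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    table: str                      # CDM table holding the row
    fact_id: int                    # hypothetical primary key value
    concept_id: int
    value_as_concept_id: Optional[int] = None


@dataclass(frozen=True)
class Link:
    table_1: str
    fact_id_1: int
    table_2: str
    fact_id_2: int
    relationship: str               # placeholder label, not a real concept


culture = Fact("measurement", 1, 3025941)
gentamicin = Fact("measurement", 2, 3035007, 4038110)
organism = Fact("observation", 1, 4001253, 4125547)
specimen = Fact("specimen", 1, 4002879)


def link_to_hub(hub: Fact, others: List[Fact],
                relationship: str = "related (placeholder)") -> List[Link]:
    """Create one link from the hub row to each of the other rows."""
    return [
        Link(hub.table, hub.fact_id, other.table, other.fact_id, relationship)
        for other in others
    ]


links = link_to_hub(organism, [culture, gentamicin, specimen])
for link in links:
    print(link)
```

With a hub, the number of links grows by one per added row; whether the hub should be the OBSERVATION or the first MEASUREMENT is exactly the open question in this thread.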

Right, we link these rows in the fact_relationship table. But I am not sure which record we chose as the main one. It was either the OBSERVATION_ID, as you mentioned, or the MEASUREMENT_ID from the 1st row.

By the way, I now have another example of microbiology data mapping, from a source with a slightly different structure:
specimen_type (Urine Specimen),
body_site (Urinary system structure),
lab_procedure (Urine culture),
organism (Escherichia coli),
antibiotic_test (Cefazolin [Susceptibility]),
sensitivity (4 ug/mL),
susceptibility (Susceptible)
1st row: MEASUREMENT measurement_concept_id (lab_procedure - Urine culture)
  value_as_concept_id (organism - Escherichia coli)
2nd row: MEASUREMENT measurement_concept_id (antibiotic_test - Cefazolin [Susceptibility])
  value_as_number (sensitivity - 4)
  unit_concept_id (ug/mL)
  value_as_concept_id (susceptibility - Susceptible)
3rd row: SPECIMEN specimen_concept_id (specimen_type - Urine Specimen)
  anatomic_site_concept_id (body_site - Urinary system structure)
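The field mapping above can be sketched as a small transformation. This is only an illustration: the function names are hypothetical, and the `*_concept` keys hold the source names as stand-ins for the concept ids a real ETL would look up in the vocabulary.

```python
def split_value_and_unit(text):
    """Split a string such as '4 ug/mL' into (4.0, 'ug/mL')."""
    number, unit = text.split(maxsplit=1)
    return float(number), unit


def to_cdm_rows(src):
    """Turn one source record into two MEASUREMENT rows and one SPECIMEN row."""
    mic, unit = split_value_and_unit(src["sensitivity"])
    return {
        "measurement": [
            {   # 1st row: the culture, with the organism as its value
                "measurement_concept": src["lab_procedure"],
                "value_as_concept": src["organism"],
            },
            {   # 2nd row: the susceptibility test for one antibiotic
                "measurement_concept": src["antibiotic_test"],
                "value_as_number": mic,
                "unit_concept": unit,
                "value_as_concept": src["susceptibility"],
            },
        ],
        "specimen": [
            {   # 3rd row: the specimen and where it came from
                "specimen_concept": src["specimen_type"],
                "anatomic_site_concept": src["body_site"],
            },
        ],
    }


source = {
    "specimen_type": "Urine Specimen",
    "body_site": "Urinary system structure",
    "lab_procedure": "Urine culture",
    "organism": "Escherichia coli",
    "antibiotic_test": "Cefazolin [Susceptibility]",
    "sensitivity": "4 ug/mL",
    "susceptibility": "Susceptible",
}

rows = to_cdm_rows(source)
print(rows["measurement"][1])
```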

All these rows are linked in the fact_relationship table. But here we realized that there are no suitable relationship concepts for this.
1-3 - 4023266 (Has Specimen) - can probably be used
2-3 - 4023266 (Has Specimen) - can probably be used
1-2 - ???
2-1 - ???
3-1 - ???
3-2 - ???


@nzvyagina I really like your approach.

But it breaks the rule that MEASUREMENT.value_as_concept_id must belong to the ‘Meas Value’ domain.
Personally, I don’t like this rule. @Christian_Reich, why do we even have this restriction?

You’re talking about FACT_RELATIONSHIP.relationship_concept_id, right?
In this case we can use only Standard concepts with vocabulary_id = ‘Relationship’.

We can’t just pick a standard concept with the corresponding meaning (like 4023266 (Has Specimen), which is a SNOMED Observation).
@nzvyagina, please make a list of concept names for the new relationships, so we can discuss them and then add them to the vocabulary.

Thank you for your reply and correction. Then it would be great to have the following concepts for relationship_concept_id:
1-3 - Lab test performed on specimen
2-3 - Lab test performed on specimen
1-2 - Related lab test
2-1 - Related lab test
3-1 - Specimen used for lab test
3-2 - Specimen used for lab test
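One way to sanity-check the proposed list is to confirm that every ordered pair of the three rows is covered and to read off which name acts as the inverse of which. A minimal sketch, assuming the row numbers 1 to 3 from the example; the helper name is hypothetical:

```python
from itertools import permutations

# (from_row, to_row) -> proposed relationship name, as listed above.
PROPOSED = {
    (1, 3): "Lab test performed on specimen",
    (2, 3): "Lab test performed on specimen",
    (1, 2): "Related lab test",
    (2, 1): "Related lab test",
    (3, 1): "Specimen used for lab test",
    (3, 2): "Specimen used for lab test",
}


def inverse_names(proposed):
    """For each name, collect the names used in the opposite direction."""
    inverses = {}
    for (a, b), name in proposed.items():
        inverses.setdefault(name, set()).add(proposed[(b, a)])
    return inverses


# Ordered pairs of rows 1..3 that have no proposed relationship.
missing = [pair for pair in permutations((1, 2, 3), 2) if pair not in PROPOSED]
INVERSES = inverse_names(PROPOSED)

print("missing pairs:", missing)
for name, inverse in INVERSES.items():
    print(name, "<->", sorted(inverse))
```

In this list, "Related lab test" is the only name that is its own inverse, which loses the direction between the culture and the susceptibility test.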

The names can be rephrased if needed.

Let’s hear from the other people before adding these concepts.

Because we want to make it clear what values can be in what domain.

Don’t think so. Where are the parents of these concepts?

Yes, please. And then give it to THEMIS!

Do we have them in SNOMED? Can you look?

A colleague and I just created a proposal to handle microbiology results. We are hoping this proposal will also handle other types of clinical results that are not fully represented in the measurement table without the dreaded fact-relationship table :slight_smile:

The proposal links out to a Google document to get the community’s input before the CDM working group call next week.

Would love to get input from @bailey @mgkahn @Andrew @Christian_Reich


Do you mean parent of 4023266 (Has Specimen)?
It’s a “Linkage concept”; I suppose that means a linkage between SNOMED concepts.
So, if you say so, we can reuse “Linkage concepts” from SNOMED, right?

or don’t use fact_relationship at all if we agree with

@cukarthik I like your proposal! :slight_smile:


I have to give credit to @philipzach who helped and really pushed to get this proposal out to the community. Thanks @philipzach!

Sorry but I am not tracking the proposal. Maybe a graphical ERD would help me figure out how the links work:

  • MEASUREMENT_ATTRIBUTE.measurement_id: “A foreign key that refers to a survey concept identifier in the standardized vocabularies.” Three issues: (1) if it really points to a concept, it needs to have “concept_id” somewhere in the field name; (2) why limit it to a survey concept identifier; and (3) based on your example, it is a foreign key to MEASUREMENT.measurement_id, not to a concept_id.

  • MEASUREMENT.measurement_attribute_id points to some record in the new MEASUREMENT_ATTRIBUTE table. Your example isn’t a FK to your example row in MEASUREMENT_ATTRIBUTE; the IDs do not line up. Should they for your example?

  • MEASUREMENT_ATTRIBUTE.measurement_id is an FK back to the MEASUREMENT table. So, if I have this right, you have a bidirectional PK-FK pair of relationships between MEASUREMENT and MEASUREMENT_ATTRIBUTE thru measurement_id and measurement_attribute_id. I do not see how this works or sets up the 1:M that I think you want

  • Sorry to be so dense but I do not understand who is the parent record and who are the children records in this model. Again, maybe an ERD would help me

  • What if the result is simply “Gram Positive Cocci” with value = present? Wouldn’t this be a simple MEASUREMENT, with no need to invoke a MEASUREMENT_ATTRIBUTE tuple? If so, we now have heterogeneity in how micro results are represented based on the type of result.

  • You should include an example where sensitivities are reported out as numeric MICs rather than text. Pretty sure I see how you would do this (value_as_number) but best to show this.

  • Since I have already admitted I do not understand the intended hierarchical relationship, I should stop here. But, fool that I am, I’ll continue: it seems you are setting up an alternative way to set up a hierarchy (that you admit could also be leveraged to represent panels/components). This is fine with me, but we need to be explicit that there are now TWO ways one can capture a hierarchy in the data tables: for measurements, use these tables, and for everything else use fact_relationship. If I have this right, I just want to make sure we do this with our eyes wide open. We already use specialized linked tables to capture hierarchies for concept hierarchies and, I think, also over in location/care sites (I haven’t been following all of the teeth gnashing with location/care sites), so I think this is OK.

  • I am a shameless “leverage other folks tested work” person. Some published microbiology results models: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3540481/ (Figure 1); PCORnet GPCnetwork (uses i2b2 model); https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=17&cad=rja&uact=8&ved=2ahUKEwi-wbPG6c3iAhXN854KHZZOCR04ChAWMAZ6BAgFEAI&url=https%3A%2F%2Farchive-ouverte.unige.ch%2Funige%3A23962%2FATTACHMENT01&usg=AOvVaw04T5fz1NNeSGAeFMMWix1X (just a trivial Google search). Not endorsing anything but want to make sure you’ve looked at these other models as examples

Sorry if I have completely missed how the model works.

@Christian_Reich, I have just taken a look. There is nothing in the Relationship domain in SNOMED, and almost nothing in other domains (if it is allowed to use other domains for this field). There is a concept 4023266 ‘Has specimen’, but it is from the Observation domain and not of the desired accuracy.

**edited because copy & paste didn’t copy all from the Word Doc this originated

I have a microbiology use case, too :slight_smile: Some of this data is duplicative of more granular data contained in the same source table. However, the less granular data is useful when the data with finer granularity is NULL.

Easy ones first:

  • specimen source/specimen_type is represented as a SNOMED code. Examples of string terms for the code = Structure of urinary tract proper, Lower respiratory tract structure, Catheter. This will map to SPECIMEN.specimen_concept_id & associated columns
  • origin of specimen code/body or device site is represented as a SNOMED code. Examples of string terms for the code = Urine specimen obtained via indwelling urinary catheter, Sputum specimen obtained by aspiration. This will map to SPECIMEN.anatomic_site_concept_id & associated columns

Now I have multiple attributes for one *_concept_id. How do I represent it? Do I duplicate the *_concept_id to add in the attributes?

  • drug susceptibility test/antibiotic_test is represented as a LOINC code. Examples of string terms for the code = Bacteria identified in Isolate by Culture, or Bacteria identified in Unspecified specimen by Culture. This will map to MEASUREMENT.measurement_concept_id
  • type of bacteria is represented as a SNOMED code. Examples of string terms for the code = E. coli, Shigella, no growth. This should/would “seem” to logically map to MEASUREMENT.value_as_concept_id for the above test, but the domain_id for the concept = Observation. Same issue @Dymshyts pointed out above

How do I represent the attributes of the bacteria? And are these attributes even necessary? Isn’t morphology a defining characteristic of a type of bacteria? Who’s the microbiology expert in OMOP? I want to create a 2nd field mapping for the type of bacteria (SNOMED code above) to OBSERVATION.observation_concept_id, but the following attributes won’t fit into one row of the OBSERVATION table:

  • “status” of the bacteria. The 2 values are detected & not detected.
  • morphology of the bacteria. Examples of string terms (no associated code) = branching, chains, mucoid. Is this necessary?
  • oxidase status – test to help identify bacteria. Only 2 string terms = positive or negative. Is this necessary?

Then I have the susceptibility data for the above bacteria:

  • drug susceptibility test/susceptibility represented as a LOINC code. Examples of string terms for the code = Aztreonam [Susceptibility], Ampicillin+Sulbactam [Susceptibility], Vancomycin [Susceptibility]. This will map to the MEASUREMENT.measurement_concept_id
  • RxNorm drug code- Duplicative of the above line, there are RxNorm codes for the drug being tested for bacterial susceptibility. Since the above column is populated, we don’t need to include these RxNorm codes, but it would be nice to think of a solution on how to represent the drug codes as NOT Drug Exposures, but in relation to the bacterial susceptibility when this is the only susceptibility data provided by the source
  • susceptibility data. The two values = Susceptible & Resistant. This will map to MEASUREMENT.value_as_concept_id

Then there are data that I am not sure about. Are they useful? Are they duplicative of other data enumerated above? Do they even fit in the CDM without some ugly use of the Observation table?:

  • colony operator for the below fields. Values include <, >, and ~

  • colony count. This is a numeric value and is populated when the colony operator = < or >

  • colony count high and colony count low. These are numeric values and are populated when the colony operator = ~

  • beta-lactamase test susceptibility is represented as a LOINC code

  • beta-lactamase test susceptibility result is represented as “positive” or “negative”
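Before deciding where the colony fields land in the CDM, it may help to normalize the operator and the three count columns into a single pair of bounds. The sketch below only illustrates the populated-when rules described in the list; the function name and the sample counts are hypothetical, and it is not a CDM convention.

```python
def colony_count_bounds(operator, count=None, low=None, high=None):
    """Return (lower, upper) bounds implied by the source fields.

    None means the bound is open. 'count' is expected with '<' or '>',
    'low' and 'high' are expected with '~'.
    """
    if operator == "<":
        return (None, count)
    if operator == ">":
        return (count, None)
    if operator == "~":
        return (low, high)
    raise ValueError(f"unexpected colony operator: {operator!r}")


# Hypothetical sample values, for illustration only.
print(colony_count_bounds(">", count=100000))
print(colony_count_bounds("<", count=10000))
print(colony_count_bounds("~", low=10000, high=50000))
```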

All the records will need to be linked via the Fact Relationship table using standard relationship_id concepts. However, I need to correctly field map the above before petitioning the Themis & vocabulary team to add more standard concepts :slight_smile:

I’ll take a look at @cukarthik’s proposal, too. But this is needed now.

Thoughts? @nzvyagina @Christian_Reich @Dymshyts


It would be very helpful if there was an additional column on your example tables with the text string for each concept_id. The background & synopsis is in ‘English’ and the table example is in ‘OMOP’.


In case it helps…maybe a little window into how this works in a clinical setting will help. Sorry if this is repetitive.

Morphology and other characteristics
Microbiology testing is done in stages, in the direction of increasing specificity. Historically this has been because the full identification usually takes days. So initially the lab report will say something like “gram positive cocci in clusters” or “gram negative rod, oxidase negative”, so that doctors can quickly change or adapt antibiotics if needed before the final ID arrives. But this is superseded by the final ID when it arrives. So if the final ID says Staphylococcus aureus, we know the morphology was gram positive cocci in clusters, or if it says E. coli, we know it is oxidase negative. So in most situations, the final ID and susceptibilities are all you need. The one place where this could be relevant is for that small group of organisms where the culture takes a long time to grow (e.g., months). Tuberculosis is a classic example, where you could have an initial result “AFB smear positive” and it will take weeks to months for the culture to finalize. But again, with the kind of mostly long-term retrospective studies that are planned, this should not be that big an issue.

Most times, the micro lab will not quantify the amount of bacteria that grows. An important exception is urine cultures where, depending on how the urine specimen is collected (via catheterization or asking the patient to pee in a cup), the lab does a rough estimate of how much bacteria was there (expressed as colony counts). There are specific thresholds (10K, 50K, 100K, etc.), and depending on the value you adjudicate whether this is a real infection or just contamination of the specimen that occurred during collection. So if this is available, especially for urine cultures, this is useful information to include. In Karthik’s proposal this would be a row in the measurement table tagged to the specific organism (if I understand it correctly).

Antibiotic susceptibility results outside of Sensitive/Resistant and MIC values
Antibiotic susceptibility is usually tested using methods where different concentrations of antibiotics are tested against the organism. There are pre-specified cutoffs used by each lab to classify a specific bug as sensitive vs. resistant. But occasionally the lab will test for other specific resistance mechanisms to groups of antibiotics, using a gene probe, PCR or other methods. In the example above, the beta-lactamase test will identify whether the organism produces enzymes that can destroy penicillins or related drug classes. So the clinician knows that this means that a specific class of antibiotics should not be used. In real life, one can guess if some of these resistance mechanisms are present by looking at the specific drugs being R vs. S in the antibiotic susceptibility panel, which arrives later. And there is a lot of diversity in whether laboratories choose to do these tests. But if included, these will need to be linked to the specific organism that is identified.



@nzvyagina, @Dymshyts
This is in response to the proposed set of relationships:
1-3 - Lab test performed on specimen
2-3 - Lab test performed on specimen
1-2 - Related lab test
2-1 - Related lab test
3-1 - Specimen used for lab test
3-2 - Specimen used for lab test

Looks like the number of relationships will grow rapidly (roughly with the square of the group size) if you were to add another event (Observation or Measurement) to the group, because every new record must be paired with every existing one in both directions. That is a problem. Also, there has been a convention that relationships are directional and have an inverse. What would be the inverse of ‘Related lab test’? There is probably an ordering or dependency to the events. Urine > E coli > Susceptibility makes sense, but Urine > Susceptibility, or the inverse Susceptibility > Urine, without the intervening E coli does not make sense.
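To put numbers on the growth concern, here is a small sketch comparing two linking schemes: every ordered pair of records (as in the proposed list) versus only neighbours in the ordered chain, in both directions. The function names and the padded "extra event" records are hypothetical.

```python
from itertools import permutations


def all_pairs_links(records):
    """Every ordered pair of records gets its own relationship row."""
    return list(permutations(records, 2))


def chain_links(records):
    """Only neighbours in the ordered chain are linked, in both directions."""
    forward = list(zip(records, records[1:]))
    backward = [(b, a) for a, b in forward]
    return forward + backward


chain = ["Urine", "E coli", "Susceptibility"]

for size in (3, 4, 5):
    # Pad the chain with hypothetical extra events to show the growth.
    records = chain + [f"extra event {i}" for i in range(size - len(chain))]
    print(size, len(all_pairs_links(records)), len(chain_links(records)))
```

For n records the all-pairs scheme needs n(n-1) rows, while the chain needs 2(n-1) and never creates a direct Urine to Susceptibility link.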

Can we leverage the episode extension proposed by the oncology WG for this purpose? What do you think? @rimma @mgurley