Mapping Microbiology Susceptibility into OMOP CDM4 Observations

(Nadya Zvyagina) #18

@Christian_Reich, I have just taken a look. There is nothing in the Relationship domain in SNOMED, and almost nothing in other domains (if it is allowed to use other domains for this field). There is a concept 4023266 ‘Has specimen’, but it is from the Observation domain and is not of the desired accuracy.

(Melanie Philofsky) #19

Edited because copy & paste didn’t copy everything from the Word doc where this originated.

I have a microbiology use case, too :slight_smile: Some of this data is duplicative of more granular data contained in the same source table. However, the less granular data is useful when the data with finer granularity is NULL.

Easy ones first:

  • specimen source/specimen_type is represented as a SNOMED code. Examples of string terms for the code = Structure of urinary tract proper, Lower respiratory tract structure, Catheter. This will map to SPECIMEN.specimen_concept_id & associated columns
  • origin of specimen code/body or device site is represented as a SNOMED code. Examples of string terms for the code = Urine specimen obtained via indwelling urinary catheter, Sputum specimen obtained by aspiration. This will map to SPECIMEN.anatomic_site_concept_id & associated columns

Now I have multiple attributes for one *_concept_id. How do I represent it? Do I duplicate the *_concept_id to add in the attributes?

  • drug susceptibility test/antiobiotic_test is represented as a LOINC code. Examples of string terms for the code = Bacteria identified in Isolate by Culture, or Bacteria identified in Unspecified specimen by Culture. This will map to MEASUREMENT.measurement_concept_id
  • type of bacteria is represented as a SNOMED code. Examples of string terms for the code = E. coli, Shigella, no growth. This should/would “seem” to logically map to MEASUREMENT.value_as_concept_id for the above test, but the domain_id for the concept = Observation. Same issue @Dymshyts pointed out above

How do I represent the attributes of the bacteria? And are these attributes even necessary? Isn’t morphology a defining characteristic of a type of bacteria? Who’s the microbiology expert in OMOP? I want to create a 2nd field mapping for the type of bacteria (snomed code above) to the OBSERVATION.Observation_concept_id, but the following attributes won’t fit into one row of the OBSERVATION table:

  • “status” of the bacteria. The 2 values are detected & not detected.
  • morphology of the bacteria. Examples of string terms (no associated code) = branching, chains, mucoid. Is this necessary?
  • oxidase status – test to help identify bacteria. Only 2 string terms = positive or negative. Is this necessary?

Then I have the susceptibility data for the above bacteria:

  • drug susceptibility test/susceptibility represented as a LOINC code. Examples of string terms for the code = Aztreonam [Susceptibility], Ampicillin+Sulbactam [Susceptibility], Vancomycin [Susceptibility]. This will map to the MEASUREMENT.measurement_concept_id
  • RxNorm drug code- Duplicative of the above line, there are RxNorm codes for the drug being tested for bacterial susceptibility. Since the above column is populated, we don’t need to include these RxNorm codes, but it would be nice to think of a solution on how to represent the drug codes as NOT Drug Exposures, but in relation to the bacterial susceptibility when this is the only susceptibility data provided by the source
  • susceptibility data. The two values = Susceptible & Resistant. This will map to MEASUREMENT.value_as_concept_id

Then there are data that I am not sure about. Are they useful? Are they duplicative of other data enumerated above? Do they even fit in the CDM without some ugly use of the Observation table?:

  • colony operator for the below fields. Values include <, >, and ~

  • colony count. This is a numeric value and is populated when the colony operator = < or >

  • colony count high and colony count low. These are numeric values and are populated when the colony operator = ~

  • beta-lactamase test susceptibility is represented as a LOINC code

  • beta-lactamase test susceptibility result is represented as “positive” or “negative”
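The colony count fields listed above follow a simple rule: a single count when the operator is < or >, and a low/high pair when the operator is ~. A minimal Python sketch of that rule is shown here. The class, function, and field names are hypothetical, and the sketch makes no claim about which CDM columns should hold these values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColonyCount:
    """Normalized colony count; field names are hypothetical, not CDM columns."""
    operator: Optional[str]   # "<" or ">"; None when the source gives a range
    value: Optional[float]    # single count when operator is "<" or ">"
    low: Optional[float]      # lower bound when the source operator is "~"
    high: Optional[float]     # upper bound when the source operator is "~"


def normalize_colony_count(operator, count=None, count_low=None, count_high=None):
    """Turn the four source columns into one consistent structure."""
    if operator in ("<", ">"):
        if count is None:
            raise ValueError("colony count is required when operator is < or >")
        return ColonyCount(operator, float(count), None, None)
    if operator == "~":
        if count_low is None or count_high is None:
            raise ValueError("low and high counts are required when operator is ~")
        return ColonyCount(None, None, float(count_low), float(count_high))
    raise ValueError(f"unexpected colony operator: {operator!r}")
```

Validating the combinations up front makes it easier to spot source rows where the operator and the populated count columns disagree before any field mapping is decided.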

All the records will need to be linked via the Fact Relationship table using standard relationship_id concepts. However, I need to correctly field map the above before petitioning the Themis & vocabulary team to add more standard concepts :slight_smile:

I’ll take a look at @cukarthik’s proposal, too. But this is needed now.

Thoughts? @nzvyagina @Christian_Reich @Dymshyts

(Melanie Philofsky) #20


It would be very helpful if there was an additional column on your example tables with the text string for each concept_id. The background & synopsis is in ‘English’ and the table example is in ‘OMOP’.

(Philip Zachariah) #21

In case it helps…maybe a little window into how this works in a clinical setting will help. Sorry if this is repetitive.

Morphology and other characteristics
Microbiology testing is done in stages, in the direction of increasing specificity. Historically this has been because the full identification usually takes days. So initially the lab report will say something like “gram positive cocci in clusters” or “gram negative rod, oxidase negative”, so that doctors can change or adapt antibiotics quickly if needed before the final ID arrives. But this is superseded by the final ID when it arrives. So if the final ID says Staphylococcus aureus, we know the morphology was gram positive cocci in clusters, or if it says E. coli, we know it is oxidase negative. So in most situations, the final ID and susceptibilities are all you need. The one place where this could be relevant is for that small group of organisms where the culture takes a long time to grow (e.g., months). Tuberculosis is a classic example, where you could have an initial result “AFB smear positive” and it will take weeks to months for the culture to finalize. But again, with the kind of mostly long-term retrospective studies that are planned, this should not be that big an issue.

Most times, the micro lab will not quantify the amount of bacteria that grows. An important exception is urine cultures, where depending on how the urine specimen is collected (via catheterization or asking the patient to pee in a cup), the lab does a rough estimate of how much bacteria was there (expressed as colony counts). There are specific thresholds (10K, 50K, 100K etc.), and depending on the value you adjudicate whether this is a real infection or just contamination of the specimen that occurred during collection. So if this is available, especially for urine cultures, this is useful information to include. In Karthik’s proposal this would be a row in the measurement table tagged to the specific organism (if I understand it correctly).

Antibiotic susceptibility results outside of Sensitive/Resistant and MIC values
Antibiotic susceptibility is usually tested using methods where different concentrations of antibiotics are tested against the organism. There are pre-specified cutoffs used by each lab to classify a specific bug as sensitive vs. resistant. But occasionally the lab will test for other specific resistance mechanisms to groups of antibiotics, using a gene probe, PCR or other methods. In the example above, the beta-lactamase test will identify whether the organism produces enzymes that can destroy penicillins or related drug classes. So the clinician knows that this means that a specific class of antibiotics should not be used. In real life, one can guess if some of these resistance mechanisms are present by looking at the specific drugs being R vs. S in the antibiotic susceptibility panel, which arrives later. And there is a lot of diversity in whether laboratories choose to do these tests. But if included, these will need to be linked to the specific organism that is identified.


(Don Torok) #22

@nzvyagina, @Dymshyts
This is in response to the proposed set of relationships:
1-3 - Lab test performed on specimen
2-3 - Lab test performed on specimen
1-2 - Related lab test
2-1 - Related lab test
3-1 - Specimen used for lab test
3-2 - Specimen used for lab test

Looks like the number of relationships will grow exponentially if you were to add another event (Observation or Measurement) to the group. That is a problem. Also, there has been a convention that relationships are directional and have an inverse. What would be the inverse of ‘Related lab test’? There is probably an ordering or dependency to the events. Urine > E coli > Susceptibility makes sense, but Urine > Susceptibility or the inverse Susceptibility > Urine without the intervening E coli does not make sense.
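To make the growth concern concrete, here is a small Python sketch comparing two linking strategies: linking every event to every other event in both directions (as in the six relationships listed above), versus linking each event only to the next one in the dependency chain plus its inverse. The function names are hypothetical and the event labels are just the ones used in this post. With n events, the all-pairs approach needs n × (n − 1) directed rows, while the chain needs 2 × (n − 1).

```python
from itertools import permutations


def all_pairs_links(events):
    """Every event linked to every other event, in both directions."""
    return list(permutations(events, 2))


def chain_links(events):
    """Each event linked only to the next one, plus the inverse of each link."""
    forward = list(zip(events, events[1:]))
    inverse = [(b, a) for a, b in forward]
    return forward + inverse


events = ["Urine", "E coli", "Susceptibility"]
print(len(all_pairs_links(events)))  # 6
print(len(chain_links(events)))      # 4
```

The chain version never produces a direct Urine/Susceptibility link, which matches the point that such a link does not make sense without the intervening organism.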

(Seng Chan You) #23

Can we leverage the episode extension proposed by the oncology WG for this purpose? What do you think? @rimma @mgurley

(Nadya Zvyagina) #24

@MPhilofsky, I would propose the following:
  • status - I would not create a record in the case of ‘Not detected’, because observation_concept_id (E. coli) may mislead someone if they do not look at the result. So since there are only 2 values, Detected and Not detected, there is no need to create a separate record for status.
  • morphology - observation_concept_id (E. coli) + value_as_concept_id (some morphology value)
  • oxidase status - to tell the truth, I do not know what it is. But taking into account what @philipzach wrote, it looks like it is not that important since you already know the final ID (E. coli, Staphylococcus aureus, etc.). Maybe I am wrong.

(Melanie Philofsky) #25

Thank you, @philipzach, for the detailed explanation. It is helpful to know the clinical perspective.

Very good point! Without a specific use case, it isn’t necessary or useful.

(Christian Reich) #26

@cukarthik et al.

Hm. This looks like a gigantic solution for a rather small problem. I think. Correct me if I am wrong.

Rather than the pair “measurement - result” you are identifying a triplet situation here: “organism - antibiotic testing - result”. And in your proposal the organism would reside in MEASUREMENT, the testing and result would be MEASUREMENT_ATTRIBUTE. And then you link the two. Correct?

Why not do the following: Have two records in MEASUREMENT.

  • Blood culture - Staphylococcus aureus
  • Antibiotic testing - resistant

I also understand you don’t like the FACT_RELATIONSHIP table as a mechanism to connect the two. Neither do I. Instead, we could add a field “reference_measurement_id” to the table, which links the latter to the former. This would also solve @MPhilofsky’s problem that there might be more than one subsequent test on the staph colonies. Each of them would refer to the same original blood culture record.

Seems to work to me, and a lot less change.


(Nadya Zvyagina) #27

Hi @Christian_Reich, reference_measurement_id will work in case of only one related measurement. But adding this field should entail a rule that only one reference measurement can exist, otherwise we will get two (or more) identical records differing by reference_measurement_id only.

(Michael Gurley) #28

All EAV tables (MEASUREMENT and OBSERVATION) should uniformly be able to polymorphically reference an Entity that the Attribute (measurement_concept_id/observation_concept_id) and Value (value_as_number, value_as_string, value_as_concept_id) are about. I am assuming that Person is not a fine-grained enough Entity for many use cases. Using a polymorphic pairing would remove the limitation of only being able to reference a specimen_id.

OBSERVATION already has a polymorphic column pairing giving it the ability to reference an Entity: observation_event_id and obs_event_field_concept_id. The Oncology CDM extension is proposing to add a polymorphic column pairing to MEASUREMENT: modifier_of_event_id and modifier_of_field_concept_id. I think it would be better to make these pairings adhere to a consistent naming convention.

The other item that this proposal raises is the need to group EAV rows into ‘panels’, ‘collections’ or ‘rows’. This is a common requirement that all EAV systems need to eventually embrace or reject. Often some kind of grouper column is added to the EAV table that pulls all the desired EAV entries into a collection. Something like ‘measurement_grouper’ or ‘observation_grouper’. You could put a GUID in the rows you want to be able to make into a ‘panel’, ‘collection’ or ‘row’.

If you are a relational purist, this would all be done in two separate tables per domain: OBSERVATION_GROUP and OBSERVATION_GROUP_OBSERVATION, and MEASUREMENT_GROUP and MEASUREMENT_GROUP_MEASUREMENT. This would be more in line with the proposal’s MEASUREMENT_ATTRIBUTE table (which I think is a confusing name since it is grouping attributes together).
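The grouper-column variant can be sketched in a few lines of Python. The column name “measurement_grouper” is the hypothetical name floated in this post, and the rows and ids are invented for illustration only.

```python
import uuid
from collections import defaultdict

# "measurement_grouper" is the hypothetical column name floated in the post.
panel_id = str(uuid.uuid4())

rows = [
    {"measurement_id": 10, "concept": "organism identified", "measurement_grouper": panel_id},
    {"measurement_id": 11, "concept": "susceptibility", "measurement_grouper": panel_id},
    {"measurement_id": 12, "concept": "unrelated lab test", "measurement_grouper": None},
]


def group_into_panels(rows):
    """Collect rows that share a grouper value; ungrouped rows are skipped."""
    panels = defaultdict(list)
    for row in rows:
        if row["measurement_grouper"] is not None:
            panels[row["measurement_grouper"]].append(row["measurement_id"])
    return dict(panels)
```

A grouper only says that rows belong together. It does not say which row depends on which, so it complements rather than replaces a directional link between an organism and its susceptibility results.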

(Karthik) #29

I’m still trying to understand polymorphic pairing :slight_smile:, but I think I get it. I do agree we need to keep a standard naming convention for these references. I’m not opposed to an additional column in the measurement table that’s either a GUID or a reference measurement id. From a query point of view, I suppose it would be a self-join on the measurement table, which is somewhat similar to joining to a new measurement_attribute or measurement group table. If you go the route mentioned by @mgurley and @Christian_Reich, we would also need to introduce the specimen_id field into the measurement table, which is needed in my opinion.

(Tatiana) #30

Hi all!

I would like to share the solution which we implemented in our project. It is very similar to the one posted by @nzvyagina, with the only difference that we use the modifier_of_event_id and modifier_of_field_concept_id fields proposed by the oncology WG (https://github.com/OHDSI/OncologyWG/wiki/MEASUREMENT) to link the specimen and the associated measurement records.

Specimen: Bld CVC
Culture: Bacterial blood culture
Organism identified in culture: Coagulase negative staphylococcus
Drug susceptibility test: 28-1 (‘Ampicillin [Susceptibility] by Minimum inhibitory concentration (MIC)’)
Drug result: >8
Susceptibility: Resistant
Serological test: 20966-8 (‘Staphylococcus sp identified in Unspecified specimen by Organism specific culture’)
Serological test result: Detected

1st record: Specimen table.
Specimen_concept_id = 4045667 (Venous blood specimen)
2nd record: Measurement table. Store bacterial culture and organisms identified in specimen.
Measurement_concept_id = 3023368 (‘Bacteria identified in Blood by Culture’)
Value_as_concept_id: 36309331 (‘Coagulase-negative staphylococci’)
Modifier_of_event_id = specimen_id from the 1st record
Modifier_of_field_concept_id = concept_id with the name ‘specimen.specimen_id’
3rd record: Measurement table. Store drug susceptibility test.
Measurement_concept_id = 28-1 (‘Ampicillin [Susceptibility] by Minimum inhibitory concentration (MIC)’)
Value_as_number = 8
Operator_concept_id = 4172704 (‘>’)
Value_as_concept_id = 45878594 (‘Resistant’)
Modifier_of_event_id = measurement_id from the 2nd record to link drug susceptibility test with the associated bacteria
Modifier_of_field_concept_id = concept_id with the name ‘measurement.measurement_id’
4th record: Measurement table. Store serological test.
Measurement_concept_id = 20966-8 (‘Staphylococcus sp identified in Unspecified specimen by Organism specific culture’)
Value_as_concept_id = 45877985 (‘Detected’)
Modifier_of_event_id = specimen_id from the 1st record
Modifier_of_field_concept_id = concept_id with the name ‘specimen.specimen_id’

To summarize, we create relations:

  • Bacterial culture to Specimen
  • Drug susceptibility test to Bacterial Culture
  • Serological test to Specimen
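The three relations above can be walked programmatically. The following Python sketch resolves the polymorphic pair (modifier_of_event_id, modifier_of_field_concept_id) to the table and row it points at. The field concepts are written as plain strings because their concept_ids are not given in this post; the function names, row ids, and dictionary layout are assumptions made for illustration.

```python
# Field concepts are written as strings because the post does not give their
# concept_ids; in a real CDM these would be concept_ids.
SPECIMEN_FIELD = "specimen.specimen_id"
MEASUREMENT_FIELD = "measurement.measurement_id"

specimens = {1: {"specimen_concept_id": 4045667}}  # Venous blood specimen

measurements = {
    2: {"name": "bacterial culture", "modifier_of_event_id": 1,
        "modifier_of_field_concept_id": SPECIMEN_FIELD},
    3: {"name": "drug susceptibility test", "modifier_of_event_id": 2,
        "modifier_of_field_concept_id": MEASUREMENT_FIELD},
    4: {"name": "serological test", "modifier_of_event_id": 1,
        "modifier_of_field_concept_id": SPECIMEN_FIELD},
}

TABLES = {SPECIMEN_FIELD: ("specimen", specimens),
          MEASUREMENT_FIELD: ("measurement", measurements)}


def resolve_parent(measurement_id):
    """Follow the polymorphic pair to the table and row it points at."""
    row = measurements[measurement_id]
    table_name, table = TABLES[row["modifier_of_field_concept_id"]]
    parent_id = row["modifier_of_event_id"]
    if parent_id not in table:
        raise KeyError(f"{table_name} row {parent_id} not found")
    return table_name, parent_id


def path_to_specimen(measurement_id):
    """Walk parent links until a specimen row is reached."""
    path = []
    table_name, current = "measurement", measurement_id
    while table_name == "measurement":
        table_name, current = resolve_parent(current)
        path.append((table_name, current))
    return path
```

For the susceptibility test (record 3), the walk goes through the culture (record 2) before reaching the specimen, so the organism is never skipped when tracing a result back to its sample.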

The problems which we encountered during the implementation and which could be discussed within community:

  • Measurement.value_as_concept_id should belong to domain_id = ‘Meas Value’. Unfortunately, not all culture results belong to this domain. We mapped to ‘Meas Value’ as the first priority, and when that was not possible, we mapped to ‘Observation’ and stored the concept in measurement.value_as_concept_id.
  • Would it be convenient for everyone to use modifier_of_event_id instead of the Fact_relationship table?

As this topic has been discussed many times with different proposals, I believe we have enough examples and use cases to come up with a common solution which can be published in the CDM conventions. @Christian_Reich, @SCYou, @Dymshyts, @rimma, @DTorok, @cukarthik, @mgkahn, @MPhilofsky, @philipzach, @mgurley, what do you think, guys?

Measurement Table - are value_as_concept_id and value_as_number mutually exclusive?
(Karthik) #31

This makes sense to me and is a better option than the fact relationship table :slight_smile: For me, the problem is that we are not on version 6.0, so we cannot leverage these fields. A practical implementation comment: a self-join on the measurement table is required to link bacteria and drug susceptibility. Most queries against the measurement table are slow for us, and I’m worried that joining it to itself would be very slow. I know this isn’t a modeling issue, but it is a usability issue. I wonder if others have the same issue with the measurement table. Otherwise, I think this proposal seems good. Maybe we allow both ‘Meas Value’ and ‘Observation’ in value_as_concept_id.
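For reference, the self-join in question looks roughly like the following, shown with Python’s built-in sqlite3 module on a two-row toy table. This is a simplified sketch: the table has only a few stand-in columns, modifier_of_field_concept_id is omitted, and the index on the link column is an assumption about what might help, not something measured on a real CDM database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE measurement (
    measurement_id        INTEGER PRIMARY KEY,
    label                 TEXT,     -- stand-in for measurement_concept_id
    value_label           TEXT,     -- stand-in for value_as_concept_id
    modifier_of_event_id  INTEGER   -- parent measurement_id, if any
);
-- Assumption: an index on the link column may keep the self-join cheaper.
CREATE INDEX idx_measurement_parent ON measurement (modifier_of_event_id);
INSERT INTO measurement VALUES
    (2, 'Bacteria identified in Blood by Culture', 'Coagulase-negative staphylococci', NULL),
    (3, 'Ampicillin [Susceptibility] by MIC', 'Resistant', 2);
""")

# Join each susceptibility row to the culture row it modifies.
rows = conn.execute("""
    SELECT culture.value_label, test.label, test.value_label
    FROM measurement AS test
    JOIN measurement AS culture
      ON test.modifier_of_event_id = culture.measurement_id
""").fetchall()
print(rows)
```

Whether this performs acceptably on a full-size measurement table depends on the database engine, indexing, and data volume, so it would need to be checked on real data.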

(Dmytry Dymshyts) #32

And we should be on version 6.1 already to be able to use @TBanokina’s approach.

The Oncology extension will be added to CDM v6.1, which will be officially released at the end of the current year.

(Christian Reich) #33

Sounds like we should discuss all the options and put this thing to bed. Except the CDM meetings are now all usurped by @clairblacketer to revise the CDM. Which we certainly need. Let me figure it out and propose something.

(Alexander Davydov) #34

You haven’t tagged me but let me comment on this.

Doesn’t make any sense. The mentioned LOINC is not a serological test. It’s the same test that you have in the 2nd record. Can you add more examples, please? The point is to link all the tests performed on the specimen with the specimen, right?

Right, but this is a Vocabulary issue. Domains are not clean. As long as that’s true, people here and there ignore this convention.

FACT_RELATIONSHIP is kind of retired, isn’t it? Modifier_of_event_id / observation_event_id are much more progressive, but when it comes to 6.1 I’d recommend the following:

  • introduce specimen_id field to the Measurement table (and possibly, Observation table - unless Domains are clean) as a foreign key. The same as we do for provider, visit, visit_detail, care_site, etc. This solution would be much more natural. The tools would be upgraded by the introduction of the same scenarios that are already implemented.
  • The only thing you’d need to link with your approach is a Drug susceptibility test to Bacterial Culture. But there were discussions around the creation of precoordinated: microorganisms x antimicrobials x susceptibility testing methods. @Christian_Reich is it still a plan?

And a separate question is how methods are being prepared to support Modifier_of_event_id / observation_event_id linking. Can somebody please update?

(Melanie Philofsky) #35

I don’t think you should store record #1 in the Specimen table

because the data is inherent in

RE: Measurement.value_as_concept_id
I would like to petition the CDM working group to include more domain_ids allowed in Measurement.value_as_concept_id because the current convention,

Is too restrictive.

If allowed, the staphylococcus identified in the 4th record would live in the Measurement.value_as_concept_id field of record #2. Eliminating the need for a separate 4th record. AND, IMHO, this is the most logical representation of the data. The Measurement record is a blood culture test (measurement_concept_id) and the result of the test is it grew Staph (value_as_concept_id).

The current structure of the CDM would require the susceptibility record to be a separate Measurement record as you described:

This topic has been around over 5 years. Clear conventions would be helpful to make the data available for a network query.

(Dmytry Dymshyts) #36

Normally, Meas Value is the result of some measurement by design (Qualifier Value of SNOMED or Answers of LOINC), but here’s the class of Organisms.
It’s hard to imagine cases where organisms could be used in the OMOP CDM other than as results of a bacterial culture, but still, a domain change should be thought through.

If we want to keep this Meas Value constraint, @Alexdavv, please explain what the usefulness of this restriction to the Meas Value domain is.
Shouldn’t it be said that it is recommended to use Meas Value, but other domains are allowed if needed?

What if we introduce specimen_concept_id instead?

Here’s another related topic

(Alexander Davydov) #37

Agree. A simple rule might be applied: everything that doesn’t make sense to be stored in the event_as_concept_id field, might go to the Meas Value Domain. This is a good constraint that prevents the creation of such events, actually. There are probably a few exceptions. I think organisms are not among them.

Not very useful. But once Domains are adjusted as described above, you’ll not need to violate this restriction so often. All the cases when we put Procedures/Conditions/Devices in the value_as_concept_id field go to the Observation table, agree? The only exception that comes to my mind is the Drugs recorded as values with some Measurements having a nominal scale, e.g. Drugs identified in Gastric fluid by Screen method.

Currently, we store the specimen data in the specimen table, and it is also part of the semantics of the Measurement concepts. If we add specimen_concept_id, should the specimen table, together with all the data it holds, be retired?