Mapping Microbiology Susceptibility into OMOP CDM4 Observations

philipzach · June 5, 2019, 6:18pm

In case it helps, here is a little window into how this works in a clinical setting. Sorry if this is repetitive.

Morphology and other characteristics
Microbiology testing is done in stages, in the direction of increasing specificity. Historically this has been because the full identification usually takes days. So initially the lab report will say something like “gram positive cocci in clusters” or “gram negative rod, oxidase negative”, so that doctors can change or adapt antibiotics quickly, if needed, before the final ID arrives. But this is superseded by the final ID when it arrives. So if the final ID says Staphylococcus aureus, we know the morphology was gram positive cocci in clusters, or if it says E. coli, we know it is oxidase negative. So in most situations, the final ID and susceptibilities are all you need. The one place where this could be relevant is for that small group of organisms where the culture takes a long time to grow (e.g., months). Tuberculosis is a classic example, where you could have an initial result of “AFB smear positive” and it will take weeks to months for the culture to finalize. But again, with the kind of mostly long-term retrospective studies that are planned, this should not be that big an issue.

Abundance
Most times, the micro lab will not quantify the amount of bacteria that grows. An important exception is urine cultures, where, depending on how the urine specimen is collected (via catheterization or asking the patient to pee in a cup), the lab does a rough estimate of how much bacteria was there (expressed as colony counts). There are specific thresholds (10K, 50K, 100K, etc.), and depending on the value you adjudicate whether this is a real infection or just contamination of the specimen that occurred during collection. So if this is available, especially for urine cultures, it is useful information to include. In Karthik’s proposal this would be a row in the measurement table tagged to the specific organism (if I understand it correctly).

Antibiotic susceptibility results outside of Sensitive/Resistant and MIC values
Antibiotic susceptibility is usually tested using methods where different concentrations of antibiotics are tested against the organism. There are pre-specified cutoffs used by each lab to classify a specific bug as sensitive vs. resistant. But occasionally the lab will test for other specific resistance mechanisms to groups of antibiotics, using a gene probe, PCR or other methods. In the example above, the beta-lactamase test will identify whether the organism produces enzymes that can destroy penicillins or related drug classes. So the clinician knows that a specific class of antibiotics should not be used. In real life, one can guess whether some of these resistance mechanisms are present by looking at which specific drugs are R vs. S in the antibiotic susceptibility panel, which arrives later. And there is a lot of diversity in whether laboratories choose to do these tests. But if included, these will need to be linked to the specific organism that is identified.

DTorok · June 5, 2019, 8:41pm

@nzvyagina, @Dymshyts
This is in response to the proposed set of relationships:
1-3 - Lab test performed on specimen
2-3 - Lab test performed on specimen
1-2 - Related lab test
2-1 - Related lab test
3-1 - Specimen used for lab test
3-2 - Specimen used for lab test

Looks like the number of relationships will grow quadratically if you were to add another event (Observation or Measurement) to the group. That is a problem. Also, there has been a convention that relationships are directional and have an inverse. What would be the inverse of ‘Related lab test’? There is probably an ordering or dependency to the events. Urine > E coli > Susceptibility makes sense, but Urine > Susceptibility or the inverse Susceptibility > Urine without the intervening E coli does not make sense.
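
To make the growth concrete, here is a minimal Python sketch. It assumes, as in the list above, that every ordered pair of events gets its own relationship row, and compares that with linking each event only to its immediate predecessor (plus the inverse). The function names are illustrative and not part of the CDM.

```python
from itertools import permutations


def pairwise_links(events):
    """One relationship row for every ordered pair (a, b) with a != b."""
    return list(permutations(events, 2))


def chain_links(events):
    """Link each event to its immediate predecessor only, plus the inverse."""
    forward = list(zip(events, events[1:]))
    inverse = [(b, a) for a, b in forward]
    return forward + inverse


events = ["Urine", "E coli", "Susceptibility"]
all_pairs = pairwise_links(events)   # n * (n - 1) rows
chain = chain_links(events)          # 2 * (n - 1) rows
```

With the chain layout, Urine > Susceptibility is never stored directly; it can only be reached through the intervening E coli record.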

SCYou · June 5, 2019, 11:09pm

Can we leverage the episode extension proposed by the oncology WG for this purpose? What do you think? @rimma @mgurley

nzvyagina · June 7, 2019, 6:40am

@MPhilofsky, I would propose the following:
*status - I would not create a record in the case of ‘Not detected’ because observation_concept_id (E. coli) may mislead someone if they do not look at the result. So since there are only 2 values, Detected and Not detected, there is no need to create a separate record for status.
*morphology - observation_concept_id (E. coli) + value_as_concept_id (some morphology value)
*oxidase status - to tell the truth I do not know what it is. But taking into account what @philipzach wrote, it looks like it is not that important since you already know the final ID (E. coli, Staphylococcus aureus, etc.). Maybe I am wrong.

MPhilofsky · June 7, 2019, 12:40pm

Thank you, @philipzach, for the detailed explanation. It is helpful to know the clinical perspective.

Very good point! Without a specific use case, it isn’t necessary or useful.

Christian_Reich · June 15, 2019, 12:37am

@cukarthik et al.

Hm. This looks like a gigantic solution for a rather small problem. I think. Correct me if I am wrong.

Rather than the pair “measurement - result” you are identifying a triplet situation here: “organism - antibiotic testing - result”. And in your proposal the organism would reside in MEASUREMENT, and the testing and result would reside in MEASUREMENT_ATTRIBUTE. And then you link the two. Correct?

Why not do the following: Have two records in MEASUREMENT.

Blood culture - Staphylococcus aureus
Antibiotic testing - resistant

I also understand you don’t like the FACT_RELATIONSHIP table as a mechanism to connect the two. Neither do I. Instead, we could add a field “reference_measurement_id” to the table, which links the latter to the former. This would also solve @MPhilofsky’s problem that there might be more than one subsequent test on the staph colonies. Each of them would refer to the same original blood culture record.

Seems to work to me, and a lot less change.
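
A minimal sketch of this idea, using an in-memory SQLite table. The column reference_measurement_id is the proposed field and does not exist in the CDM; measurement_name and value_name are simplifications standing in for the concept_id columns, and "drug A" / "drug B" are made-up placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE measurement (
        measurement_id INTEGER PRIMARY KEY,
        measurement_name TEXT,   -- stands in for measurement_concept_id
        value_name TEXT,         -- stands in for value_as_concept_id
        reference_measurement_id INTEGER REFERENCES measurement(measurement_id)
    )
""")
rows = [
    (1, "Blood culture", "Staphylococcus aureus", None),
    # Both subsequent tests point back to the same culture record.
    (2, "Antibiotic testing (drug A)", "Resistant", 1),
    (3, "Antibiotic testing (drug B)", "Sensitive", 1),
]
conn.executemany("INSERT INTO measurement VALUES (?, ?, ?, ?)", rows)

# Self-join: each test row is paired with the culture it refers to.
linked = conn.execute("""
    SELECT culture.value_name, test.measurement_name, test.value_name
    FROM measurement AS test
    JOIN measurement AS culture
      ON test.reference_measurement_id = culture.measurement_id
    ORDER BY test.measurement_id
""").fetchall()
```

Each susceptibility row carries exactly one reference, so several tests can hang off one culture, but a test cannot point to two parents.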

Thoughts?

nzvyagina · June 19, 2019, 9:32am

Hi @Christian_Reich, reference_measurement_id will work in case of only one related measurement. But adding this field should entail a rule that only one reference measurement can exist, otherwise we will get two (or more) identical records differing by reference_measurement_id only.

mgurley · June 19, 2019, 12:26pm

All EAV tables (MEASUREMENT and OBSERVATION) should uniformly be able to polymorphically reference an Entity that the Attribute (measurement_concept_id / observation_concept_id) and Value (value_as_number, value_as_string, value_as_concept_id) are about. I am assuming that Person is not a fine-grained enough Entity for many use cases. Using a polymorphic pairing would remove the limitation of only being able to reference a specimen_id.

OBSERVATION already has a polymorphic column pairing giving it the ability to reference an Entity: observation_event_id and obs_event_field_concept_id. The Oncology CDM extension is proposing to add a polymorphic column pairing to MEASUREMENT: modifier_of_event_id and modifier_of_field_concept_id. I think it would be better to make these pairings adhere to a consistent naming convention.

The other item that this proposal raises is the need to group EAV rows into ‘panels’, ‘collections’ or ‘rows’. This is a common requirement that all EAV systems eventually need to embrace or reject. Often some kind of grouper column is added to the EAV table that pulls all the desired EAV entries into a collection. Something like ‘measurement_grouper’ or ‘observation_grouper’. You could put a GUID in the rows you want to be able to make into a ‘panel’, ‘collection’ or ‘row’.
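
As a rough illustration of the grouper-column idea, assuming a hypothetical measurement_grouper field holding a GUID (the row contents and the helper name are made up for the example):

```python
import uuid
from collections import defaultdict

# One GUID shared by all rows that belong to the same panel.
panel_id = str(uuid.uuid4())

measurements = [
    {"measurement_id": 1, "name": "culture", "measurement_grouper": panel_id},
    {"measurement_id": 2, "name": "susceptibility", "measurement_grouper": panel_id},
    {"measurement_id": 3, "name": "unrelated test", "measurement_grouper": None},
]


def collect_panels(rows):
    """Group measurement_ids by their grouper value, skipping ungrouped rows."""
    panels = defaultdict(list)
    for row in rows:
        grouper = row["measurement_grouper"]
        if grouper is not None:
            panels[grouper].append(row["measurement_id"])
    return dict(panels)


panels = collect_panels(measurements)
```

Note that a grouper only says which rows belong together; unlike a reference column, it does not say which row depends on which.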

If you are a relational purist this would all be done in two separate tables for each domain: OBSERVATION_GROUP and OBSERVATION_GROUP_OBSERVATION, and MEASUREMENT_GROUP and MEASUREMENT_GROUP_MEASUREMENT. This would be more in line with the proposal’s MEASUREMENT_ATTRIBUTE table (which I think is a confusing name since it is grouping attributes together).

cukarthik · August 6, 2019, 1:24am

I’m still trying to understand polymorphic pairing, but I think I get it. I do agree we need to keep a standard naming convention for these references. I’m not opposed to an additional column in the measurement table that’s either a GUID or a reference measurement id. From a query point of view, I suppose it would be a self-join on the measurement table, which is somewhat similar to joining to a new measurement_attribute or measurement group table. If you go the route mentioned by @mgurley and @Christian_Reich, we would also need to introduce the specimen_id field into the measurement table, which is needed in my opinion.

TBanokina · June 10, 2020, 1:52pm

Hi all!

I would like to share the solution which we implemented in our project. It is very similar to the one posted by @nzvyagina, with the only difference being that we use the modifier_of_event_id and modifier_of_field_concept_id fields proposed by the oncology WG (https://github.com/OHDSI/OncologyWG/wiki/MEASUREMENT) to link the specimen and the associated measurement records.

Source:
Specimen: Bld CVC
Culture: Bacterial blood culture
Organism identified in culture: Coagulase negative staphylococcus
Drug susceptibility test: 28-1 (‘Ampicillin [Susceptibility] by Minimum inhibitory concentration (MIC)’)
Drug result: >8
Susceptibility: Resistant
Serological test: 20966-8 (‘Staphylococcus sp identified in Unspecified specimen by Organism specific culture’)
Serological test result: Detected

CDM:
1st record: Specimen table.
Specimen_concept_id = 4045667 (Venous blood specimen)
2nd record: Measurement table. Store bacterial culture and organisms identified in specimen.
Measurement_concept_id = 3023368 (‘Bacteria identified in Blood by Culture’)
Value_as_concept_id: 36309331 (‘Coagulase-negative staphylococci’)
Modifier_of_event_id = specimen_id from the 1st record
Modifier_of_field_concept_id = concept_id with the name ‘specimen.specimen_id’
3rd record: Measurement table. Store drug susceptibility test.
Measurement_concept_id = 28-1 (‘Ampicillin [Susceptibility] by Minimum inhibitory concentration (MIC)’)
Value_as_number = 8
Operator_concept_id = 4172704 (‘>’)
Value_as_concept_id = 45878594 (‘Resistant’)
Modifier_of_event_id = measurement_id from the 2nd record to link drug susceptibility test with the associated bacteria
Modifier_of_field_concept_id = concept_id with the name ‘measurement.measurement_id’
4th record: Measurement table. Store serological test.
Measurement_concept_id = 20966-8 (‘Staphylococcus sp identified in Unspecified specimen by Organism specific culture’)
Value_as_concept_id = 45877985 (‘Detected’)
Modifier_of_event_id = specimen_id from the 1st record
Modifier_of_field_concept_id = concept_id with the name ‘specimen.specimen_id’

To summarize, we create relations:

Bacterial culture to Specimen
Drug susceptibility test to Bacterial Culture
Serological test to Specimen
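
The linking described above could be sketched in Python roughly as follows. The concept ids are the ones quoted in the post; the record ids (1, 10, 11), the string placeholders standing in for the two field concepts, and the helper name are assumptions made for the example. The post gives the susceptibility test only as LOINC 28-1, so it is kept here as a code rather than a concept id.

```python
# Placeholders for the field concepts named in the post (actual ids not stated).
SPECIMEN_FIELD = "specimen.specimen_id"
MEASUREMENT_FIELD = "measurement.measurement_id"

specimens = {
    1: {"specimen_concept_id": 4045667},  # Venous blood specimen
}

measurements = {
    10: {  # bacterial culture, linked to the specimen
        "measurement_concept_id": 3023368,
        "value_as_concept_id": 36309331,
        "modifier_of_event_id": 1,
        "modifier_of_field_concept_id": SPECIMEN_FIELD,
    },
    11: {  # drug susceptibility test, linked to the culture record
        "loinc_code": "28-1",
        "value_as_number": 8,
        "operator_concept_id": 4172704,
        "value_as_concept_id": 45878594,
        "modifier_of_event_id": 10,
        "modifier_of_field_concept_id": MEASUREMENT_FIELD,
    },
}


def chain_to_specimen(measurement_id):
    """Follow modifier_of_event_id links until a specimen is reached."""
    path = [("measurement", measurement_id)]
    record = measurements[measurement_id]
    while True:
        parent_id = record["modifier_of_event_id"]
        if record["modifier_of_field_concept_id"] == SPECIMEN_FIELD:
            path.append(("specimen", parent_id))
            return path
        path.append(("measurement", parent_id))
        record = measurements[parent_id]


path = chain_to_specimen(11)
```

The field concept tells the reader which table the id in modifier_of_event_id belongs to, which is what makes the pairing polymorphic.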

The problems which we encountered during the implementation and which could be discussed within the community:

Measurement.value_as_concept_id should belong to domain_id = ‘Meas Value’. Unfortunately, not all culture results belong to this domain. We mapped to ‘Meas Value’ as a priority, and when that was not possible we mapped to ‘Observation’ and stored the concept in measurement.value_as_concept_id.
Would it be convenient for everyone to use modifier_of_event_id instead of Fact_relationship table?

As this topic was discussed many times with different proposals, I believe we have enough examples and use cases to come up with the common solution which can be published in CDM conventions. @Christian_Reich, @SCYou, @Dymshyts, @rimma, @DTorok, @cukarthik, @mgkahn, @MPhilofsky, @philipzach, @mgurley, what do you think guys?

cukarthik · June 10, 2020, 6:02pm

This makes sense to me and is a better option than the fact relationship table. For me, the problem is that we are not on version 6.0 and so cannot leverage these fields. A practical implementation comment I have is that a self-join on the measurement table is required to link bacteria and drug susceptibility. Most queries against the measurement table are slow for us, and I’m worried that joining it to itself would be very slow. I know this isn’t a modeling issue, but it is a usability issue. I wonder if others have the same issue with the measurement table. Otherwise, I think this proposal seems good. Maybe we could allow both ‘Meas Value’ and ‘Observation’ in value_as_concept_id.

Dymshyts · June 12, 2020, 2:33pm

And we should be on version 6.1 already to be able to use @TBanokina’s approach.

The Oncology extension will be added to CDM v6.1, which will be officially released at the end of the current year.

Christian_Reich · June 15, 2020, 2:12am

Sounds like we should discuss all the options and put this thing to bed. Except the CDM meetings are now all usurped by @clairblacketer to revise the CDM. Which we certainly need. Let me figure it out and propose something.

Alexdavv · June 16, 2020, 2:32pm

You haven’t tagged me but let me comment on this.

Doesn’t make any sense. The mentioned LOINC is not a serological test. It’s the same test that you have in the 2nd record. Can you add more examples, please? The point is to link all the tests performed on the specimen with the specimen, right?

Right, but this is a Vocabulary issue. Domains are not clean. As long as that’s true, people here and there ignore this convention.

FACT_RELATIONSHIP is kind of retired, isn’t it? Modifier_of_event_id / observation_event_id are much more progressive, but when it comes to 6.1 I’d recommend the following:

introduce a specimen_id field to the Measurement table (and possibly the Observation table, unless Domains are clean) as a foreign key, the same as we do for provider, visit, visit_detail, care_site, etc. This solution would be much more natural. The tools could be upgraded by reusing the scenarios that are already implemented for those fields.
The only thing you’d need to link with your approach is the Drug susceptibility test to the Bacterial Culture. But there were discussions around the creation of precoordinated concepts: microorganisms x antimicrobials x susceptibility testing methods. @Christian_Reich, is that still the plan?

And a separate question is how methods are being prepared to support Modifier_of_event_id / observation_event_id linking. Can somebody please update?

MPhilofsky · June 18, 2020, 5:10pm

I don’t think you should store record #1 in the Specimen table

because the data is inherent in

RE: Measurement.value_as_concept_id
I would like to petition the CDM working group to include more domain_ids allowed in Measurement.value_as_concept_id because the current convention,

Is too restrictive.

If allowed, the staphylococcus identified in the 4th record would live in the Measurement.value_as_concept_id field of record #2. Eliminating the need for a separate 4th record. AND, IMHO, this is the most logical representation of the data. The Measurement record is a blood culture test (measurement_concept_id) and the result of the test is it grew Staph (value_as_concept_id).

The current structure of the CDM would require the susceptibility record to be a separate Measurement record as you described:

This topic has been around for over 5 years. Clear conventions would be helpful to make the data available for a network query.

Dymshyts · June 23, 2020, 4:39pm

Normally, Meas Value is the result of some measurement by design (Qualifier Value of SNOMED or Answers of LOINC), but here’s the class of Organisms.
It’s hard to imagine cases where organisms could be used in OMOP CDM other than as results of bacterial culture, but a domain change should still be thought through.

If we want to keep this Meas Value constraint, @Alexdavv, please explain how restricting to the Meas Value domain is useful.
Shouldn’t it say that using Meas Value is recommended, but other domains are allowed if needed?

What if we introduce specimen_concept_id instead?

Here’s another related topic

Alexdavv · July 2, 2020, 2:53pm

Agree. A simple rule might be applied: everything that doesn’t make sense to be stored in the event_as_concept_id field, might go to the Meas Value Domain. This is a good constraint that prevents the creation of such events, actually. There are probably a few exceptions. I think organisms are not among them.

Not very useful. But once Domains are adjusted as described above, you’ll not need to violate this restriction so often. All the cases when we put Procedures/Conditions/Devices in the value_as_concept_id field go to the Observation table, agree? The only exception that comes to my mind is the Drugs recorded as values with some Measurements having a nominal scale, e.g. Drugs identified in Gastric fluid by Screen method.

Currently, the specimen data is stored in the specimen table and is also part of the semantics of the Measurement concepts. If we add specimen_concept_id, should the specimen table, together with all the data in it, then be retired?

parisni · August 27, 2020, 10:00am

I am in the process of integrating microbiology too, and I haven’t found any news or decisions on this question. From reading the posts above, I think @TBanokina’s proposal is a good move, and I took it as a basis with some slight modifications:

added specimen_id to the measurement table as a clear foreign key to specimen
used Modifier_of_event_id / Modifier_of_field_concept_id to store the relation between records (susceptibility derives from culture)
I am thinking about a microbiology_era table which could synthesize the information and ease the analysis (related to https://github.com/OHDSI/CommonDataModel/issues/281)

This has the benefit of not using the fact relationship table. Still, the self-join on measurement to link the culture is a performance problem, as mentioned by @cukarthik. I also wonder whether a dedicated culture table would be a better choice.

MPhilofsky · September 15, 2020, 3:57pm

+1 for expanding the domain constraint for MEASUREMENT.value_as_concept_id

Let’s open this up, so when the use cases arise, the data is already mapped to standard concepts in the cdm. US EHR data usually has 100,000+ custom codes for the value_as_concept_id field. This field is rarely standardized at the source.

@Christian_Reich, @clairblacketer, should we add this to the conventions?

GanselX · December 3, 2020, 3:52pm

Hi,
We are also in the process of mapping laboratory microbiology data onto OMOP CDM v6.0 and have encountered issues similar to the ones discussed here and in other threads. Our use case may be summarized by saying that we aim to observe in detail the tests performed in a lab network.
As shown below, this relates to several discussion threads for which we have not seen a firm answer yet.
In the situation of micro-organism identification and subsequent antibiotic susceptibilities, our model raises questions similar to those described by @parisni, @TBanokina and @nzvyagina.
It includes a specimen table (e.g. venous blood sample) that is linked by fact_relationship to one or more measurement tables.
A first measurement table stores information about the microorganism identification tests and system-level results. For example, in the case of a MALDI-TOF mass spectrometry test:
measurement_concept_id = OMOP code for the LOINC describing the test, e.g. 76346-6 Microorganism identified in Isolate by MS.MALDI-TOF
Value_as_concept_id = OMOP code for the concept representing the organism identified
This table is linked by fact_relationship to several measurement tables for antibiotic susceptibility tests & results.
Measurement table storing antibiotic susceptibility results.
measurement_concept_id = OMOP code for the LOINC describing the antibiotic test, eg. 28-1 Ampicillin [Susceptibility] by Minimum inhibitory concentration (MIC)
Value_as_number would be the MIC result
Value_as_concept_id = OMOP code for the concept representing the corresponding category (R/I/S)
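
A minimal sketch of how such a combined record might be built from a raw MIC string, assuming the convention does allow value_as_number and value_as_concept_id on the same record (which is the open question raised below). The function names are illustrative; the only operator concept mapped is the ‘>’ concept (4172704) quoted earlier in this thread, and 45878594 is the ‘Resistant’ concept from the same post.

```python
import re

# Only the operator concept quoted in this thread; others are left unmapped.
OPERATOR_CONCEPTS = {">": 4172704}


def parse_mic(raw):
    """Split a raw MIC string such as '>8' into (operator, number)."""
    match = re.fullmatch(r"\s*(<=|>=|<|>|=)?\s*(\d+(?:\.\d+)?)\s*", raw)
    if match is None:
        raise ValueError(f"unrecognized MIC value: {raw!r}")
    operator, number = match.groups()
    # A bare number is treated as an exact value.
    return operator or "=", float(number)


def susceptibility_record(raw_mic, category_concept_id):
    """Build one record holding both the MIC and the R/I/S category."""
    operator, number = parse_mic(raw_mic)
    return {
        "value_as_number": number,
        "operator_concept_id": OPERATOR_CONCEPTS.get(operator),
        "value_as_concept_id": category_concept_id,
    }


record = susceptibility_record(">8", 45878594)
```

Keeping the operator separate from the number avoids storing ">8" as if the MIC were exactly 8.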

This leads to several questions, which appear in several discussion threads and for which we seek answers:

1. How can we confirm that using both value_as_concept_id and value_as_number to store antibiotic susceptibility results (MIC and category) for the same measurement is allowed?
This is also:

discussed in “Measurement Table - are value_as_concept_id and value_as_number mutually exclusive?”
suggested by the Google documents shared by the CDM + Themis Working Group for the Measurement table (https://www.ohdsi.org/web/wiki/doku.php?id=projects:workgroups:cdm-wg#objectives).

2. We have the same issue as @TBanokina concerning the ‘Meas Value’ domain.

It is especially challenging to store organism identification in value_as_concept_id (e.g. Escherichia coli) because SNOMED-derived codes for organisms are considered to be in the Observation domain (ID = 4011683). To follow the generally accepted paradigm of “LOINC as the question and SNOMED as the answer”, one would need to allow codes from the SNOMED vocabulary to also be used as value_as_concept_id.
So, how can we proceed to resolve this vocabulary issue, thus allowing the SNOMED vocabulary to be used for value_as_concept_id in the MEASUREMENT table?
Note that a related discussion appears in the thread from @Vojtech_Huser (“Duplicate standard concepts for value_as_concept_id in OMOP Vocab”), showing that duplication appears between LOINC answers and SNOMED codes.

3. We agree with @Christian_Reich: connecting specimen and measurement by the fact relationship is not the cleanest of all options. As @parisni and @Alexdavv suggested (see also “Link between SPECIMEN and MEASUREMENT”), adding specimen_id as a foreign key to the Measurement table would reduce the need for fact_relationship between those two tables (Modifier_of_event_id / Modifier_of_field_concept_id).
How can we help the community to move forward in this direction and make a decision?
4. Lastly, what is the status of the proposal for a microbiology or culture era table (https://github.com/OHDSI/CommonDataModel/issues/281 and “Adding Cultures into OMOP v5”)?
Is this still a living proposal? How can we participate in its improvement (if needed) and blessing?

Considering that options 3 and 4 are likely mutually exclusive.
E. Theron // X. Gansel