Adding Cultures into OMOP v5

Christian_Reich · November 16, 2017, 10:40am

You said it all, @Chris_Knoll.

Still not enough granularity, @Gowtham_Rao. There is no one-to-one relationship between blood cultures and visits. If you are in the ICU with a systemic infection, you’ll get blood cultures and antibiograms all the time. While not having moved an inch, and therefore not having created a new VISIT_DETAIL.

Well, let’s think about it. In this case you need to link two full-fledged criteria:

Measurement “blood culture” with value as concept “Staphylococcus”
Measurement “antibiotic sensitivity” with value as concept “resistant”.

You can’t do that in a single criterion, can you?

And thank you @vmeau for all your kind words each time somebody says something.

vmeau · November 16, 2017, 10:51am

Hello,

Thank you for your email regarding [OHDSI Forums] [CDM Builders] Adding
Cultures into OMOP v5.

I am currently at a training and I have limited email access. I will respond
to your e-mail upon my return on Friday, November 17th.

Thanks for your patience,
Valerie.

Gowtham_Rao · November 16, 2017, 12:45pm

I thought the hate from @Christian_Reich for the fact_relationship table was because the SQL was getting “dirty” and confusing.

I want to understand this more @Chris_Knoll, because your concern is more around the ambiguity the fact_relationship table introduces. I am sure you are right, but I dont understand why the relationship wont hold the same meaning on another CDM, when the relationship is governed by the vocabulary.

Yes - definitely.

However, I think there is one draw back. What is domain_concept_id_1 and domain_concept_id_2 in Fact_relationship? If we want to relate the pk visit_detail_id of visit_detail to pk visit_occurrence_id of visit_occurrence – how do we do it?

vmeau · November 16, 2017, 12:51pm

Hello,

Thank you for your email regarding [OHDSI Forums] [CDM Builders] Adding
Cultures into OMOP v5.

I am currently at a training and I have limited email access. I will respond
to your e-mail upon my return on Friday, November 17th.

Thanks for your patience,
Valerie.

Chris_Knoll · November 16, 2017, 3:56pm

When I last checked, I couldn’t find a domain of concepts that are specifically for the relationship_concept_id in fact_relationship. (The ‘Relationship’ domain is used to associate people to family members). What I interpreted this to mean was someone would find some standard concept with the right words in it, and throw that into the relationship_concept_id, and they could query their own data and find the relationships, but another CDM might have chosen some other concept with similar words for their own relationship. Or, maybe they thought of it as a reverse where you say A → B and they say B ← A, which has the same logical meaning but phrased different. Enter Themis to set up conventions. But these are conventions we don’t have today, so if you do it for yourself, it is ugly.

But from a CIRCE perspective, it’s a fact of life that the CDM supports the notion of fact_relationship, so it is currently a gap in the query capability. The main reason it hasn’t been implemented is that the underlying infrastructure to support this has not be formalized in a standardized way (similar to the lack of support for querying payer_plan_period…until recently, there were no standard concepts in there).

Chris_Knoll · November 16, 2017, 4:27pm

Ok, here goes:
We want to support the functionality that one criteria can be related to another criteria via fact relationship. Here’s a sample JSON that would capture that association:

    "CriteriaList": [
      {
        "Measurement": {
          "CodesetId": 0,
          "Age": {
            "Value": 18,
            "Op": "gt"
          }
          "FactRelationship": {
                RelationshipType: 1234567, // a concept for 'Measurement Finding'?
                Criteria: {
                  "Measurement": {
                  "CodesetId": 1
                }
          }
      }
    ]

Here’s the relevant part of the cohort sql:

  -- Begin Measurement Criteria
select C.person_id, C.measurement_id as event_id, C.measurement_date as start_date, DATEADD(d,1,C.measurement_date) as END_DATE, C.measurement_concept_id as TARGET_CONCEPT_ID, C.visit_occurrence_id
from 
(
  select m.*, row_number() over (PARTITION BY m.person_id ORDER BY m.measurement_date, m.measurement_id) as ordinal
  FROM @cdm_database_schema.MEASUREMENT m
where m.measurement_concept_id in (SELECT concept_id from  #Codesets where codeset_id = 0)
) C
JOIN @cdm_database_schema.PERSON P on C.person_id = P.person_id
JOIN @cdm_database_schema.FACT_RELATIONSHIP FR on FR.fact_id_1 = C.measurement_id and FR.relationship_concept_id = 123456 and FR.domain_id_1 = 'Measurement' and FR.domain_id_2 = 'Measurement'
WHERE YEAR(C.measurement_date) - P.year_of_birth > 18
AND FR.fact_id_2 in ( 
  select TF.event_id from (
    -- Begin Measurement Criteria
    select C.person_id, C.measurement_id as event_id, C.measurement_date as start_date, DATEADD(d,1,C.measurement_date) as END_DATE, C.measurement_concept_id as TARGET_CONCEPT_ID, C.visit_occurrence_id
    from 
    (
      select m.*, row_number() over (PARTITION BY m.person_id ORDER BY m.measurement_date, m.measurement_id) as ordinal
      FROM @cdm_database_schema.MEASUREMENT m
      where m.measurement_concept_id in (SELECT concept_id from  #Codesets where codeset_id = 1)
    ) C
    -- End Measurement Criteria
  ) TF // target fact
)
-- End Measurement Criteria

So the idea is to join to fact_relationship on the event_id of the A record, and then do an IN query to find any measurement that matches the criteria’s event_id that is the fact_relationship’s fact_id_2.

We use an ‘in’ instead of a join, because we don’t want to select additional records in the outer measurement criteria (although there is probably a way to get around this by using a join + group by so that just the record from the first measurement record is returned.

Note this also supports nesting, so that you could look for record A --relates to → B — relates to —> C.

Chris_Knoll · November 16, 2017, 4:34pm

since fact_ids could collide (they are record_ids from the specific table) you want to make sure that the fact’s domain matches the table that you want to join with. So I should have added that to the response to @Christian_Reich, that when I joined on fact relationship, i should also make sure that the domain_concept_id_1 == ‘Measurement’ and domain_concept_id_2 == ‘Measurement’.

…I’ll go back and make that edit.

Gowtham_Rao · November 16, 2017, 5:11pm

I see, if I understand correctly - the concern with the FACT_RELATIONAHIP table is not the intent/purpose of the table, or the efficiency of using the table in SQL; the concern is around the absence of a domain for the relationship_concept_id.

The condition_occurrence table has a field called condition_status_concept_id . It does not have a domain and we are asked to use concept_id’s from condition domain. (See bottom of https://github.com/OHDSI/CommonDataModel/wiki/CONDITION_OCCURRENCE). This example is rules by convention, not vocabulary (I blame @cgreich for this j/k)

So, the cdm vocabulary work group needs to propose a new domain and create concept_id’s?

cukarthik · November 16, 2017, 5:13pm

In the case with cultures, you’d want an organism in the observation domain linked to the antibiotic susceptibility in the measurement domain. I’m not sure I understand the collision issue…

Christian_Reich · November 16, 2017, 5:33pm

Both are measurements.

Christian_Reich · November 16, 2017, 5:36pm

Fine. So, yes, we could do FACT_RELATIONSHIP. I still like something modeled out properly more. Because this we will have to nail with a water-tight THEMIS convention.

Chris_Knoll · November 16, 2017, 6:05pm

Measure to measure, there’d be no collision because of same-domain record_id.

But if you wanted to relate an observation to a diagnosis, or a procedure to a drug, you could have a dataset that has 100 rows (numbers 1…100) in each of the domain tables.

If I say there’s a fact from row 7 to row 9, without knowing which domain the row 7 relates to, and which domain the row 9 relates to, you could match any domain’s row #7 to any other domain’s row to #9. That’s the collision, and the domain_id resolves the collision.

Chris_Knoll · November 16, 2017, 6:13pm

Agreed. fact_relationship seems to exist to model any type of relationship imaginable. Just like an EAV lets you model any entity imaginable, all without changing your schema. But I like schemas, I like the structure they enforce. If there’s something missing from the CDM, I’d much rather see a new tables and xref tables created to specifically join the data together vs. the addition of new records into fact_relationship.
.

cukarthik · November 16, 2017, 8:33pm

Wouldn’t domain_concept_id_1 and domain_concept_id_2 in the current fact_relationship table already address this issue?

Chris_Knoll · November 16, 2017, 10:51pm

Yes, apologies, I said domain_id in the prior message, but the actual column is domain_concept_id_1 and _2. So yes, that is what resolves the conflict.

bailey · November 16, 2017, 11:20pm

For me, the balance has to do with how simple the relationship is (1:1s are cleaner than aiming for a hierarchy via 1:Ns and self-joins) and how often we want to use it. For the culture case, it seemed easier to model the 1:N culture:isolate relationship with a new table measurement_organism, but that is use-case-specific. It could be generalized to a measure_attribute sort of table that handled one level of hierarchy if there were additional use cases (e.g. multi-gene panels) where it’s important to maintain the connection. There’s another 1:N from bug to drug, which we don’t yet model in PEDSnet mostly because the source data is too messy to standardize at this point. But we’re going to need it at some point . . .

I do agree that if OHDSI is going to model specimen:bug:drug via fact_relationship, all three facts belong in measurement. Some have categorical values (especially the bug), but there’s a defined normal value (no bug), and the assay shares control characteristics with other lab tests.

Andrew · November 17, 2017, 2:30pm

@bailey I like the suggestion of a measure_attribute table. Another use case with the same challenge is ECG data. @mkwong is working on an ECG vocabulary with help from @rtmill and myself that he will be proposing to the CDM workgroup. We’ve been having a very similar discussion to the one in this thread. Since there are multiple use cases with this form in common, a general solution either in the form a dedicated table or a standard approach to using relationships is worth considering rather than an organism/culture-specific solution.

parisni · December 11, 2017, 10:27am

Hi all.

Any reason there is no visit links in the specimen table ? That d’be reasonable to add both because all laboratory software I worked with do link the specimens to at least the visit_occurrence granularity.

Christian_Reich · December 14, 2017, 5:14pm

Friends:

Do we think we are ready to come up with a proposal for the CDM WG? Or a couple of alternatives for both issues (specimen in visit and linking specimens to test results)?

cukarthik · December 19, 2017, 3:36am

Sorry. I lost track of this discussion even though I started it

I would be happy to put together a proposal. I agree it would be good to combine efforts with @Andrew and other to see if we can come up with a generic solution for cultures as well as ECGs

@mkwong and @Andrew do you have a draft we take a look at or maybe we can set up some time to discuss offline?