When I last checked, I couldn’t find a domain of concepts that are specifically for the relationship_concept_id in fact_relationship. (The ‘Relationship’ domain is used to associate people to family members). What I interpreted this to mean was someone would find some standard concept with the right words in it, and throw that into the relationship_concept_id, and they could query their own data and find the relationships, but another CDM might have chosen some other concept with similar words for their own relationship. Or, maybe they thought of it as a reverse where you say A -> B and they say B <- A, which has the same logical meaning but phrased different. Enter Themis to set up conventions. But these are conventions we don’t have today, so if you do it for yourself, it is ugly.
But from a CIRCE perspective, it’s a fact of life that the CDM supports the notion of fact_relationship, so it is currently a gap in the query capability. The main reason it hasn’t been implemented is that the underlying infrastructure to support this has not be formalized in a standardized way (similar to the lack of support for querying payer_plan_period…until recently, there were no standard concepts in there).
Ok, here goes:
We want to support the functionality that one criteria can be related to another criteria via fact relationship. Here’s a sample JSON that would capture that association:
-- Begin Measurement Criteria
select C.person_id, C.measurement_id as event_id, C.measurement_date as start_date, DATEADD(d,1,C.measurement_date) as END_DATE, C.measurement_concept_id as TARGET_CONCEPT_ID, C.visit_occurrence_id
from
(
select m.*, row_number() over (PARTITION BY m.person_id ORDER BY m.measurement_date, m.measurement_id) as ordinal
FROM @cdm_database_schema.MEASUREMENT m
where m.measurement_concept_id in (SELECT concept_id from #Codesets where codeset_id = 0)
) C
JOIN @cdm_database_schema.PERSON P on C.person_id = P.person_id
JOIN @cdm_database_schema.FACT_RELATIONSHIP FR on FR.fact_id_1 = C.measurement_id and FR.relationship_concept_id = 123456 and FR.domain_id_1 = 'Measurement' and FR.domain_id_2 = 'Measurement'
WHERE YEAR(C.measurement_date) - P.year_of_birth > 18
AND FR.fact_id_2 in (
select TF.event_id from (
-- Begin Measurement Criteria
select C.person_id, C.measurement_id as event_id, C.measurement_date as start_date, DATEADD(d,1,C.measurement_date) as END_DATE, C.measurement_concept_id as TARGET_CONCEPT_ID, C.visit_occurrence_id
from
(
select m.*, row_number() over (PARTITION BY m.person_id ORDER BY m.measurement_date, m.measurement_id) as ordinal
FROM @cdm_database_schema.MEASUREMENT m
where m.measurement_concept_id in (SELECT concept_id from #Codesets where codeset_id = 1)
) C
-- End Measurement Criteria
) TF // target fact
)
-- End Measurement Criteria
So the idea is to join to fact_relationship on the event_id of the A record, and then do an IN query to find any measurement that matches the criteria’s event_id that is the fact_relationship’s fact_id_2.
We use an ‘in’ instead of a join, because we don’t want to select additional records in the outer measurement criteria (although there is probably a way to get around this by using a join + group by so that just the record from the first measurement record is returned.
Note this also supports nesting, so that you could look for record A --relates to --> B — relates to —> C.
since fact_ids could collide (they are record_ids from the specific table) you want to make sure that the fact’s domain matches the table that you want to join with. So I should have added that to the response to @Christian_Reich, that when I joined on fact relationship, i should also make sure that the domain_concept_id_1 == ‘Measurement’ and domain_concept_id_2 == ‘Measurement’.
I see, if I understand correctly - the concern with the FACT_RELATIONAHIP table is not the intent/purpose of the table, or the efficiency of using the table in SQL; the concern is around the absence of a domain for the relationship_concept_id.
The condition_occurrence table has a field called condition_status_concept_id . It does not have a domain and we are asked to use concept_id’s from condition domain. (See bottom of https://github.com/OHDSI/CommonDataModel/wiki/CONDITION_OCCURRENCE). This example is rules by convention, not vocabulary (I blame @cgreich for this j/k)
So, the cdm vocabulary work group needs to propose a new domain and create concept_id’s?
In the case with cultures, you’d want an organism in the observation domain linked to the antibiotic susceptibility in the measurement domain. I’m not sure I understand the collision issue…
Fine. So, yes, we could do FACT_RELATIONSHIP. I still like something modeled out properly more. Because this we will have to nail with a water-tight THEMIS convention.
Measure to measure, there’d be no collision because of same-domain record_id.
But if you wanted to relate an observation to a diagnosis, or a procedure to a drug, you could have a dataset that has 100 rows (numbers 1…100) in each of the domain tables.
If I say there’s a fact from row 7 to row 9, without knowing which domain the row 7 relates to, and which domain the row 9 relates to, you could match any domain’s row #7 to any other domain’s row to #9. That’s the collision, and the domain_id resolves the collision.
Agreed. fact_relationship seems to exist to model any type of relationship imaginable. Just like an EAV lets you model any entity imaginable, all without changing your schema. But I like schemas, I like the structure they enforce. If there’s something missing from the CDM, I’d much rather see a new tables and xref tables created to specifically join the data together vs. the addition of new records into fact_relationship.
.
Yes, apologies, I said domain_id in the prior message, but the actual column is domain_concept_id_1 and _2. So yes, that is what resolves the conflict.
For me, the balance has to do with how simple the relationship is (1:1s are cleaner than aiming for a hierarchy via 1:Ns and self-joins) and how often we want to use it. For the culture case, it seemed easier to model the 1:N culture:isolate relationship with a new table measurement_organism, but that is use-case-specific. It could be generalized to a measure_attribute sort of table that handled one level of hierarchy if there were additional use cases (e.g. multi-gene panels) where it’s important to maintain the connection. There’s another 1:N from bug to drug, which we don’t yet model in PEDSnet mostly because the source data is too messy to standardize at this point. But we’re going to need it at some point . . .
I do agree that if OHDSI is going to model specimen:bug:drug via fact_relationship, all three facts belong in measurement. Some have categorical values (especially the bug), but there’s a defined normal value (no bug), and the assay shares control characteristics with other lab tests.
@bailey I like the suggestion of a measure_attribute table. Another use case with the same challenge is ECG data. @mkwong is working on an ECG vocabulary with help from @rtmill and myself that he will be proposing to the CDM workgroup. We’ve been having a very similar discussion to the one in this thread. Since there are multiple use cases with this form in common, a general solution either in the form a dedicated table or a standard approach to using relationships is worth considering rather than an organism/culture-specific solution.
Any reason there is no visit links in the specimen table ? That d’be reasonable to add both because all laboratory software I worked with do link the specimens to at least the visit_occurrence granularity.
Do we think we are ready to come up with a proposal for the CDM WG? Or a couple of alternatives for both issues (specimen in visit and linking specimens to test results)?
Sorry. I lost track of this discussion even though I started it
I would be happy to put together a proposal. I agree it would be good to combine efforts with @Andrew and other to see if we can come up with a generic solution for cultures as well as ECGs
@mkwong and @Andrew do you have a draft we take a look at or maybe we can set up some time to discuss offline?
Hi @mkwong and @Andrew I just wanted to follow up to see if you guys want to discuss a measure_attribute table. I think this would be the best approach.
Not as a final recommendation, but just to provide an inadequate straw person, here’s what we currently use for measure_organism (apologies for format; I can’t figure out tables in the forums). Basically just a specialized attribute with links back to most-used entities.
1.15 MEASUREMENT_ORGANISM
The measurement organism table contains organism information related to laboratory culture results in the measurement table. This table is CUSTOM to Pedsnet.
Note 1: There can be multiple organisms for a single culture laboratory result.
Field
NOT Null Constraint
Network Requirement
Data Type
Description
PEDSnet Conventions
meas_organism_id
Yes
Yes
Integer
A system-generated unique identifier for each organism culture relationship.
This is not a value found in the EHR. Sites may choose to use a sequential value for this field.
measurement_id
Yes
Yes
Integer
A foreign key to the lab result in the measurement table where the organism was observed.
person_id
Yes
Yes
Integer
A foreign key identifier to the person who the measurement is being documented for. The demographic details of that person are stored in the person table.
visit_occurrence_id
No
Provide When Available
Integer
A foreign key to the visit where the culture lab was ordered
organism_concept_id
Yes
Yes
Integer
A foreign key to a standard concept identifier for the organism in the Vocabulary.
Please include valid concept ids (consistent with OMOP CDMv5). Predefined value set (valid concept_ids found in CONCEPT table where vocabulary_id = SNOMED and concept_class_id= Organism and standard_concept=S)
select * from concept where vocabulary_id =‘SNOMED’ and concept_class_id=‘Organism’ and standard_concept=‘S’ yields 33039 valid concept_ids.
organism_source_value
Yes
Yes
Varchar
The organism value as it appears in the source.
positivity_datetime
No
Optional
Datetime
The estimated date and time of initial growth as reported in the source.
If a field marked as “Provide when available” for the network requirement is not available at your site, please relay this information to the DCC
1.15.1 Additional Notes
The time to positivity field is marked as optional. Please inform the DCC in the provenance files if this data is available at your site.
Hi @cukarthik, I’m so sorry for missing this when you posted it. I was in transition to a new institution and my attention to most things OHDSI flagged for a few weeks. I will ping @mkwong, @rtmill, @bailey and @Rijnbeek about this to get their input on pushing it forward.
Just wanted to let people know on this thread that we’ve created a proposal to handle cultures and that might also handle other clinical results. You can see the post here.