To pre-coordinate or not to pre-coordinate in the MEASUREMENT table

Vojtech_Huser · April 10, 2020, 4:33pm

I prefer a tight specification when it comes to the measurement table.
For example, for HIV or COVID testing, if I have a test that detects an infection, I prefer to know which test it was that made me ‘negative’ for HIV or COVID.

This broad definition for COVID uses a pre-coordinated term.

This term in Athena.

Use of pre-coordinated tests should be discouraged in cases where a LOINC test and a standardized value_as_concept_id exist to represent the same information.

There is a big difference between "not detected" by an RNA test and "not detected" by an antibody test, and pre-coordinated negative terms don’t distinguish the two.
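To make the RNA-vs-antibody point concrete, here is a minimal Python sketch contrasting post-coordinated and pre-coordinated MEASUREMENT rows. The concept IDs are hypothetical placeholders, not real OMOP vocabulary IDs, and the row type is a stripped-down stand-in for the MEASUREMENT table that keeps only the fields relevant here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical concept IDs, for illustration only (not real OMOP vocabulary IDs).
SARS_COV2_RNA_TEST = 1001        # stands in for a LOINC RNA (PCR) test
SARS_COV2_ANTIBODY_TEST = 1002   # stands in for a LOINC antibody test
NEGATIVE = 2001                  # stands in for a standardized "Negative" value
PRECOORD_COVID_NEGATIVE = 3001   # stands in for a pre-coordinated "test negative" concept


@dataclass(frozen=True)
class MeasurementRow:
    person_id: int
    measurement_concept_id: int
    value_as_concept_id: Optional[int] = None


rows = [
    # Post-coordinated: test and result live in separate fields.
    MeasurementRow(1, SARS_COV2_RNA_TEST, NEGATIVE),
    MeasurementRow(2, SARS_COV2_ANTIBODY_TEST, NEGATIVE),
    # Pre-coordinated: the result is baked into the concept and the method is lost.
    MeasurementRow(3, PRECOORD_COVID_NEGATIVE),
]


def negative_by_method(rows, test_concept_id):
    """Return persons with a negative result from one specific test method."""
    return {
        r.person_id
        for r in rows
        if r.measurement_concept_id == test_concept_id and r.value_as_concept_id == NEGATIVE
    }


rna_negative = negative_by_method(rows, SARS_COV2_RNA_TEST)
antibody_negative = negative_by_method(rows, SARS_COV2_ANTIBODY_TEST)
```

Person 3 ends up in neither set: with the pre-coordinated row there is no field left to filter on by method.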

Chris_Knoll · April 10, 2020, 5:59pm

I think there’s persistent confusion/conflation between the test that is performed and the result the test produces.

If it were up to me, I think the test would be put into the procedure domain (the procedure table), and the result of the measurement would be stored in the measurement table. It would be useful to know which measurement came from which procedure, so a reference to the procedure_occurrence_id would be added to the measurement table. I suggest measurement -> procedure because a single procedure (like a panel) may produce multiple measured results, but we can assume that a single measurement came from one procedure.
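A minimal sketch of this proposal using Python's built-in sqlite3 module. Note that procedure_occurrence_id on the measurement table is the proposed column, not part of the current CDM; both tables are trimmed to a few columns, and the concept IDs are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE procedure_occurrence (
    procedure_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL,
    procedure_concept_id INTEGER NOT NULL
);
-- procedure_occurrence_id here is the PROPOSED column, not in the current CDM.
CREATE TABLE measurement (
    measurement_id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL,
    measurement_concept_id INTEGER NOT NULL,
    value_as_number REAL,
    procedure_occurrence_id INTEGER REFERENCES procedure_occurrence(procedure_occurrence_id)
);
""")

# Hypothetical concept IDs: 500 = a panel procedure, 601-603 = its component results.
conn.execute("INSERT INTO procedure_occurrence VALUES (10, 1, 500)")
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 601, 13.5, 10), (2, 1, 602, 7.2, 10), (3, 1, 603, 250.0, 10)],
)

# One procedure -> many measurements; each measurement points to exactly one procedure.
results = conn.execute("""
    SELECT p.procedure_occurrence_id, COUNT(m.measurement_id)
    FROM procedure_occurrence p
    JOIN measurement m ON m.procedure_occurrence_id = p.procedure_occurrence_id
    GROUP BY p.procedure_occurrence_id
""").fetchall()
print(results)  # [(10, 3)]
```

Putting the foreign key on the measurement side (rather than on the procedure) is what lets a single panel fan out to many results without a separate link table.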

Alexdavv · April 10, 2020, 8:24pm

This is not a pre-coordination issue, but a data granularity issue. ETL is free to map to all the variety of specific LOINC / SNOMED tests. A researcher is free to exclude the generic Measurement concepts from the analysis.

The only reason such pre-coordinated concepts exist is that SNOMED created them. We then placed them in the appropriate place in the hierarchy.

But when the source data is not granular enough and the ETL basically doesn’t know which testing method was used, it might (and probably should) be mapped to a generic concept, no matter how the value is stored (separately or in a pre-coordinated way). SNOMED normally provides only pre-coordinated options, but for COVID we created a generic one: Measurement of severe acute respiratory syndrome coronavirus 2.
The reason was to organize the COVID concepts in the hierarchy and to provide a way to store results from an unknown testing method.
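The fallback described above could look roughly like the following Python sketch: map to the most granular concept the source supports, fall back to the generic concept when the method is unknown, and let the analysis drop the generic results if it wants to. The concept IDs and method labels are hypothetical; a real ETL would look up standard concepts in the vocabulary rather than use a hard-coded dict.

```python
from typing import Iterable, List, Optional

# Hypothetical concept IDs, for illustration only.
GENERIC_SARS_COV2_MEASUREMENT = 100  # stands in for the generic SARS-CoV-2 measurement concept
METHOD_SPECIFIC = {
    "pcr": 101,       # a method-specific RNA test
    "antibody": 102,  # a method-specific antibody test
    "antigen": 103,   # a method-specific antigen test
}


def choose_measurement_concept(method: Optional[str]) -> int:
    """ETL side: use the most granular concept available, else the generic one."""
    if method is None:
        return GENERIC_SARS_COV2_MEASUREMENT
    return METHOD_SPECIFIC.get(method.strip().lower(), GENERIC_SARS_COV2_MEASUREMENT)


def exclude_generic(concept_ids: Iterable[int]) -> List[int]:
    """Analysis side: drop results whose testing method is unknown."""
    return [c for c in concept_ids if c != GENERIC_SARS_COV2_MEASUREMENT]
```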

I think having a choice is better than the wrong mapping that usually happens otherwise. For example, HIV negative or Finding of HIV status might be wrongly used for this purpose because the user doesn’t even realise that both stand for antibody testing.

This concept set should only include the 2019 novel coronavirus detected concept. Can you please check?

We used this concept in both the narrow and broad definitions during the study-a-thon because of uncertainty about how data was coded and ETLed on that point, together with the unavailability of Ab testing and the low availability of antigen tests. I agree that the narrow definition should be reassessed over time and should include only the most reliable testing methods (PCR and, possibly, Ab).

2 points:

Mapping should be as granular as possible (as it was always supposed to be);
We have a bunch of pre-coordinated concepts in SNOMED; just look at the descendants of Virus present, Virus not detected, Bacteria present, etc. One possible solution is to remap and split them into a generic Measurement concept (which simply doesn’t exist yet, except for COVID) plus a "Maps to value". As far as I know, the only thing that stopped us is the lack of analytical methods that can use concept_id/value_as_concept_id combinations as covariates.
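A rough sketch of such a split, assuming a stripped-down concept_relationship table that keeps only concept_id_1, concept_id_2 and relationship_id. The "Maps to" and "Maps to value" relationship names are the ones discussed in this thread; the concept IDs are hypothetical, and the sketch assumes at most one target per relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept_relationship ("
    "concept_id_1 INTEGER, concept_id_2 INTEGER, relationship_id TEXT)"
)
# Hypothetical IDs: 900 = a pre-coordinated "virus not detected" concept,
# 100 = a generic measurement concept, 2001 = a "Not detected" value concept.
conn.executemany(
    "INSERT INTO concept_relationship VALUES (?, ?, ?)",
    [(900, 100, "Maps to"), (900, 2001, "Maps to value")],
)


def split_precoordinated(conn, source_concept_id):
    """Return (measurement_concept_id, value_as_concept_id) for a pre-coordinated concept."""
    cursor = conn.execute(
        "SELECT relationship_id, concept_id_2 FROM concept_relationship "
        "WHERE concept_id_1 = ? AND relationship_id IN ('Maps to', 'Maps to value')",
        (source_concept_id,),
    )
    # Assumes one target per relationship; a later row would overwrite an earlier one.
    targets = {relationship_id: concept_id for relationship_id, concept_id in cursor}
    return targets.get("Maps to"), targets.get("Maps to value")
```

Concepts with no mapping come back as (None, None), which an ETL could route to manual review.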

The same might be applied in other domains:

Allergy to substance + allergen vs Allergy to substance descendants (perhaps further complicated by allergies recorded for a group of substances, e.g. Allergy against penicillins).
History of clinical finding in subject + Condition vs History of clinical finding in subject descendants.
Past history of procedure + Procedure vs H/O: surgery and descendants.
Family history of clinical finding.

@Vojtech_Huser @Christian_Reich, @Dymshyts, @mik, @aostropolets your thoughts, please.

aostropolets · April 10, 2020, 9:19pm

My short answer here will be
a) I agree with @Vojtech_Huser that we always want a single consistent solution: either pre-coordinated or post-coordinated. In reality, it doesn’t always work out nicely. Converting one class into the other requires continuous effort to detect such concepts, remap them, and maintain the mappings with each refresh of a participating vocabulary.
It is not feasible for a massive set of pre-coordinated concepts, but it is worth remapping some of them. For example, we encourage a sort of post-coordination for allergies, histories and microbiology data, just as @Alexdavv said.
b) For this specific example, LOINC may be too granular for the source data, which then requires a generic concept with no test method mentioned. But whenever you have more specific data, go for LOINC.
The same principle applies to other SNOMED measurements.

Vojtech_Huser · April 10, 2020, 9:42pm

ETL is free to map to all the variety of specific LOINC / SNOMED tests

Absolutely not! This is the antithesis of a COMMON data model. The goal of ETL guidance for sites with COVID data is precisely to unite the world on some useful conventions. It is OK to have broad definitions (for the sake of a good study), but they should be accompanied by some “preferred” ETL recommendations for week 4 and months 2 and 3 of the epidemic.

The phenotype study-a-thon subgroup is planning to mention those preferred options. See this thread here.

Alexdavv · April 10, 2020, 10:25pm

I mean that all the specific concepts needed are provided, so there is no need to map to the generic ones even if they exist.
It is the task of the ETL to map to the most precise and granular one. Instructions and ETL guidance are always appreciated.

Vojtech_Huser · April 10, 2020, 10:40pm

the only reason that stopped us is the unavailability of analytical methods to use the concept_id/value_as_concept_id combinations as covariates.

I think this is a legacy of the claims-heavy start of the OMOP CDM. In the long run, we should strive to improve the analytical methods so they follow a neatly organized CDM. If methods can’t handle a test+value combination, a “view of the data” can pre-populate extra “helper rows” that pre-coordinate some key rows before the analysis, for methods that work better with pre-coordinated data (until the methods catch up). But we should stick with a neat CDM.
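One way such a "view of the data" might look, sketched in Python with sqlite3: the measurement table stays post-coordinated, and a view derives one pre-coordinated covariate key per test+value pair for methods that need it. The view name, the "test:value" key format and the concept IDs are all hypothetical, not an OHDSI convention.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE measurement (
    measurement_id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL,
    measurement_concept_id INTEGER NOT NULL,
    value_as_concept_id INTEGER
);
-- Hypothetical helper view: one pre-coordinated key per test+value pair.
-- The underlying measurement table is left post-coordinated.
CREATE VIEW measurement_precoordinated AS
SELECT person_id,
       measurement_concept_id || ':' || value_as_concept_id AS covariate_key
FROM measurement
WHERE value_as_concept_id IS NOT NULL;
""")

# Hypothetical concept IDs: 101/102 = tests, 2001/2002 = result values.
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?, ?)",
    [
        (1, 1, 101, 2001),
        (2, 2, 101, 2002),
        (3, 3, 102, 2001),
        (4, 3, 102, None),  # no categorical value: no helper row is produced
    ],
)

covariates = conn.execute("""
    SELECT covariate_key, COUNT(DISTINCT person_id)
    FROM measurement_precoordinated
    GROUP BY covariate_key
    ORDER BY covariate_key
""").fetchall()
print(covariates)  # [('101:2001', 1), ('101:2002', 1), ('102:2001', 1)]
```

Because it is a view, the helper rows can be dropped as soon as the methods handle test+value pairs directly, without touching the stored data.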

Christian_Reich · April 11, 2020, 4:54pm

Friends:

This is a wonderful debate. And you all have a point, because:

Pre-coordinated concepts have the advantage of being placed in the hierarchy, but only if they are positive: “Coronavirus test negative” should not be a child of “coronavirus test”. Those are typically conditions, and we have a few of those.
Post-coordinated concepts have the advantage that you can combine tests and many values without getting into a permutational explosion. That’s why we did that in Measurement: it is impossible to pre-coordinate tests with all their possible results.
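The explosion is simple arithmetic: with T tests and V possible categorical results, pre-coordination needs one concept per (test, result) pair, while post-coordination needs only the tests plus a shared set of values. A tiny Python illustration with made-up counts (numeric results stored in value_as_number would make the pre-coordinated count unbounded):

```python
def concepts_needed(n_tests: int, n_values: int) -> dict:
    """Count concepts needed to represent every test/result combination."""
    return {
        "pre_coordinated": n_tests * n_values,   # one concept per (test, result) pair
        "post_coordinated": n_tests + n_values,  # test concepts plus shared value concepts
    }


# Illustrative numbers only: 1,000 tests with 10 categorical results each.
print(concepts_needed(1000, 10))  # {'pre_coordinated': 10000, 'post_coordinated': 1010}
```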

Now what should we do?

Allowing both doesn’t really help, because we double the side effects (explosion for the pre-coordinated and no hierarchy for the post-coordinated). So we should pick one, but if we did, the vocabulary system would actually have no way of converting one into the other. Right now, we can only split pre-coordinated concepts into their parts through “Maps to” and “Maps to value”. We cannot combine them, and we certainly cannot deal with numerical results. And it would be a big job.

What did we do so far?

Only a little, and somewhat half-assed, for the reasons above. We have a vocabulary backlog item to create a new vocabulary structure addressing this. @rimma is pushing it for tumor attribute processing, but we need it in all of these cases.

rimma · April 12, 2020, 6:26am

@Christian_Reich, I am pushing this approach for the entire vocabulary. “Free” usage of pre-coordinated concepts for source data mappings comes at a high cost for analytics, especially in the absence of good handling of, and checks for, the consistency of relationships and classes of pre-coordinated concepts in the vocabulary.

Therefore, until we establish consistent handling of pre-coordinated concepts in the vocabulary, I would minimize and discourage the use of pre-coordinated concepts via conventions, especially in cases like @Vojtech_Huser’s example, where post-coordination in the CDM is perfectly possible. In his example, the COVID test concept should be used as measurement_concept_id and Positive/Negative/etc. as value_as_concept_id. A little extra effort on the ETL side for data that is pre-coordinated in the source would streamline representation and analytics tremendously.
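On the ETL side, that "little extra effort" might be as small as splitting a pre-coordinated source string into a test and a result. A minimal Python sketch, assuming a hypothetical "<test>: <result>" source format and hypothetical lookup tables and concept IDs; unknown codes raise KeyError here so they are not silently mis-mapped.

```python
from typing import NamedTuple

# Hypothetical source codes and concept IDs, for illustration only.
TEST_MAP = {"COVID_PCR": 101, "COVID_AB": 102}
RESULT_MAP = {"DETECTED": 2002, "POSITIVE": 2002, "NOT DETECTED": 2001, "NEGATIVE": 2001}


class MeasurementValues(NamedTuple):
    measurement_concept_id: int
    value_as_concept_id: int


def post_coordinate(source_value: str) -> MeasurementValues:
    """Split a source string like 'COVID_PCR: NOT DETECTED' into test + result concepts."""
    test_code, sep, result = source_value.partition(":")
    if not sep:
        raise ValueError(f"expected '<test>: <result>', got {source_value!r}")
    # KeyError on unknown codes is deliberate: better to fail than to guess a mapping.
    return MeasurementValues(
        measurement_concept_id=TEST_MAP[test_code.strip().upper()],
        value_as_concept_id=RESULT_MAP[result.strip().upper()],
    )
```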

MaximMoinat · April 17, 2020, 10:56am

One of the proposed oncology CDM extensions can be used for exactly that. This extension adds two fields, modifier_event_id and modifier_of_field_concept_id, that can link the measurement to another record. In the oncology use case, the main purpose is to link condition modifiers such as tumour measurements and grades.

MPhilofsky · April 17, 2020, 3:02pm

Yes, we put the test in the Procedure table; it might be the standard concept_id for the CPT4 code for a CBC blood draw. The results go in the Measurement table, with standard concept_ids for the LOINC codes of each component of the CBC. If you have a use case for the data, then the data should be added.

We already have the connection in the CDM: it’s the beloved Fact Relationship table.
Some sites connect the Procedure to the Measurements via the Fact Relationship table. Others don’t bother with the Fact Relationship table because Atlas doesn’t use it. If you’re writing SQL directly against the CDM, the Fact Relationship table is very useful for connecting all the facts if you have a use case. I have seen the Fact Relationship table used to connect Medication Orders, Medication Administrations, and Medication Dispenses; Systolic and Diastolic BP Measurements; Notes to everything, etc.
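A minimal sqlite3 sketch of linking one procedure to its component measurements through FACT_RELATIONSHIP, using the table's CDM v5 column names. The domain concept IDs (10 for Procedure, 21 for Measurement) are an assumption and should be checked against your vocabulary; the relationship_concept_id and all other IDs are hypothetical, and the measurement table is trimmed to three columns.

```python
import sqlite3

# Assumed Domain concept IDs; verify against your vocabulary before relying on them.
PROCEDURE_DOMAIN = 10
MEASUREMENT_DOMAIN = 21
HAS_RESULT = 999  # hypothetical relationship_concept_id, for illustration only

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_relationship (
    domain_concept_id_1 INTEGER NOT NULL,
    fact_id_1 INTEGER NOT NULL,
    domain_concept_id_2 INTEGER NOT NULL,
    fact_id_2 INTEGER NOT NULL,
    relationship_concept_id INTEGER NOT NULL
);
CREATE TABLE measurement (
    measurement_id INTEGER PRIMARY KEY,
    measurement_concept_id INTEGER NOT NULL,
    value_as_number REAL
);
""")

# Hypothetical component results of one CBC draw.
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?)",
    [(1, 601, 13.5), (2, 602, 7.2), (3, 603, 250.0)],
)
# Link procedure_occurrence_id 10 to each of its component measurements.
conn.executemany(
    "INSERT INTO fact_relationship VALUES (?, ?, ?, ?, ?)",
    [(PROCEDURE_DOMAIN, 10, MEASUREMENT_DOMAIN, m, HAS_RESULT) for m in (1, 2, 3)],
)


def measurements_for_procedure(conn, procedure_occurrence_id):
    """Follow procedure -> measurement links in FACT_RELATIONSHIP."""
    return conn.execute("""
        SELECT m.measurement_id, m.measurement_concept_id, m.value_as_number
        FROM fact_relationship fr
        JOIN measurement m ON m.measurement_id = fr.fact_id_2
        WHERE fr.domain_concept_id_1 = ? AND fr.fact_id_1 = ?
          AND fr.domain_concept_id_2 = ?
        ORDER BY m.measurement_id
    """, (PROCEDURE_DOMAIN, procedure_occurrence_id, MEASUREMENT_DOMAIN)).fetchall()
```

Unlike a dedicated foreign-key column, the generic table needs the domain IDs in every join, which is part of why tools that don't read it (such as Atlas, per the post above) ignore these links.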

I hadn’t thought to use the proposed fields for this. The field names suggest that one record “modifies” the other, which is not what is happening with the Procedure-Measurement, Drug-Drug, Measurement-Note, and other linkages. Will these fields be available in all the CDM tables? And more importantly, will Atlas use them?

Chris_Knoll · April 19, 2020, 1:29am

If we follow this logic to the use-case I described, wouldn’t it then make sense to drop all visit_occurrence_ids from the other domain tables and use fact_relationship to define the connection between those observations and the visit?