Suspected Diagnosis and its place in the OMOP CDM

LeileifromUK · July 5, 2020, 8:16pm

Dear all,

This is a perfect example of how a diagnosis evolves and of Bayes’ theorem at work in clinical decision making. We almost always start off with a “working diagnosis”, aka a suspected diagnosis. Then Bayes’ theorem comes into play: the more evidence I have about that patient (e.g. lab tests, imaging films, etc.), the more I either increase or decrease the probability of my working diagnosis. The “diagnosis” is therefore constantly changing.
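
To make the updating step concrete, here is a minimal sketch in Python. The prior, sensitivity and specificity are made-up numbers for illustration only, not figures for any real test, and the function name `posterior` is hypothetical.

```python
def posterior(prior: float, sensitivity: float, specificity: float, positive: bool) -> float:
    """Probability of the working diagnosis after one test result (Bayes' theorem)."""
    if positive:
        p_result_given_disease = sensitivity
        p_result_given_no_disease = 1.0 - specificity
    else:
        p_result_given_disease = 1.0 - sensitivity
        p_result_given_no_disease = specificity
    numerator = p_result_given_disease * prior
    return numerator / (numerator + p_result_given_no_disease * (1.0 - prior))


# Hypothetical numbers: 30% prior, test with 90% sensitivity and 95% specificity.
after_positive = posterior(0.30, 0.90, 0.95, positive=True)
after_negative = posterior(0.30, 0.90, 0.95, positive=False)
```

A positive result pushes the probability of the working diagnosis above the prior and a negative result pushes it below, which is the "suspected, then refined" pattern described above.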

@robyn.rubin “Examination for suspected COVID-19” is a working diagnosis. That diagnosis should be updated once the test result comes back: the working diagnosis would change to either “COVID-19 confirmed by laboratory test” or “COVID-19 confirmed using clinical diagnostic criteria”. This may not always happen, as clinicians do not always update it promptly, or the patient may have been transferred elsewhere.

If we merely want to create a research database, then to me, storing the data in its native form is important. @Alexander Davydov, the “Disease suspected” hierarchy is not going to help you, because SNOMED CT is not going to author a “suspected” concept for every condition.

In addition, FHIR clearly stipulates that diagnosis verification status is a mandatory messaging standard.
Provisional (or Suspected) is one of the statuses, and its definition is: “This is a tentative diagnosis - still a candidate that is under consideration.” (https://www.hl7.org/fhir/valueset-condition-ver-status.html)
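
As a rough illustration, a FHIR Condition carrying this status could be represented as the Python dict below. The code system URL and the status codes come from the FHIR value set linked above; the resource content, the helper name `is_suspected`, and the choice to treat `unconfirmed`, `provisional` and `differential` all as "suspected" are my own assumptions for this sketch.

```python
VER_STATUS_SYSTEM = "http://terminology.hl7.org/CodeSystem/condition-ver-status"

# Assumption: every "not yet confirmed" status counts as suspected.
SUSPECTED_CODES = {"unconfirmed", "provisional", "differential"}

# Hypothetical, heavily trimmed Condition resource.
condition = {
    "resourceType": "Condition",
    "verificationStatus": {
        "coding": [{"system": VER_STATUS_SYSTEM, "code": "provisional"}]
    },
    "code": {"text": "COVID-19"},
    "subject": {"reference": "Patient/example"},
}


def is_suspected(resource: dict) -> bool:
    """True if any verificationStatus coding is in the 'suspected' set."""
    codings = resource.get("verificationStatus", {}).get("coding", [])
    return any(
        c.get("system") == VER_STATUS_SYSTEM and c.get("code") in SUSPECTED_CODES
        for c in codings
    )
```

With the status stored discretely like this, an ETL can decide where a record goes without parsing the wording of the diagnosis.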

I would strongly suggest your organisation work towards being able to store the diagnosis verification status discretely. The analytical tooling shouldn’t be a problem if we use SNOMED CT expressions.

Christian_Reich · July 5, 2020, 9:05pm

So, the whole debate is because we want to revise the Condition Status Concepts, and the question is whether there should be a Status Concept “Suspected” or not. It has nothing to do with referral. It has to do with the fact that during care the diagnosis starts with a suspicion and becomes more and more refined, as @LeileifromUK pointed out. Where do we draw the line and make the assumption that the diagnosis is good enough to be a fact in the CONDITION_OCCURRENCE table?

Yes. That would be a good one too. Why Measurement and not Observation?

We usually don’t have that clout, @LeileifromUK. We get the data collected for primary purposes. Our job is to represent it in such a way that we all agree what it means, since this is a Common Data Model. So, if a data asset has suspected diagnoses, we need to come to a consensus on where to put them and how to represent them.

Chris_Knoll · July 6, 2020, 4:40am

Observation was one of the choices, I don’t have a bias one way or another, but if you consider a hypothesis something ‘measured’ maybe it fits into measurement, otherwise observation makes sense.

LeileifromUK · July 6, 2020, 9:10am

If this is the case, I don’t think we have an option but to fully represent the suspected verification status as a discrete field. Only in this way can we maintain coverage and consistency, never mind keep the future direction of travel in line with FHIR.
For an EHR system that has configured this as a discrete field, it is straightforward, and we can model it using the qualifier value. For an EHR system that hasn’t got this configured as a discrete field, we can use the SNOMED CT modelling to derive the attribute that indicates a “suspected” condition and auto-populate it in the table. The two major attributes are: Medical examination for suspected condition (procedure) and Finding context (attribute) - Suspected (qualifier value).
The reality is, if the clinician has entered a condition “suspected xxx”, they shouldn’t have to enter “suspected” again in the diagnosis verification field; that would be duplication. We should be able to use SNOMED CT modelling tools to derive that element at the back end.

This is how we handle it. Happy to explain further if needed, unless I completely misunderstand the whole discussion topic! (-:

robyn.rubin · July 7, 2020, 5:44am

I see your point and agree with you. However, this is the reason the patient was referred to the hospital; that was my thinking with regard to Condition.

LeileifromUK · July 7, 2020, 2:59pm

In the EHR system, there is normally a field (or several fields) called Health issues, Problems or Visit diagnosis, in which clinicians are expected to enter the information in a structured way. How each field is configured differs from system to system. The content that underpins these fields also differs. There is nothing wrong with allowing “Examination for suspected COVID-19” to be entered; our Trust just didn’t prefer that way. We strictly restricted the list to two domains: Clinical finding and Situation with explicit context.
However, even if this is the concept entered, using the SNOMED CT model we should still easily be able to tell that it is a suspected “condition”, as one of the attributes for this concept clearly states “Medical examination for suspected condition (procedure)”, one of the two main “suspected” attributes I provided in this discussion trail.
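
A minimal sketch of that derivation, in Python. The relationship table below is hand-written toy data and NOT extracted from a SNOMED CT release; the concept names in it, the `Is a` modelling of the examination concept, and the function name `is_suspected_concept` are assumptions for illustration. A real implementation would query the release's relationship files (including inherited relationships) through a terminology server.

```python
# Toy, hand-written relationships keyed by concept name (illustration only).
RELATIONSHIPS = {
    "Suspected COVID-19": [
        ("Finding context (attribute)", "Suspected (qualifier value)"),
    ],
    "Examination for suspected COVID-19": [
        ("Is a", "Medical examination for suspected condition (procedure)"),
    ],
    "COVID-19": [
        ("Is a", "Viral disease"),
    ],
}

# The two "suspected" markers named in this thread.
SUSPECTED_MARKERS = {
    ("Finding context (attribute)", "Suspected (qualifier value)"),
    ("Is a", "Medical examination for suspected condition (procedure)"),
}


def is_suspected_concept(name: str) -> bool:
    """True if the concept directly carries one of the 'suspected' markers."""
    return any(pair in SUSPECTED_MARKERS for pair in RELATIONSHIPS.get(name, []))


suspected = sorted(name for name in RELATIONSHIPS if is_suspected_concept(name))
```

This only checks direct relationships; it does not walk the hierarchy, which a production version would need to do.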

robyn.rubin · July 8, 2020, 5:07am

Thank you for your clarification

MPhilofsky · July 9, 2020, 2:06pm

I agree the Condition Occurrence table should only include an actual diagnosis and not “suspected” diagnoses for the reasons others have stated:

And I agree

We don’t need to make copies of every diagnosis as suspected. As @Chris_Knoll suggested, let’s find/create a concept for “suspected diagnosis”, give it a domain_id = Observation and put the suspected Condition in the Observation.value_as_concept_id field. We already do this to represent Medical History, Surgical History, Family History, etc. Let’s keep our representation of the data consistent.

LeileifromUK · July 9, 2020, 2:18pm

I have explained, using examples, why the Disease suspected hierarchy doesn’t cover all suspected diagnoses, and which algorithm to use to procure a list of suspected conditions in SNOMED CT. Some concepts may not be appropriate to map to a concept under that hierarchy. Just bear that in mind.

MPhilofsky · July 9, 2020, 2:47pm

This solution:

Will cover this use case:

The direct representation of the suspected disease will remain intact in the Observation.value_as_concept_id (when the Condition maps to a concept_id) or Observation.value_as_string (when the Condition is string format) field. The Observation.concept_id = ‘suspected diagnosis’ or something.

Example:
Observation.concept_id = ‘suspected diagnosis’
Observation.value_as_concept_id = 312327
Observation.value_as_string = Acute Myocardial Infarction
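
For anyone who prefers to see it as code, here is a minimal Python sketch of building such a row. The class `ObservationRow`, the function `suspected_diagnosis_row` and the person_id are hypothetical, and only a handful of OBSERVATION columns are shown. The concept id 4219847 (“Disease suspected”) is the one suggested later in this thread, not an agreed convention.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObservationRow:
    # Trimmed to the columns relevant to this discussion.
    person_id: int
    observation_concept_id: int
    value_as_concept_id: Optional[int] = None
    value_as_string: Optional[str] = None


def suspected_diagnosis_row(
    person_id: int,
    suspected_concept_id: int,
    condition_concept_id: Optional[int] = None,
    source_text: Optional[str] = None,
) -> ObservationRow:
    """Store a suspected condition as the value of a 'suspected diagnosis' observation."""
    if condition_concept_id is None and source_text is None:
        raise ValueError("need a mapped concept_id or the source text")
    return ObservationRow(
        person_id=person_id,
        observation_concept_id=suspected_concept_id,
        value_as_concept_id=condition_concept_id,
        value_as_string=source_text,
    )


# Mirrors the example above; person_id 1 is made up.
row = suspected_diagnosis_row(
    person_id=1,
    suspected_concept_id=4219847,
    condition_concept_id=312327,
    source_text="Acute Myocardial Infarction",
)
```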

Chris_Knoll · July 9, 2020, 3:37pm

I wouldn’t use the string in that way when the concept itself is sufficient. Otherwise, I do like this approach. Tools will have to be modified to support the notion of a ‘concept set’ for specifying the value_as_concept_id values, but that shouldn’t be too much of a problem.
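
Roughly what I mean by a concept set on the value side, as a minimal Python sketch over in-memory rows. The rows, the function name `suspected_in_concept_set` and the concept id 999001 are made up for illustration; 4219847 and 312327 are the ids quoted elsewhere in this thread. A real tool would do this in SQL against the OBSERVATION table.

```python
# Hypothetical OBSERVATION rows, trimmed to the relevant columns.
observations = [
    {"person_id": 1, "observation_concept_id": 4219847, "value_as_concept_id": 312327},
    {"person_id": 2, "observation_concept_id": 4219847, "value_as_concept_id": 999001},
    {"person_id": 3, "observation_concept_id": 555555, "value_as_concept_id": 312327},
]


def suspected_in_concept_set(rows, suspected_concept_id, value_concept_set):
    """Rows that are 'suspected diagnosis' observations whose value is in the concept set."""
    return [
        r
        for r in rows
        if r["observation_concept_id"] == suspected_concept_id
        and r["value_as_concept_id"] in value_concept_set
    ]


# Concept set for the condition of interest (here a single concept).
ami_concept_set = {312327}
hits = suspected_in_concept_set(observations, 4219847, ami_concept_set)
```

Only person 1 matches: person 2 has a different suspected condition and person 3 has the right value on a different observation concept.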

Christian_Reich · July 9, 2020, 7:00pm

I am with Chris, @MPhilofsky. In the long run, we want to get rid of the strings altogether (except in NOTE). That will make OMOP ultimately anonymized and fully computable.

Alexdavv · July 10, 2020, 7:56am

Here you go: 4219847 Disease suspected

I think this is just to preserve the part of the source_value that implies the diagnosis. The Observation table doesn’t contain a value_source_value field, while it does exist in the Measurement table. So it is a kind of workaround to keep the logic consistent for both tables when the source_value consists of two elements.

Christian_Reich · July 10, 2020, 11:47am

@Alexdavv: Good try, but no workarounds or usage of fields for a different purpose because “you want to store something” and that field happens to be available. It’s a Common Data Model. Things have to strictly follow the rules, otherwise you cannot do remote queries.

If you think there is a value_source_value field necessary - bring it on as a proposal, please.

Alexdavv · July 10, 2020, 1:13pm

This is how I gently tried to propose this:

What do you and others think?

MPhilofsky · July 10, 2020, 4:07pm

@Christian_Reich,

The conventions for Observation.value_as_string are “The observation result stored as a string. This is applicable to observations where the result is expressed as verbatim text.”. In my example, “Myocardial infarction” is the verbatim text from the source. And the Observation.value_as_concept_id = 312327 is the standard concept for the verbatim text. If the data came across as a source_code which mapped to a standard concept_id, then I wouldn’t insert the code in the value_as_string field. However, my EHR data stores it as free text.

And I completely agree with

I also agree with this

Those of us working with EHR data try to map all source values to standard concept_ids. But the reality of the situation is that there is a very long tail of singletons, about a mile long, and it would take a very long time to map every string to a concept_id. It is a waste of time and resources to map every string. However, keeping the string data in the CDM allows data holders behind the firewall to view the unmapped source values to assess their worth. The data holders can view the unmapped string results to see if the unmapped values are mappable, update their mappings, rerun the ETL and participate more fully in community research. This information is also available for many (all?) other concept_id fields.

The above is the use case for @Alexdavv’s proposal to add an Observation.value_source_value field to the CDM.

MPhilofsky · July 10, 2020, 4:07pm

Thanks, @Alexdavv!

Alexdavv · July 10, 2020, 5:18pm

I assume @Christian_Reich and @Chris_Knoll advocate that the value_as_string field is for storing the verbatim result, but only once it is already a processed result, that is, when it has been found that this result:

is not numeric (numeric results should go in the value_as_number field only);
and cannot be mapped to any concept (mappable results should go in the value_as_concept_id field only).

Namely, instrument raw data, DNA sequences, proper names (not patient’s, for sure), etc.

And the main reason is that standardized analytics (basically, string match) can be applied to the value_as_string field. That is why we need to keep this field fairly clean.
Normally nobody applies standardized analytics to the _source_value fields, which is where this particular ‘Myocardial infarction’ from the source is to be placed.
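
A minimal sketch of that routing rule in Python, as I understand it. The function name `route_value` and the `lookup_concept` mapping are hypothetical, and the parsing is deliberately naive.

```python
from typing import Callable, Optional


def route_value(raw: str, lookup_concept: Callable[[str], Optional[int]]) -> dict:
    """Put a processed result into exactly one of the three value_as_* fields."""
    out = {"value_as_number": None, "value_as_concept_id": None, "value_as_string": None}
    try:
        # Naive: float() also accepts strings such as "nan" and "inf";
        # a real ETL would need stricter parsing.
        out["value_as_number"] = float(raw)
        return out
    except ValueError:
        pass
    concept_id = lookup_concept(raw.strip().lower())
    if concept_id is not None:
        out["value_as_concept_id"] = concept_id
    else:
        # Neither numeric nor mappable: keep the verbatim text.
        out["value_as_string"] = raw
    return out


# Hypothetical one-entry mapping; 312327 is the concept id quoted in this thread.
mapping = {"acute myocardial infarction": 312327}.get

numeric = route_value("7.2", mapping)
mapped = route_value("Acute Myocardial Infarction", mapping)
verbatim = route_value("ACGTTGCA", mapping)
```

Under this rule the mappable ‘Myocardial infarction’ text never lands in value_as_string; only something like a raw sequence does.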

Christian_Reich · July 10, 2020, 6:44pm

I think we are all on the same page, here. Let’s add the source_value proposal, and let’s put a proposal in to get rid of the string thing, which will not be executed until the time that we have much better capacity to map things.

Andy_Kanter · July 13, 2020, 1:32am

Christian, it was a little hard to follow all of the discussion here, and the switching between diagnoses, procedures and attributes of diagnoses was confusing. I think saving the original text from the source, in addition to the coded value, is best practice: not only to help ensure fidelity is not lost when people go back to extract more information or check the standard concept maps, but also from a provenance perspective. I think CDA and FHIR are both ensuring original text is not lost.