The convention for Observation.value_as_string is: “The observation result stored as a string. This is applicable to observations where the result is expressed as verbatim text.” In my example, “Myocardial infarction” is the verbatim text from the source, and Observation.value_as_concept_id = 312327 is the standard concept for that verbatim text. If the data had come across as a source code that mapped to a standard concept_id, I wouldn’t insert the code in the value_as_string field. However, my EHR data stores it as free text.
And I completely agree with
I also agree with this
Those of us working with EHR data try to map all source values to standard concept_ids. But the reality is that there is a very long tail of singletons, and it would take a very long time to map every string to a concept_id. It is a waste of time and resources to map every string. However, keeping the string data in the CDM allows data holders behind the firewall to view the unmapped source values and assess their worth. The data holders can review the unmapped string results to see whether they are mappable, update their mappings, rerun the ETL and participate more fully in community research. This information is also available for many (all?) other concept_id fields.
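To make that review step concrete, here is a minimal sketch of how a data holder might list the unmapped strings by frequency. It assumes unmapped values carry value_as_concept_id = 0 (or NULL), uses a toy in-memory SQLite table with only three Observation columns, and the free-text strings other than “Myocardial infarction” are invented for illustration. The helper name unmapped_value_counts is hypothetical, not part of any OHDSI tool.

```python
import sqlite3


def unmapped_value_counts(conn):
    """Return (value_as_string, n) for strings with no standard concept.

    Assumption: an unmapped value has value_as_concept_id = 0 or NULL.
    """
    return conn.execute(
        """
        SELECT value_as_string, COUNT(*) AS n
        FROM observation
        WHERE value_as_string IS NOT NULL
          AND COALESCE(value_as_concept_id, 0) = 0
        GROUP BY value_as_string
        ORDER BY n DESC, value_as_string
        """
    ).fetchall()


# Toy stand-in for the CDM Observation table (only the columns used here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observation ("
    "observation_id INTEGER PRIMARY KEY, "
    "value_as_string TEXT, "
    "value_as_concept_id INTEGER)"
)
conn.executemany(
    "INSERT INTO observation (value_as_string, value_as_concept_id) VALUES (?, ?)",
    [
        ("Myocardial infarction", 312327),  # mapped, from the example above
        ("MI per pt report", 0),            # invented unmapped string
        ("MI per pt report", 0),
        ("heart attack ?2015", 0),          # invented unmapped string
    ],
)

rows = unmapped_value_counts(conn)
for value, n in rows:
    print(n, value)
```

Sorting by count puts the most frequent unmapped strings first, so the few values worth mapping surface ahead of the long tail of singletons.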
The above is the use case for @Alexdavv’s proposal to add an Observation.value_source_value field to the CDM.