For mapping lab results, LOINC gives us an excellent resource: the “Answer of” relationship. However, many strings that are legitimate results for a given measurement_concept_id are not part of that answer set, and some labs have no concepts with that relationship at all.
If we map the string using concept_name instead, there are cases where more than one concept_id shares the same concept_name string. The query below gives some examples:
-- Standard 'Meas Value' concept names, most-duplicated first
SELECT
  concept_name,
  STRING_AGG(DISTINCT vocabulary_id, ', ') AS vocabulary,
  COUNT(DISTINCT concept_id) AS counts
FROM concept
WHERE standard_concept = 'S'
  AND domain_id = 'Meas Value'
GROUP BY concept_name
ORDER BY counts DESC
In the All of Us PPI vocabulary this happens a lot. I reported it as a bug a long time ago, but I was told it is a feature (by design).
For NAACCR data, there are multiple concept_ids for the same concept_name because each concept_id corresponds to a particular combination of cancer anatomic site, NAACCR item number, and NAACCR item value, which is represented in the concept_code column. Using ‘No regional lymph node involvement’ as an example, I am listing some of them below:
Awesome, thank you! The remaining question is which code to choose. One possibility is to create the “Answer of” relationship in the concept_relationship table for the particular lab, so that the appropriate concept can be chosen.
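To make that idea concrete, here is a minimal sketch, not an official OHDSI convention. It assumes the relationship is stored with the answer in concept_id_1 and the measurement in concept_id_2 under relationship_id 'Answer of'. The concept IDs, the toy tables, and the helper names `build_toy_vocab` and `pick_value_concept` are all made up for illustration; the sketch uses Python's built-in sqlite3 so it runs without a real vocabulary.

```python
import sqlite3


def build_toy_vocab():
    """Create an in-memory stand-in for concept / concept_relationship (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE concept (
            concept_id INTEGER PRIMARY KEY,
            concept_name TEXT,
            domain_id TEXT,
            vocabulary_id TEXT,
            standard_concept TEXT
        );
        CREATE TABLE concept_relationship (
            concept_id_1 INTEGER,
            concept_id_2 INTEGER,
            relationship_id TEXT
        );
        -- Hypothetical IDs: one measurement and two same-named value concepts.
        INSERT INTO concept VALUES
            (1001, 'Toy lab test', 'Measurement', 'LOINC', 'S'),
            (2001, 'Positive', 'Meas Value', 'LOINC', 'S'),
            (2002, 'Positive', 'Meas Value', 'LOINC', 'S');
        -- Only 2001 is declared an answer of measurement 1001.
        INSERT INTO concept_relationship VALUES (2001, 1001, 'Answer of');
    """)
    return conn


def pick_value_concept(conn, measurement_concept_id, value_string):
    """Return (chosen concept_id or None, all name-matched candidate ids).

    Candidates linked to the measurement via 'Answer of' win. Without such a
    link, a candidate is chosen only if the name match is unique.
    """
    rows = conn.execute(
        """
        SELECT c.concept_id,
               EXISTS (
                   SELECT 1
                   FROM concept_relationship cr
                   WHERE cr.concept_id_1 = c.concept_id
                     AND cr.concept_id_2 = ?
                     AND cr.relationship_id = 'Answer of'
               ) AS is_answer
        FROM concept c
        WHERE c.standard_concept = 'S'
          AND c.domain_id = 'Meas Value'
          AND LOWER(c.concept_name) = LOWER(?)
        ORDER BY is_answer DESC, c.concept_id
        """,
        (measurement_concept_id, value_string),
    ).fetchall()
    candidates = [concept_id for concept_id, _ in rows]
    answers = [concept_id for concept_id, is_answer in rows if is_answer]
    if len(answers) == 1:
        return answers[0], candidates
    if not answers and len(candidates) == 1:
        return candidates[0], candidates
    return None, candidates  # still ambiguous or no match: needs a human decision
```

Calling `pick_value_concept(build_toy_vocab(), 1001, "positive")` picks the linked concept, while the same string for a measurement without any 'Answer of' rows stays unresolved and returns the full candidate list for review.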
The CDM does not require the value_as_concept_id field to be populated with a concept that has a relationship to the concept in the measurement_concept_id field. However, it is good practice for the event and value pair to belong to the same vocabulary and, ideally, to be related to each other.
So the first option is to map measurements and values separately and not worry about the ‘answer of’ relationship between them.
The second option: if you have a small number of measurements, you can build some custom (admittedly convoluted) conversion logic and map these measurement-value pairs individually, picking the value for each measurement.
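A rough sketch of how the two options could be combined: a hand-curated table of measurement-value pairs is tried first (option 2), and a name-only lookup is the fallback (option 1). Everything here is hypothetical, including the concept IDs, the `CURATED_PAIRS` and `NAME_ONLY` tables, and the `map_value` helper; in practice the name-only table would be derived from the concept table. Returning 0 for unresolved values follows the usual OMOP convention for unmapped concepts.

```python
# Hypothetical curated table: (measurement_concept_id, normalized source string)
# -> value_as_concept_id chosen by hand for that specific measurement.
CURATED_PAIRS = {
    (1001, "positive"): 2001,
    (1001, "pos"): 2001,
}

# Hypothetical name-only lookup: normalized concept_name -> all matching concept_ids.
NAME_ONLY = {
    "positive": [2001, 2002],  # ambiguous: two concepts share this name
    "negative": [2003],
}


def map_value(measurement_concept_id, source_value):
    """Return (value_as_concept_id, how) where how explains the decision."""
    key = source_value.strip().lower()

    # Option 2: an explicit measurement-value pair wins when one exists.
    curated = CURATED_PAIRS.get((measurement_concept_id, key))
    if curated is not None:
        return curated, "curated"

    # Option 1: map the value on its own, but only if the name is unambiguous.
    candidates = NAME_ONLY.get(key, [])
    if len(candidates) == 1:
        return candidates[0], "name"
    if len(candidates) > 1:
        return 0, "ambiguous"
    return 0, "unmapped"
```

Keeping the reason alongside the concept_id makes it easy to list the ambiguous strings later and decide whether they deserve a curated entry.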
Hi @zhuk, thank you for the answer. Indeed, option 1 makes total sense. However, as I pointed out, we have multiple concepts within the same vocabulary with the same concept_name and the same domain_id. We could pick arbitrarily when there are multiple options, but it would be better to have a guideline for disambiguating between them.