In finalizing the CMS-ETL, we have run into some inconsistencies in how unmapped concepts are handled, and we would appreciate clarity and guidance on them.
-
Race: the concepts for Unknown Race, Non-white, and Other Race (8552, 9178, and 8522) have been deprecated. If we put 0 for the race_concept_id in a Person record, Achilles Heel gives “WARNING: 4-Number of persons by race; data with unmapped concepts”. Since race_concept_id is a required field in the Person record, we cannot put NULL. Our use case is that the CMS data has White, Black, Others, and Hispanic as race values. We want to set race to 0 for Others and Hispanic (or, better yet, have undeprecated concepts to use), and use the two ethnicity concepts (38003563, 38003564) for Hispanic or Non-Hispanic as appropriate. Stepping back, I’m wondering whether race_concept_id should be a non-required field, so that one could use NULL. Race information is often optional for patients to enter, and thus not available in the source data.
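To make the intended mapping concrete, here is a rough sketch of what we have in mind. The ethnicity concept IDs are the ones quoted above; 8527 (White) and 8516 (Black or African American) are the standard race concepts as I understand them, and the function name and the choice to mark every non-Hispanic source value as Non-Hispanic are assumptions for illustration only.

```python
# Sketch only: maps the four CMS race values to (race_concept_id, ethnicity_concept_id).
UNMAPPED = 0              # what we would store when no standard race concept applies
HISPANIC = 38003563       # ethnicity concept quoted in the post
NOT_HISPANIC = 38003564   # ethnicity concept quoted in the post

# Assumed standard race concepts; not taken from the CMS data itself.
RACE_BY_SOURCE = {"White": 8527, "Black": 8516}


def map_race_ethnicity(source_race):
    """Return (race_concept_id, ethnicity_concept_id) for a CMS race value."""
    # "Others" and "Hispanic" have no usable race concept, so they fall back to 0.
    race_concept_id = RACE_BY_SOURCE.get(source_race, UNMAPPED)
    # Assumption: anything not recorded as Hispanic is treated as Non-Hispanic.
    ethnicity_concept_id = HISPANIC if source_race == "Hispanic" else NOT_HISPANIC
    return race_concept_id, ethnicity_concept_id
```

With this, both "Others" and "Hispanic" end up with race_concept_id 0, which is exactly what triggers the warning.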
-
Condition: when a condition is not mapped within the vocabulary, the specifications say, “When the source code cannot be translated into a Standard Concept, a CONDITION_OCCURRENCE entry is stored with only the corresponding source_concept_id (if available) and source_value and a condition_concept_id of 0.” When we set the condition_concept_id of unmapped conditions to 0, we get the Achilles Heel warning: “WARNING: 400-Number of persons with at least one condition occurrence, by condition_concept_id; data with unmapped concepts”. Again, since condition_concept_id is required, we cannot set it to NULL.
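For reference, this is roughly how we read the quoted convention, written as a small sketch. The function name and the two lookup dictionaries are hypothetical stand-ins for our vocabulary lookups, and storing None when no source concept is available is our assumption.

```python
def build_condition_row(person_id, source_code, code_to_standard, code_to_source_concept):
    """Build a CONDITION_OCCURRENCE-like record, falling back to concept 0 when unmapped."""
    return {
        "person_id": person_id,
        # Unmapped source codes get condition_concept_id 0, per the quoted convention.
        "condition_concept_id": code_to_standard.get(source_code, 0),
        # Source concept is kept only "if available"; None here is an assumption.
        "condition_source_concept_id": code_to_source_concept.get(source_code),
        # The original source code is always preserved.
        "condition_source_value": source_code,
    }
```

Every record built through the fallback path carries condition_concept_id 0, and those are the rows the warning counts.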
-
Procedure and Observation: the documentation does not say what to do when a procedure or observation is unmapped. If we use the same convention as for conditions and set procedure_concept_id or observation_concept_id to 0, we get the Achilles Heel warnings: “WARNING: 800-Number of persons with at least one observation occurrence, by observation_concept_id; data with unmapped concepts”, and “WARNING: 600-Number of persons with at least one procedure occurrence, by procedure_concept_id; data with unmapped concepts”. Should we use 0? The documentation is unclear.
-
Drugs: the CDMv5 documentation gives the same convention for unmapped drug records as for conditions: “When the Drug Source Value of the code cannot be translated into standard Drug Concept IDs, a Drug exposure entry is stored with only the corresponding source_concept_id and drug_source_value and a drug_concept_id of 0.” However, when we do this, Achilles Heel gives, “WARNING: 700-Number of persons with at least one drug exposure, by drug_concept_id; data with unmapped concepts”.
-
Observation values: Finally, as posted here, Achilles Heel rule 814 enforces that at least one of the OBSERVATION fields value_as_string, value_as_number, or value_as_concept_id be non-NULL. However, none of these three fields is required according to the CDM v5 vocabulary specification. We can make this error go away by, for example, putting 0 for value_as_concept_id, but shouldn’t we put NULL if there is no information about values for observations? Why is 0 allowed for value_as_concept_id here (at least no warning is printed), while using 0 for the various concept_ids in the first four items above creates warnings? On the other hand, consistent with this Achilles Heel error, the Observation documentation says, “There should be no observations records without an associated value”. What value should we place in which of these three value fields when there is no obvious value?
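To illustrate the inconsistency, here is a sketch of the check as we understand it from the rule description. This is not the actual Achilles implementation, and the function name is made up for illustration.

```python
VALUE_FIELDS = ("value_as_string", "value_as_number", "value_as_concept_id")


def has_associated_value(observation):
    """True if at least one of the three value fields is non-NULL (None in Python)."""
    return any(observation.get(field) is not None for field in VALUE_FIELDS)


# All three fields NULL: this is the case rule 814 flags.
no_value = {"value_as_string": None, "value_as_number": None, "value_as_concept_id": None}

# A 0 in value_as_concept_id counts as non-NULL, so the check passes
# even though 0 carries no real information.
zero_concept = {"value_as_string": None, "value_as_number": None, "value_as_concept_id": 0}
```

Under this reading, the record with 0 passes while the record with honest NULLs fails, which is the behavior we are asking about.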
Thanks!