Inconsistencies between vocabulary documentation and Achilles Heel results: Race, Condition, etc.

In finalizing the CMS-ETL, we have run into some inconsistencies in managing unmapped concepts, on which we seek clarity and guidance.

  1. Race: the concepts for Unknown Race, Non-white, and Other Race (8552, 9178, and 8522) have been deprecated. If we put 0 for the concept_id in a Person record, Achilles Heel gives “WARNING: 4-Number of persons by race; data with unmapped concepts”. Since race_concept_id is a required field in the Person record, we cannot put NULL. Our use case is that the CMS data has White, Black, Others, and Hispanic as race. We want to set race to 0 for Others and Hispanic (or, better yet, have undeprecated concepts to use), and use the two ethnicity concepts (38003563, 38003564) for Hispanic or Non-Hispanic as appropriate. Stepping back, I’m wondering if race_concept_id should be a non-required field, so that one could use NULL: race information is often optional for patients to enter, and thus not available in source data.

  2. Condition: when a condition is not mapped within the vocabulary, the specifications say, “When the source code cannot be translated into a Standard Concept, a CONDITION_OCCURRENCE entry is stored with only the corresponding source_concept_id (if available) and source_value and a condition_concept_id of 0.” When we set condition_concept_id to 0 for unmapped conditions, we get the Achilles Heel warning: “WARNING: 400-Number of persons with at least one condition occurrence, by condition_concept_id; data with unmapped concepts”. Again, since condition_concept_id is required, we cannot set it to NULL.

  3. Procedure and Observation: the documentation does not say what to do when a procedure or observation is unmapped. If we use the same convention as for condition, setting procedure_concept_id or observation_concept_id to 0, we get the Achilles Heel warnings: “WARNING: 800-Number of persons with at least one observation occurrence, by observation_concept_id; data with unmapped concepts”, and “WARNING: 600-Number of persons with at least one procedure occurrence, by procedure_concept_id; data with unmapped concepts”. Should we use 0? The documentation is unclear.

  4. Drugs: the CDMv5 documentation gives the same convention for unmapped drug records as for conditions: “When the Drug Source Value of the code cannot be translated into standard Drug Concept IDs, a Drug exposure entry is stored with only the corresponding source_concept_id and drug_source_value and a drug_concept_id of 0.” However, when we do this, Achilles Heel gives, “WARNING: 700-Number of persons with at least one drug exposure, by drug_concept_id; data with unmapped concepts”.

  5. Observation values: Finally, as posted here, Achilles Heel rule 814 enforces that at least one of the OBSERVATION fields value_as_string, value_as_number, or value_as_concept_id be non-NULL. However, none of these three fields is required according to the CDM v5 vocabulary specification. We can make this error go away, for example, by putting 0 for value_as_concept_id; however, shouldn’t we put NULL if there is no information about values for observations? Why is 0 allowed for value_as_concept_id here (at least no warning is printed), while using 0 for the various concept_ids in 1-4 above creates warnings? On the other hand, consistent with this Achilles Heel error, the Observation documentation says, “There should be no observations records without an associated value”. What value should we place in which of these three value fields when there is no obvious value?
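For illustration, the race and ethnicity handling proposed in item 1 could be sketched roughly as follows. This is a minimal sketch and not the actual CMS-ETL code: the source labels and the function name `map_race_ethnicity` are hypothetical, the concept IDs for White (8527) and Black (8516) are assumed from the standard Race vocabulary, and treating every non-Hispanic source value as Not Hispanic is also an assumption. The two ethnicity concept IDs are the ones quoted in item 1.

```python
# Hypothetical source labels; the real CMS race codes are not shown in this thread.
RACE_CONCEPT = {"White": 8527, "Black": 8516}  # assumed standard Race concept IDs
HISPANIC = 38003563       # ethnicity concept ID quoted in item 1
NOT_HISPANIC = 38003564   # ethnicity concept ID quoted in item 1
UNMAPPED = 0              # concept ID 0 marks an unmapped value


def map_race_ethnicity(source_race):
    """Return (race_concept_id, ethnicity_concept_id) for one source race label."""
    # "Others" and "Hispanic" have no race mapping here, so they fall back to 0.
    race_concept_id = RACE_CONCEPT.get(source_race, UNMAPPED)
    # Assumption: anything not labelled Hispanic is recorded as Not Hispanic.
    ethnicity_concept_id = HISPANIC if source_race == "Hispanic" else NOT_HISPANIC
    return race_concept_id, ethnicity_concept_id
```

With this mapping, Person rows for Others and Hispanic carry race_concept_id = 0, which is what triggers the rule 4 warning described above.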

Thanks!

I’ll try to answer the best I can:

Most of the questions concern the use of concept ID 0 and the warnings you get from Achilles Heel. These are just warnings, and I think the idea behind them is that we want 100% of the data to have mapped concepts. Concept ID 0 is considered an ‘unmapped value’. So for 1-4 of your questions, if you have any concept ID = 0 in your data, you get a warning. Admittedly, there are some data sources where you just don’t have the information and therefore will never get rid of those warnings. So perhaps we need a parameter to Heel() that lets you ignore certain warnings?

For 5, I believe there are some observations that have none of value_as_string, value_as_concept_id, or value_as_number. So it could be a mistake for rule 814 to require any of those fields; maybe it should only look for a concept ID of 0 in value_as_concept_id and report a warning (following the philosophy behind 1-4 above).

One limitation of the vocabulary is that you can’t ask it, for a given observation concept ID, what sort of ‘value as’ field to expect, so we can’t validate the data against what we expect (unless there’s something in the vocab to do this that I’m not aware of).

So it seems that if you know certain warnings are to be expected, you’d want a ‘squelch warning’ parameter that takes a list of warning IDs to ignore, so that they don’t appear in the output?
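To make the ‘squelch warning’ idea concrete, here is a minimal sketch of filtering Heel messages by rule ID after the fact. It is not part of Achilles: the function name `squelch` is hypothetical, and it assumes every message has the `TYPE: <rule_id>-text` shape seen in the warnings quoted earlier in this thread.

```python
import re

# Assumed message shape, based on the warnings quoted in this thread,
# e.g. "WARNING: 400-Number of persons with at least one condition occurrence, ..."
_RULE_ID = re.compile(r"^(?:NOTIFICATION|WARNING|ERROR):\s*(\d+)-")


def squelch(messages, ignore_rule_ids):
    """Return the messages whose rule ID is not in ignore_rule_ids."""
    ignored = set(ignore_rule_ids)
    kept = []
    for message in messages:
        match = _RULE_ID.match(message)
        # Messages that do not match the assumed shape are always kept.
        if match and int(match.group(1)) in ignored:
            continue
        kept.append(message)
    return kept
```

For example, `squelch(messages, [4, 400, 600, 700, 800])` would hide the unmapped-concept warnings discussed above while leaving everything else visible.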

-Chris

Thanks, @Chris_Knoll

Is it then correct to use concept ID 0 for unmapped procedures and observations, even though it is not clearly documented (unlike for drug_exposure and condition_occurrence, which do specify to use 0)?

My concern is that there is a big difference between using an unmapped conceptID other than 0, where that should probably be an error (not a warning), versus the documented use of 0 for unmapped terms, where one could argue whether it should even be a warning.

For the fifth issue, I’d just like to know from the vocabulary experts what to do. Our observations are coming from CONCEPT_RELATIONSHIP mappings from ICD9CM, HCPCS or CPT4 codes, where there is no additional data to specify a value. If you are going to map from an ICD9CM, HCPCS or CPT4 code to an Observation and require that there be a value for an observation, then the vocabulary should say something about what the observation should be. Certainly having an ERROR seems harsh, when it is unclear what to specify. I wonder if @Christian_Reich could weigh in on this.

@Christophe_Lambert:

Here is the rule: All mandatory _concept_id fields in the tables have to be set to 0, instead of null. 0 means “information not available”, whether it is because it is an unknown gender or there is no available mapping. The _concept_id fields that are optional, like the value_as_concept_id, can be set to null. Putting in a 0 won’t hurt either.
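As a minimal sketch of this rule, assuming rows are handled as plain dictionaries in the ETL, one could default the mandatory _concept_id fields to 0 and leave optional ones untouched. The function name `fill_mandatory_concept_ids` is hypothetical, and the caller has to supply the list of mandatory fields for each table.

```python
def fill_mandatory_concept_ids(row, mandatory_fields):
    """Return a copy of row with missing mandatory *_concept_id fields set to 0.

    Optional fields (for example value_as_concept_id) are left as they are,
    so None (NULL) stays None.
    """
    filled = dict(row)  # copy, so the input row is not modified
    for field in mandatory_fields:
        if filled.get(field) is None:
            filled[field] = 0  # 0 means "information not available"
    return filled
```

For a condition row, calling it with `["condition_concept_id"]` turns an unmapped NULL into 0 while a NULL value_as_concept_id on an observation row would be preserved.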

He can. :smile:

So, in V5, in addition to the “Maps to” relationships there are “Maps to value” relationships. They should give you a value if there is one. We are still working on perfecting this.

However, we are not enforcing values. The fact that a certain measurement (lab test) was run is already information, even though there are no results known. There are cohort definitions banking on the fact that you ran a Creatine Kinase or Troponin test as support for the diagnosis of Myocardial Infarction.

Thanks, @Christian_Reich, that clarifies things a lot!

Right now, it looks like there are 910 valid “Maps to value” relationships, but over 142k Observation concepts. Is the plan to have a “Maps to value” for every Observation to satisfy the documentation, “there should be no observations records without an associated value”? I guess I’ll put in value_as_concept_id=0 for now.

The Heel output type is very important.
The latest version of Achilles even introduced a new type of output.

The rule output types, ordered from least to most problematic, are:

  • NOTIFICATION: something you may want to know about your data. Output that is milder than warning.
  • WARNING: mild problem (not an error)
  • ERROR: true error in data; implausible phenomenon

Heel has two types of rules (by purpose):

  • conformance to the model
  • looking at data and finding DQ errors

We should not obsess about having Heel results with no output. The guidance is to generally focus on errors only.
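The guidance to focus on errors could be applied with a small helper like the one below. This is only a sketch: the helper names are hypothetical, and it assumes each Heel message starts with its output type followed by a colon, as in the warnings quoted earlier in the thread.

```python
# Output types ordered from least to most problematic, as listed above.
SEVERITY_RANK = {"NOTIFICATION": 0, "WARNING": 1, "ERROR": 2}


def severity(message):
    """Return the output type of a message, or None if it has no known prefix."""
    prefix = message.split(":", 1)[0].strip().upper()
    return prefix if prefix in SEVERITY_RANK else None


def most_severe_first(messages):
    """Sort messages so that errors come first; unknown types go last."""
    return sorted(
        messages,
        key=lambda m: SEVERITY_RANK.get(severity(m), -1),
        reverse=True,
    )


def errors_only(messages):
    """Keep only the messages of type ERROR."""
    return [m for m in messages if severity(m) == "ERROR"]
```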

Also, within OHDSI there is no agreement about what to do with erroneous patients (or rows). Some sites remove problematic rows (to create an illusion of high-quality data); others keep them and live with the conspicuous Heel output.


An example of a notification is rule_id 28:

@Christian_Reich - is it possible in the documentation to highlight this? This “Maps to Value” is news to me. The only time I was putting a CONCEPT_ID in there is if a value of some type came over from the raw data (i.e. positive/negative) and I was able to map it to a CONCEPT_ID.

http://www.ohdsi.org/web/wiki/doku.php?id=documentation:cdm:observation

Can we change the VALUE_AS_CONCEPT_ID description to this:

A foreign key to an observation result stored as a Concept ID. This is applicable to observations where the result can be expressed as a Standard Concept from the Standardized Vocabularies (e.g., positive/negative, present/absent, low/high, etc.). If this information does not come from the raw data record you can also use the ‘Maps to Value’ relationship in the Vocabulary off the incoming source code.

Additionally, would we follow this logic through to the MEASUREMENT table?

@erica:

I’ll add it to the conventions below the table description.

Don’t understand the question. Aren’t we talking about the MEASUREMENT table?

The above comment was for the OBSERVATION table.

In the MEASUREMENT table, most of the time the incoming lab data has information like positive/negative, and we would map those values to VALUE_AS_CONCEPT_IDs. However, if an incoming ICD9 code lands in the MEASUREMENT table due to a Vocab move, it won’t have an incoming value like lab data does, so I was thinking of using this “Maps to value” relationship for those items.

Oh. It’s true for both Observation and Measurement, depending on where the Maps to relationship goes.

Okay - then let’s add a note on both of them. I’d prefer the note up in the table, but if you think the conventions section is more appropriate then go ahead.

http://www.ohdsi.org/web/wiki/doku.php?id=documentation:cdm:observation
http://www.ohdsi.org/web/wiki/doku.php?id=documentation:cdm:measurement

@ericaVoss:

Done. Check it out.

@Christian_Reich & @schuemie -

FYI, @Ajit_Londhe and I found the following codes land in 2 or more ‘Maps to value’ CONCEPT_IDs:
'V06.0','V06.1','V06.2','V06.3','V06.4','V06.5','V06.6','V12.2','V54.24','V88.11', 'V88.12','Z27.1','Z27.2','Z27.3','Z27.4'

This would cause the OBSERVATION or MEASUREMENT row to duplicate for each ‘Maps to Value’ that exists. Unless you have other ideas, I’m going to just take the top record alphabetically for this VALUE_AS_CONCEPT_ID.

-- List every valid 'Maps to value' target for the example source codes.
WITH CTE_VOCAB_MAP AS (
    SELECT c.concept_code AS SOURCE_CODE, c.concept_id AS SOURCE_CONCEPT_ID,
           c.concept_name AS SOURCE_CODE_DESCRIPTION, c.vocabulary_id AS SOURCE_VOCABULARY_ID,
           c.domain_id AS SOURCE_DOMAIN_ID, c.concept_class_id AS SOURCE_CONCEPT_CLASS_ID,
           c.valid_start_date AS SOURCE_VALID_START_DATE, c.valid_end_date AS SOURCE_VALID_END_DATE,
           c.invalid_reason AS SOURCE_INVALID_REASON,
           c1.concept_id AS TARGET_CONCEPT_ID, c1.concept_name AS TARGET_CONCEPT_NAME,
           c1.vocabulary_id AS TARGET_VOCABULARY_ID, c1.domain_id AS TARGET_DOMAIN_ID,
           c1.concept_class_id AS TARGET_CONCEPT_CLASS_ID,
           c1.invalid_reason AS TARGET_INVALID_REASON, c1.standard_concept AS TARGET_STANDARD_CONCEPT
    FROM concept c
    JOIN concept_relationship cr
        ON c.concept_id = cr.concept_id_1
        AND cr.invalid_reason IS NULL
        -- The relationship_id is spelled 'Maps to value'; the comparison is case-sensitive on some databases.
        AND cr.relationship_id = 'Maps to value'
    JOIN concept c1
        ON cr.concept_id_2 = c1.concept_id
        AND c1.invalid_reason IS NULL
)
SELECT *
FROM CTE_VOCAB_MAP
/* EXAMPLE FILTERS */
WHERE SOURCE_CODE IN (
    'V06.0','V06.1','V06.2','V06.3','V06.4','V06.5','V06.6','V12.2','V54.24','V88.11',
    'V88.12','Z27.1','Z27.2','Z27.3','Z27.4'
)
ORDER BY SOURCE_CODE;
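The ‘take the top record alphabetically’ tie-break mentioned above could be sketched as follows, assuming the query result has been reduced to (source_code, target_concept_id, target_concept_name) tuples. The function name `pick_first_alphabetically` is hypothetical, and the rows in the usage example are made-up placeholders, not real vocabulary content.

```python
def pick_first_alphabetically(rows):
    """Map each source code to one target concept ID.

    rows: iterable of (source_code, target_concept_id, target_concept_name).
    When a source code has several 'Maps to value' targets, keep the one
    whose concept name sorts first (ties broken by the smaller concept ID).
    """
    best = {}
    for source_code, concept_id, concept_name in rows:
        candidate = (concept_name, concept_id)
        if source_code not in best or candidate < best[source_code]:
            best[source_code] = candidate
    return {code: concept_id for code, (_, concept_id) in best.items()}


# Placeholder rows for illustration only (codes, IDs and names are invented).
example_rows = [
    ("CODE_A", 103, "Tetanus"),
    ("CODE_A", 101, "Diphtheria"),
    ("CODE_A", 102, "Pertussis"),
    ("CODE_B", 201, "Single value"),
]
chosen = pick_first_alphabetically(example_rows)
```

Note that this keeps exactly one value per source code and silently drops the others, so the resulting OBSERVATION or MEASUREMENT row no longer reflects every ‘Maps to value’ target.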

No. Duplicate (and triplicate). Take ‘V06.1’ 44827298 ‘Need for prophylactic vaccination and inoculation against diphtheria-tetanus-pertussis, combined [DTP] [DTaP]’. There is the need to vaccinate against diphtheria, tetanus and pertussis. Why don’t you like it?
