OHDSI Home | Forums | Wiki | Github

Dealing with deprecated & replaced concept values during vocabulary updates

Are there standard mechanisms to detect and remediate situations where standard concepts are deprecated and replaced across vocabulary versions?

As a specific example, we’re doing analyses on GFR, which has units of “milliliter per minute per 1.73 square meter”. The UCUM code for that concept has changed over time:

  • 1970-01-01 to 2014-01-03: mL/min/{1.73}m – concept ID 9062
  • 1970-01-01 to 2022-03-28: mL/min/1.73.m2 – concept ID 9117
  • 2022-04-07 to 2099-12-31: mL/min/{173.10*-2.m2) – concept ID 720870

Our ETL worked fine until last month when we stopped getting units for GFR without realizing it - because our ETL was expecting the mL/min/1.73.m2 mapping to continue to work even though that UCUM value has been deprecated.

Has anyone built transitive logic for vocabularies for this situation? In general, if we have a concept that maps to an invalid concept for which there is a single replacement, shouldn’t it be possible to automatically have the ETL vocabulary linking tables automatically pick the updated standard concept?

I expect such a solution would be helpful to the community to minimize surprises across ETL runs.

Just checking that you are aware of the ‘Concept replaced by’ relationship id.
SELECT concept_id_1 AS updated_concept, concept_id_2 AS current_standard_concept
FROM concept_relationship
WHERE relationship_id = ‘Concept replaced by’
AND concept_id_1 IN( 9062, 9117, 720870)

It looks like UCUM is updating the concept code, but not the concept name. I’m unsure why a Vocabulary would update the code and not the name. If anyone knows, I’d love insight as to why this would be necessary. To me, the meaning stays the same.

Are these units custom mapped using the STCM table or the Concept/Concept Relationship tables? Or are you using a string match from the source value to the concept code?

If you are custom mapping, you are essentially hard coding the standard concept_id, even if the concept_id has been deprecated/updated. One thought, I haven’t used this before, would be to add SQL logic to the ‘mapped to’ concept_id stating IF the ‘mapped to’ concept_id is non-standard and has a relationship_id = ‘replaced by’, then look it up in the CR table WHERE concept_id = concept_id_1 and relationship_id = ‘concept replaced by’, THEN use concept_id_2.

Something we have just implemented at Colorado University is the DQD. We are customizing our threshold failures. By bumping up the threshold failure % by 1% above the current failure rate, we will be notified if there is a somewhat significant change to either our source data, our semantic mappings or our ETL process. We run the DQD after our twice weekly ETL run. We’re still fine tuning our process, but were just notified of a Measurement Unit failure due to an increase in unit_concept_id = 0. In our case, we have new source units not mapping. I highly recommend running DQD with each ETL run and customizing your threshold values!

Edited: incorrect relationship_id

Thank you both.

@MPhilofsky, we have also incorporated DWD into our ETL pipeline. We use DataBricks to orchestrate everything, and can now get the full DQD suite to complete in about an hour on our Spark cluster. We use pre-generated SQL (rather than RStudio) to run DQD, and have an open pull request in case others want to try the same approach.

I like the strategy you mentioned of comparing across DQD runs using the prior run’s failure rate as a benchmark. How did you accomplish that? Did you have to modify the DQD R package (and associated csv files) each time, or do you have a more streamlined appoach?

We modified the csv file. And once modified, the changed threshold persists in subsequent runs. I know @Frank is working on a feature which will highlight some changes from one DQD run to the next.

Meaning totally stays the same. We updated this concept due to Vocabulary issue, reported to us. Yes, we have to admit that the first two concept codes were incorrect. The best way to change the concept is to create a new one and then build a relationship between them, changing codes is not a best practice.

@Thomas_White great catch. We can recommend checking all the concepts used in your ETL when you update vocabulary version just to be sure that nothing becomes deprecated.

1 Like

Thank you for the clarification, @zhuk!