I had been under the impression that an invalid concept would have “invalid_reason” populated and “valid_end_date” set to the date of deprecation, and conversely that any “valid_end_date” other than the default of 2099-12-31 would come with a populated “invalid_reason”.
Today I was surprised to discover that this query (on vocabulary v5.0 27-FEB-25):
select
    vocabulary_id,
    standard_concept,
    count(*)
from
    vocab.concept
where
    invalid_reason is null
    and valid_end_date <= now()
group by
    vocabulary_id,
    standard_concept
;
returns these results:
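| vocabulary_id | standard_concept | count |
|---|---|---|
| CMS Place of Service | S | 1 |
| CMS Place of Service | | 3 |
| CPT4 | C | 324 |
| CPT4 | S | 1719 |
| CPT4 | | 431 |
| HCPCS | S | 3028 |
| HCPCS | | 969 |
| ICD10PCS | S | 4745 |
| ICD10PCS | | 29 |
| ICD9Proc | S | 5 |
| ICD9Proc | | 1 |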
At first I was worried that something had gone awry in our version of the vocabularies, but I ultimately discovered a forum post describing why this happened for ICD10PCS: “ICD10PCS: bringing back deprecated codes”. A comment in that post also alludes to this being the case for CPT4 and HCPCS.
My colleague wisely thought to consult the Book of OHDSI and found that Chapter 5 has this documented:
Reused code for another new concept
Description: The vocabulary reused the concept code of this deprecated concept for a new concept.
VALID_START_DATE: Day of instantiation of concept, if that is not known day of incorporation of concept in Vocabularies, if that is not known 1970-1-1.
VALID_END_DATE: Day in the past indicating deprecation, or if that is not known day of vocabulary refresh where concept in vocabulary went missing or set to inactive.
INVALID_REASON: “R”
However, I can’t find any other documentation referencing the use of “R” – and I’m only seeing values of D, U and NULL in the field (at least, for the vocabularies we’ve downloaded).
So I have a couple of questions:
Are these concepts I’ve flagged instances where the source vocabularies “replaced” / “reused” their own codes?
Should these have invalid_reason = 'R'?
Could we update documentation to make this clearer for future folks?
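For anyone who wants to check their own download, here is a rough sketch of the sanity check behind these questions: cross-tabulate invalid_reason against whether valid_end_date has passed. It runs against an in-memory SQLite table so it is self-contained; the rows are invented for illustration, and the table is only a stand-in for vocab.concept (on PostgreSQL you would use now() or current_date instead of date('now')).

```python
import sqlite3

# Toy stand-in for vocab.concept. Every row is invented for illustration.
rows = [
    # (concept_id, vocabulary_id, standard_concept, valid_end_date, invalid_reason)
    (1, "CPT4", "S", "2099-12-31", None),    # active
    (2, "CPT4", None, "2015-12-31", "D"),    # deprecated, both fields agree
    (3, "CPT4", "S", "2015-12-31", None),    # ended but still valid
    (4, "HCPCS", None, "2016-06-30", "U"),   # upgraded, both fields agree
    (5, "HCPCS", None, "2016-06-30", None),  # ended, non-standard, still valid
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table concept ("
    "concept_id integer, vocabulary_id text, standard_concept text, "
    "valid_end_date text, invalid_reason text)"
)
conn.executemany("insert into concept values (?, ?, ?, ?, ?)", rows)

# Count concepts per (invalid_reason, end date in the past or not).
crosstab = conn.execute(
    """
    select
        coalesce(invalid_reason, 'NULL') as reason,
        case when valid_end_date <= date('now') then 'past' else 'future' end
            as end_date_status,
        count(*) as n
    from concept
    group by reason, end_date_status
    order by reason, end_date_status
    """
).fetchall()

for row in crosstab:
    print(row)
```

If the two fields always agreed, the NULL / past combination would never appear. In this toy data it shows up twice, which mirrors the situation described above.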
Hi Will!
Reuse (by the source) and resurrection (in OMOP) are different problems.
Those that you found were not reused, and therefore should not be marked with “R”.
Those that should be marked are not at the moment, because this change hasn’t been implemented yet. So the Book of OHDSI is not really outdated: in this case it describes the future.
Thanks for the quick response - this is helpful. Good to know about the Book of OHDSI describing the desired future state!
The explanation of the “zombies” is helpful, and I think I understand the distinction. My question is this: the GitHub article specifically says that these concepts should have standard_concept = 'S', but I’m seeing instances where they are non-standard or classification. I can see how the same logic applies to the classification concepts, but I’m not sure about the non-standard ones. Is there a reason those end up with invalid_reason as NULL?
@wtroddy Good catch!
These are one-legged zombies.
It happens because the decision to make concepts zombies is applied at the vocabulary level.
What happens next on an individual vocabulary run is:
we make them zombies (Standard and valid even though the end_date is in the past)
some of these zombies get mappings to the proper Standard targets, which makes them non-Standard
but we still leave them valid even though the rule exception is no longer needed
the generic_update function in the integration step passes them through because it treats them as true zombies.
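The steps above suggest a way to spot one-legged zombies in a vocabulary download. The sketch below is only a guess at a detection query, not how generic_update works: it assumes a one-legged zombie can be recognised as a concept that is still valid (invalid_reason is null), has an end date in the past, is non-Standard, and has an active 'Maps to' relationship to a different concept. It uses an in-memory SQLite database with invented rows; the column names follow the OMOP concept and concept_relationship tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table concept (
        concept_id integer, standard_concept text,
        valid_end_date text, invalid_reason text
    );
    create table concept_relationship (
        concept_id_1 integer, concept_id_2 integer,
        relationship_id text, invalid_reason text
    );
    -- Invented rows, for illustration only.
    insert into concept values
        (10, 'S',  '2015-12-31', null),  -- true zombie
        (11, null, '2015-12-31', null),  -- one-legged zombie
        (12, null, '2015-12-31', 'D'),   -- ordinarily deprecated
        (20, 'S',  '2099-12-31', null);  -- active Standard target
    insert into concept_relationship values
        (10, 10, 'Maps to', null),
        (11, 20, 'Maps to', null),
        (20, 20, 'Maps to', null);
    """
)

# Assumed signature of a one-legged zombie: ended, still valid,
# non-Standard, and mapped to some other concept.
one_legged = conn.execute(
    """
    select c.concept_id, r.concept_id_2 as maps_to
    from concept c
    join concept_relationship r
      on r.concept_id_1 = c.concept_id
     and r.concept_id_2 <> c.concept_id
     and r.relationship_id = 'Maps to'
     and r.invalid_reason is null
    where c.invalid_reason is null
      and c.valid_end_date <= date('now')
      and c.standard_concept is null
    order by c.concept_id
    """
).fetchall()

print(one_legged)
```

In the toy data only concept 11 is returned, paired with its target 20. Classification concepts (standard_concept = 'C') are deliberately left out of this sketch.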
@m-khitrun we need to fix that at some point. Would you mind creating a GitHub issue?
Thanks, Alexander! It makes perfect sense how they came to be. Looking forward to the future fix for this.
Just a thought: I know there’s some discussion about a Book of OHDSI 2.0, and it might be worthwhile to think about how to incorporate some of the topics from the vocabulary GitHub wiki, or at least point to it as an additional resource.