
CPT Hierarchy errors - lost children in 2023 and changed domains

There appears to have been a major change in the CPT hierarchy starting with the January 2023 release.

As an example, one CPT classification concept (concept_id 45889484) subsumes the 4 Office or Outpatient visit codes (99202 - 99205). As expected, in the v5.0 22-JUN-22 and v5.0 31-OCT-22 Athena vocabulary releases, that concept appears as ancestor_concept_id in concept_ancestor with the expected 4 values for descendant_concept_id.

However, in the v5.0 23-JAN-23 release, that concept has no descendant concepts. Moreover, the domain_id has changed. In the 2022 releases, the concept is in the Procedure domain, but in the 2023 release, it is an Observation.

Here are greps of the raw downloaded files from the v5.0 22-JUN-22 release:

$ grep None VOCABULARY.csv
None    OMOP Standardized Vocabularies  OMOP generated  v5.0 22-JUN-22  44819096
$ grep 2414392 CONCEPT_CPT4.csv
2414392         Procedure       CPT4    CPT4    S       99203   19700101        20991231
$ grep 45889484 CONCEPT_ANCESTOR.csv
45889484        2414391 1       1
45889484        2414394 1       1
45889484        2414393 1       1
45889484        2414392 1       1
45888946        45889484        1       1
45889197        45889484        3       3
45888982        45889484        2       2
45889484        45889484        0       0

And here is the same information from the v5.0 23-JAN-23 release:

$ grep None VOCABULARY.csv
None    OMOP Standardized Vocabularies  OMOP generated  v5.0 23-JAN-23  44819096
$ grep 45888946 CONCEPT_CPT4.csv
45888946                Observation     CPT4    CPT4 Hierarchy  C       1013626 20141010        20991231
$ grep 45889484 CONCEPT_RELATIONSHIP.csv
45889484        45888946        Is a    19700101        20991231
45889484        2414391 Subsumes        19700101        20991231
45889484        2414394 Subsumes        19700101        20991231
45889484        2414392 Subsumes        19700101        20991231
45889484        2414393 Subsumes        19700101        20991231
45888946        45889484        Subsumes        19700101        20991231
2414391 45889484        Is a    19700101        20991231
2414394 45889484        Is a    19700101        20991231
2414392 45889484        Is a    19700101        20991231
2414393 45889484        Is a    19700101        20991231

These changes are breaking some of my concept sets (e.g. ones looking for new patient doctors’ visits). However, there appear to be hundreds of cases now where Concept Hierarchy codes that used to have descendants no longer do. There are also many where the domain has changed.
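To get a full list rather than spot-checking with grep, something along these lines could compare concept_ancestor across two releases and report every ancestor that had descendants before and has none now. This is only a sketch: the function names are made up for illustration, the file reader assumes tab-delimited Athena files with lowercase column headers, and the "new" sample rows are illustrative rather than copied from the 2023 files.

```python
import csv
from collections import defaultdict


def read_ancestor_pairs(path):
    """Yield (ancestor, descendant) pairs from a CONCEPT_ANCESTOR.csv file.

    Assumes a tab-delimited file with a header row containing
    ancestor_concept_id and descendant_concept_id.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield int(row["ancestor_concept_id"]), int(row["descendant_concept_id"])


def descendants_by_ancestor(pairs):
    """Map each ancestor to its descendants, ignoring the self-referencing row."""
    result = defaultdict(set)
    for ancestor, descendant in pairs:
        if ancestor != descendant:
            result[ancestor].add(descendant)
    return result


def lost_all_descendants(old_pairs, new_pairs):
    """Return ancestors that had descendants in the old release but none in the new."""
    old = descendants_by_ancestor(old_pairs)
    new = descendants_by_ancestor(new_pairs)
    return sorted(a for a in old if not new.get(a))


# Rows taken from the 22-JUN-22 grep output above.
old_rows = [
    (45889484, 2414391), (45889484, 2414394),
    (45889484, 2414393), (45889484, 2414392),
    (45888946, 45889484), (45889197, 45889484),
    (45888982, 45889484), (45889484, 45889484),
]
# Illustrative rows for the newer release: the four children are gone.
new_rows = [
    (45888946, 45889484), (45889197, 45889484),
    (45888982, 45889484), (45889484, 45889484),
]

print(lost_all_descendants(old_rows, new_rows))
```

On real files, the two arguments would be `read_ancestor_pairs(...)` calls pointing at the old and new downloads.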

Is this an error or intentional? If intentional, where can I find the documentation about those changes, and how will future proposed changes like these be communicated so that users can remediate their concept sets and cohorts accordingly?


Commenting on future vocabulary changes that affect concept sets: it is an important problem. There are conversations happening here and there on how best to handle it and inform the community, given that the Vocabularies are not (nor are expected to be) stable. Would be great to hear your thoughts.

Options include, but are not limited to: documentation, centralized checks (think Frank’s tool An Evaluation of the Impact of Vocabulary Evolution on Established Phenotypes – OHDSI) or database-specific checks on the user side (think of a set of cohorts and checks that somebody runs when updating the data locally).

@aostropolets , an additional challenge I’m just noticing is that not only did the parent concept change domain from Procedure to Observation, but the children E&M codes changed domain from Procedure to Visit. That breaks our ETL pipeline.

In our ETL, data are put into the appropriate tables based upon domain_id, so all 9920x E&M codes are getting deleted (in 2022, they correctly landed in the Procedure table).
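For what it is worth, the silent deletion could be avoided with a routing step that sets aside any record whose domain has no target table, instead of dropping it. A minimal sketch follows; the mapping and the function name are hypothetical, not our actual pipeline, and it takes no position on where Visit-domain codes should land.

```python
from collections import defaultdict

# Hypothetical domain-to-table routing; note there is no entry for "Visit".
DOMAIN_TO_TABLE = {
    "Procedure": "procedure_occurrence",
    "Observation": "observation",
    "Condition": "condition_occurrence",
    "Drug": "drug_exposure",
    "Measurement": "measurement",
}


def route_records(records, domain_to_table=DOMAIN_TO_TABLE):
    """Split records into per-table lists plus a list of unrouted records.

    Records whose domain_id has no target table are returned separately
    so they can be reviewed rather than silently discarded.
    """
    routed = defaultdict(list)
    unrouted = []
    for record in records:
        table = domain_to_table.get(record["domain_id"])
        if table is None:
            unrouted.append(record)
        else:
            routed[table].append(record)
    return dict(routed), unrouted


# The same source code under its 2022 and 2023 domains.
records = [
    {"source_code": "99203", "domain_id": "Procedure"},  # 2022 releases
    {"source_code": "99203", "domain_id": "Visit"},      # 23-JAN-23 release
]
routed, unrouted = route_records(records)
print(routed)
print(unrouted)
```

A non-empty `unrouted` list after a vocabulary refresh is then a signal that domains have shifted and the routing needs a decision.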

What is the current guidance – should 186 CPT4 codes that are newly in the Visit domain create records in procedure_occurrence, observation, or somewhere else?

Also, great to hear that @Frank created a tool, but I don’t see a link to GitHub for the code and instructions for running the tool. Where can I find that info?

Hello, @Thomas_White

Sorry to hear that the vocabulary refresh broke your ETL pipeline. These changes were intentional. The mappings to visits were derived from the extensive CPT4 descriptions. Concepts that received mappings were destandardized, and therefore you can no longer find them in the concept_ancestor table, which by design contains only Classification and Standard concepts. However, the hierarchical relationships in the concept_relationship table are still present.
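As a rough illustration of that last point, the descendants can be rebuilt by walking the "Subsumes" rows of concept_relationship. The sketch below uses the rows quoted earlier in this thread; the function name is made up, and a real run would load the full table instead of a hand-typed list.

```python
from collections import defaultdict


def descendants_via_relationships(root, relationships, relationship_id="Subsumes"):
    """Collect all concepts reachable from root through the given relationship.

    relationships is an iterable of (concept_id_1, concept_id_2, relationship_id).
    """
    children = defaultdict(set)
    for concept_id_1, concept_id_2, rel in relationships:
        if rel == relationship_id:
            children[concept_id_1].add(concept_id_2)

    found = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            # Guard against cycles and against re-adding the root itself.
            if child != root and child not in found:
                found.add(child)
                stack.append(child)
    return found


# Rows from the 23-JAN-23 CONCEPT_RELATIONSHIP.csv grep shown above.
relationships = [
    (45889484, 45888946, "Is a"),
    (45889484, 2414391, "Subsumes"),
    (45889484, 2414394, "Subsumes"),
    (45889484, 2414392, "Subsumes"),
    (45889484, 2414393, "Subsumes"),
    (45888946, 45889484, "Subsumes"),
    (2414391, 45889484, "Is a"),
    (2414394, 45889484, "Is a"),
    (2414392, 45889484, "Is a"),
    (2414393, 45889484, "Is a"),
]

print(sorted(descendants_via_relationships(45889484, relationships)))
```

Whether such relationship-derived descendants behave like concept_ancestor descendants inside a given tool is a separate question; this only shows that the hierarchy can be recovered from the data.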

The vocabulary team always provides release notes with the releases. What else could we do to provide smoother refreshes in the future? Would be great to hear your thoughts.

The QR code from the abstract leads to https://gist.github.com/fdefalco/2aca8656804cd1b3618f4a64c5900c88.

@zhuk - thanks for the link to the release notes. Somehow I had never seen them before.

What would really help is an easily searchable lineage that shows the pre and post values. I can see from the Release notes v20230116_major that 1164 CPT codes were deStandardized and mapped over to the Standard concepts in the respective domains. However, I don’t see where I can find a listing of exactly which CPT codes had attributes changed, and what they changed to.

Is there a database that tracks such changes? I presume the concept attributes might change multiple times over many years. If so, a data model that lets you search for all changes to concepts, relationships, and ancestors over time (flagging vocabulary release date) would enable easier searching and understanding of what changed, plus the ability to do robust impact analysis across versions (both for automating ETL updates and for automating changes to concept sets or cohorts to account for those changes).

Does such a database already exist?
Are there web-based tools to navigate the history of changes to concept attributes or relationships?
Are there plans to augment Athena to let people navigate the history of changes of concepts?

Lastly, now that the E&M CPT codes are in the Visit domain, where are records about those CPT codes supposed to land in the OMOP data model? There is no place for them in the visit_occurrence table (since those codes are not valid values for visit_concept_id). Should the CPT E&M codes continue to generate records in the Procedure table? The Observation table? Other?

That sort of information would also be helpful in the release notes - especially when the domain for certain codes change and there might be confusion about what target table they should land in.

This theme has been debated for years now, and there certainly are signs of improvement (‘What’s new’ section in release notes, numbers of changed concepts, extended descriptions, etc.). The changes in vocabularies like CPT4 and HCPCS are usually not big, because the vocabularies themselves are small, but for SNOMED and drug vocabularies such as RxNorm and RxNorm Extension, the changes are enormous, often more than 10K concepts. Therefore it is not easy to store them in the GitHub Release Notes section.

Within the Vocabulary team, we use an audit package to track changes within the database. It writes all changes to concepts into a log table. Unfortunately, it only records changes made by scripts (INSERT, UPDATE), so it is not a perfect fit for your use case, where the vocabularies are downloaded and updated manually. However, you can try to adjust it.

For your use case, scripts that show the differences between vocabulary versions may help. For example, before downloading new vocabularies, you create backup tables with the previous version in a separate schema and then compare the 2 versions table by table with the help of custom-made scripts or our scripts.
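A custom-made comparison of that kind could be as small as the sketch below, which reports attribute changes for concepts present in both versions. It is not the Vocabulary team's script: the function name and the in-memory layout (concept_id mapped to a dict of attributes) are assumptions for illustration, and the sample values simply follow what was described in this thread for 99203.

```python
def diff_concepts(old, new, fields=("domain_id", "standard_concept")):
    """List (concept_id, field, old_value, new_value) for every changed attribute.

    old and new map concept_id to a dict of attributes for one vocabulary
    version. Only concepts present in both versions are compared.
    """
    changes = []
    for concept_id in sorted(old.keys() & new.keys()):
        for field in fields:
            before = old[concept_id].get(field)
            after = new[concept_id].get(field)
            if before != after:
                changes.append((concept_id, field, before, after))
    return changes


# Illustrative snapshot of one concept (CPT4 99203) in two versions.
old_version = {2414392: {"domain_id": "Procedure", "standard_concept": "S"}}
new_version = {2414392: {"domain_id": "Visit", "standard_concept": ""}}

for change in diff_concepts(old_version, new_version):
    print(change)
```

Storing the resulting tuples together with the release date of each version would give a simple, searchable change history.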

Regarding CPT4 codes and Visit domain. I am not sure that I understand the problem. Why not Visit table? If you have visits constructed from some other codes, you could do deduplication during ETL.

I think Tom is talking about a definitive mapping of CPT4 and HCPCS codes to Visit concepts, if they mention them. Once that is available, the ETL indeed could do the deduplication (can’t have the same visit twice a day, can you?)
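In its simplest form, that deduplication could look like the sketch below: keep the first visit per person, day, and visit concept. The field names and the "source" label are hypothetical, and a real ETL would need exceptions (for example, specialties where several visits on one day are legitimate).

```python
from datetime import date


def dedupe_visits(visits):
    """Keep the first visit seen for each (person, date, visit concept)."""
    seen = set()
    kept = []
    for visit in visits:
        key = (visit["person_id"], visit["visit_date"], visit["visit_concept_id"])
        if key not in seen:
            seen.add(key)
            kept.append(visit)
    return kept


# 9202 is the OMOP Outpatient Visit concept; the rows themselves are made up.
visits = [
    {"person_id": 1, "visit_date": date(2023, 1, 5), "visit_concept_id": 9202, "source": "encounter"},
    {"person_id": 1, "visit_date": date(2023, 1, 5), "visit_concept_id": 9202, "source": "cpt4"},
    {"person_id": 1, "visit_date": date(2023, 2, 9), "visit_concept_id": 9202, "source": "cpt4"},
]

for visit in dedupe_visits(visits):
    print(visit["visit_date"], visit["source"])
```

Which record wins on a given day depends on input order here, so the more trusted source would need to be listed first.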

@Christian_Reich , I think I stated my question poorly, so let me restate.

Now (since the January 2023 release) that certain CPT4 and HCPCS codes are officially part of the Visit domain (instead of the Procedure domain), where should ETL land those data? For example, for a 99202 CPT4 code (new patient office visit), we used to create a record in the procedure_occurrence table when that CPT4 code was part of the Procedure domain. That way we could build cohorts and do analyses about specific CPT codes as needed (such as when they are part of a quality measure definition).

If the recommendation is to no longer store those CPT codes in the procedure table, I’m not sure where else they could be stored and also be accessible via Atlas. They are not valid codes for visit_concept_id. And, if they were added as visit_detail, Atlas doesn’t enable direct search of visit_detail.

So, I’d advocate for continuing to have those CPT codes generate records in the procedure_occurrence table. However, that could lead to additional confusion for both ETL-ers and end-users as long as those CPT codes live in the Visit domain.

So, the bottom line is that I want to ensure I can use Atlas to define cohorts to query for specific CPT codes. This was possible when they were procedures and we had access to the standard CPT hierarchy. Now that selected CPT codes have been moved to the Visit domain, it is not clear where those CPT codes should land in CDM tables so that we retain provenance plus the ability to query them via Atlas.

I hope that makes more sense.

Lots of US EHR data holders:

And regarding deduplication, unfortunately, it’s not so easy and next to impossible for some EHR data sources. The source visits/encounters are not always linked to the CPT4 “billing code”.

And now these CPT4 codes are no longer standard :frowning: So, they can’t be used for network queries :frowning: And many map to generic “office visit”, which doesn’t give the level of detail necessary to meet @Thomas_White’s use case and others’ use cases, as we discussed this morning on the HSIG call.

Looking at these CPT4 ‘visit’ codes. They do identify a visit, however, it is the attributes located in the description of the code which are most useful to the use cases described. And I would argue these attributes, “new patient office visit”, “treatment variability”, etc. are “Observations” and also belong in the Observation table.

Hm. Interesting debate. Just to make it clear upfront: It happens not because our model is flawed, but the frigging CPT4 and HCPCS codes are a mess of anything that could be used to justify payment. And lots of folks have become addicted to using them and interpreting them very narrowly. Of course, none of that makes sense from outside the US. Even having them as standard concepts.

Which is obviously a violation of the CDM, as “new patient office visit” is not a procedure, but - an office visit. But I would like to understand better your use case problems:

If they are mapped to Visit, do you care whether the information originated as CPT4 or came from other information in the source? Why?

That needs to be fixed. However, VISIT_DETAIL is mostly relevant to inpatient visits, is it not? An office or ER visit, which is what most of these concepts represent, should be in VISIT_OCCURRENCE, no?

Why not? If you have an EHR that indicates an outpatient office visit how is that different from using the CPT4 code? And why can’t you go through day by day in the life of the patient and make sure there is only one per day? With some exception for certain specialties?

This is the way the EHR is designed. We have “billing” data (CPT4 codes) and “encounter” data (giant, unwieldy, ambiguous tables to run the business of seeing patients). Two separate things that aren’t linked. It’s messy stuff. And we do our best to de-duplicate or merge all that we can, but sometimes it’s not possible. Visits are especially hard because the EHR contains many visits which are not patient-provider interactions and there is not a reliable flag. Every time someone at the healthcare system enters something into a person’s electronic chart, it must be linked to an encounter in the person’s chart. And if there isn’t an appropriate encounter to link, then an encounter is created. A person fills out a document, encounter record created. MRI is faxed over from another institution, another encounter. Labs reviewed by the RN, another encounter. Different domains within the EHR differ in granularity and ability to establish a link between billing and encounter data. Messy stuff we can discuss over a beer, or two, possibly 3 beers because it’s a long conversation :slight_smile:

Messy EHR data aside, I’m still arguing for inclusion of these CPT4 codes in the Observation table. Since these are observations:


I take the beer offer. :slight_smile:

If there is more than a visit information - why not. I just want to prevent people using the OBSERVATION table instead of the VISIT_OCCURRENCE table to look for visits.


In case you missed the discussion on this topic in the CDM WG this morning, it is located here.

If you add them to the Observation domain, do you use the visit concept id in the observation_concept_id field?