
CPT Hierarchy errors - lost children in 2023 and changed domains

There appears to have been a major change in the CPT hierarchy starting with the January 2023 release.

As an example, this CPT classification code subsumes the 4 Office or Outpatient visits (99202 - 99205). As expected, in the v5.0 22-JUN-22 and v5.0 31-OCT-22 Athena vocabulary releases, the ancestor_concept_id (45889484) in concept_ancestor has the expected 4 values for descendant_concept_id.

However, in the v5.0 23-JAN-23 release, that concept has no descendant concepts. Moreover, the domain_id has changed. In the 2022 releases, the concept is in the Procedure domain, but in the 2023 release, it is an Observation.

Here are greps of the raw downloaded files from the v5.0 22-JUN-22 release:

$ grep None VOCABULARY.csv
None    OMOP Standardized Vocabularies  OMOP generated  v5.0 22-JUN-22  44819096
$ grep 2414392 CONCEPT_CPT4.csv
2414392         Procedure       CPT4    CPT4    S       99203   19700101        20991231
$ grep 45889484 CONCEPT_ANCESTOR.csv
45889484        2414391 1       1
45889484        2414394 1       1
45889484        2414393 1       1
45889484        2414392 1       1
45888946        45889484        1       1
45889197        45889484        3       3
45888982        45889484        2       2
45889484        45889484        0       0

And here is the same information from the v5.0 23-JAN-23 release:

$ grep None VOCABULARY.csv
None    OMOP Standardized Vocabularies  OMOP generated  v5.0 23-JAN-23  44819096
$ grep 45888946 CONCEPT_CPT4.csv
45888946                Observation     CPT4    CPT4 Hierarchy  C       1013626 20141010        20991231
$ grep 45889484 CONCEPT_RELATIONSHIP.csv
45889484        45888946        Is a    19700101        20991231
45889484        2414391 Subsumes        19700101        20991231
45889484        2414394 Subsumes        19700101        20991231
45889484        2414392 Subsumes        19700101        20991231
45889484        2414393 Subsumes        19700101        20991231
45888946        45889484        Subsumes        19700101        20991231
2414391 45889484        Is a    19700101        20991231
2414394 45889484        Is a    19700101        20991231
2414392 45889484        Is a    19700101        20991231
2414393 45889484        Is a    19700101        20991231

These changes are breaking some of my concept sets (e.g. ones looking for new patient doctors’ visits). However, there appear to be hundreds of cases now where Concept Hierarchy codes that used to have descendants no longer do. There are also many where the domain has changed.
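One way to enumerate the affected hierarchy concepts is to compare the concept_ancestor tables of two releases. The following is a minimal sketch, not an official OHDSI tool. It assumes the Athena CONCEPT_ANCESTOR.csv files are tab-delimited with a lowercase header row; the function names are hypothetical. The old rows are taken from the 22-JUN-22 grep output above, while the new rows are illustrative (only the self-referencing row is assumed to remain).

```python
import csv
from collections import defaultdict


def read_ancestor_pairs(path):
    """Read (ancestor, descendant) pairs from a CONCEPT_ANCESTOR.csv file.

    Assumption: tab-delimited with a header row naming the columns
    ancestor_concept_id and descendant_concept_id.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [
            (int(row["ancestor_concept_id"]), int(row["descendant_concept_id"]))
            for row in reader
        ]


def descendants_by_ancestor(pairs):
    """Map each ancestor to its descendants, ignoring the self-referencing row."""
    result = defaultdict(set)
    for ancestor, descendant in pairs:
        if ancestor != descendant:
            result[ancestor].add(descendant)
    return result


def lost_all_descendants(old_pairs, new_pairs):
    """Ancestors that had descendants in the old release but have none in the new one."""
    old = descendants_by_ancestor(old_pairs)
    new = descendants_by_ancestor(new_pairs)
    return sorted(ancestor for ancestor in old if not new.get(ancestor))


# Rows for ancestor 45889484 from the 22-JUN-22 grep output above.
old_pairs = [
    (45889484, 2414391),
    (45889484, 2414394),
    (45889484, 2414393),
    (45889484, 2414392),
    (45889484, 45889484),
]
# Illustrative 23-JAN-23 state: only the self-referencing row is assumed to remain.
new_pairs = [(45889484, 45889484)]

print(lost_all_descendants(old_pairs, new_pairs))
```

With real data, the two lists would come from calling read_ancestor_pairs on each release's file.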

Is this an error or intentional? If intentional, where can I find the documentation about those changes, and how will future proposed changes like these be communicated so that users can remediate their concept sets and cohorts accordingly?


Commenting on future vocabulary changes that impact concept sets: it is an important problem. There are conversations happening here and there on how best to handle it and inform the community, given that the Vocabularies are not (nor are expected to be) stable. It would be great to hear your thoughts.

Options include, but are not limited to: documentation, centralized checks (think Frank’s tool An Evaluation of the Impact of Vocabulary Evolution on Established Phenotypes – OHDSI), or database-specific checks on the user side (think of a set of cohorts and checks that somebody runs when updating the data locally).

@aostropolets , an additional challenge I’m just noticing is that not only did the parent concept change domain from Procedure to Observation, but the children E&M codes changed domain from Procedure to Visit. That breaks our ETL pipeline.

In our ETL, data are put into appropriate tables based upon domain_id, so all 9920x E&M codes are getting deleted (in 2022, they correctly landed into the Procedures table).
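The failure mode described here (records silently dropped when a domain has no target table) can be made visible with an explicit fallback bucket. This is only a sketch of the idea, not the poster's actual pipeline. The routing table, the "unrouted" bucket, and the function name are assumptions for illustration. The domain values for concept 2414392 follow the thread: Procedure in 2022, Visit in the January 2023 release.

```python
from collections import defaultdict

# Hypothetical domain-to-table routing; a real ETL would cover more domains.
DOMAIN_TO_TABLE = {
    "Procedure": "procedure_occurrence",
    "Observation": "observation",
    "Condition": "condition_occurrence",
    "Drug": "drug_exposure",
    "Measurement": "measurement",
}


def route_records(records, domain_by_concept):
    """Group source records by target table based on the concept's domain_id.

    Records whose domain has no target table go to an "unrouted" bucket
    instead of being dropped, so a vocabulary refresh cannot lose them silently.
    """
    routed = defaultdict(list)
    for record in records:
        domain = domain_by_concept.get(record["concept_id"])
        table = DOMAIN_TO_TABLE.get(domain, "unrouted")
        routed[table].append(record)
    return dict(routed)


records = [{"person_id": 1, "concept_id": 2414392, "source_code": "99203"}]

domains_2022 = {2414392: "Procedure"}  # per the 22-JUN-22 CONCEPT_CPT4.csv row
domains_2023 = {2414392: "Visit"}      # per the domain change described in this thread

print(route_records(records, domains_2022))
print(route_records(records, domains_2023))
```

Reviewing the size of the "unrouted" bucket after each vocabulary refresh would flag a domain change like this one before data are lost.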

What is the current guidance – should 186 CPT4 codes that are newly in the Visit domain create records in procedure_occurrence, observation, or somewhere else?

Also, great to hear that @Frank created a tool, but I don’t see a link to GitHub for the code and instructions for running the tool. Where can I find that info?

Hello, @Thomas_White

Sorry to hear that the vocabulary refresh broke your ETL pipeline. These changes were intentional. The mappings to visits were derived from the extensive CPT4 descriptions. Concepts with mappings were destandardized, and therefore you can no longer find them in the concept_ancestor table, which by design contains only Classification and Standard concepts. However, the hierarchical relationships in the concept_relationship table are still present.
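Since the hierarchical links survive in concept_relationship, descendants can still be recovered by walking the 'Subsumes' relationships. The sketch below illustrates this with the rows from the 23-JAN-23 grep output earlier in the thread. The function name is hypothetical, and a real implementation would presumably also filter on invalid_reason and validity dates.

```python
from collections import defaultdict, deque


def descendants_via_subsumes(relationships, root):
    """Collect all concepts reachable from root through 'Subsumes' links."""
    children = defaultdict(set)
    for concept_id_1, concept_id_2, relationship_id in relationships:
        if relationship_id == "Subsumes":
            children[concept_id_1].add(concept_id_2)

    seen = set()
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child != root and child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


# (concept_id_1, concept_id_2, relationship_id) rows from the 23-JAN-23 grep output.
relationships = [
    (45889484, 45888946, "Is a"),
    (45889484, 2414391, "Subsumes"),
    (45889484, 2414394, "Subsumes"),
    (45889484, 2414392, "Subsumes"),
    (45889484, 2414393, "Subsumes"),
    (45888946, 45889484, "Subsumes"),
    (2414391, 45889484, "Is a"),
    (2414394, 45889484, "Is a"),
    (2414392, 45889484, "Is a"),
    (2414393, 45889484, "Is a"),
]

print(descendants_via_subsumes(relationships, 45889484))
```

This recovers the four E&M codes for 45889484 even though they no longer appear in concept_ancestor.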

The vocabulary team always provides release notes with the releases. What else could we do to provide smoother refreshes in the future? Would be great to hear your thoughts.

The QR code from the abstract leads to https://gist.github.com/fdefalco/2aca8656804cd1b3618f4a64c5900c88.

@zhuk - thanks for the link to the release notes. Somehow I had never seen them before.

What would really help is an easily searchable lineage that shows the pre and post values. I can see from the Release notes v20230116_major that 1164 CPT codes were deStandardized and mapped over to the Standard concepts in the respective domains. However, I don’t see where I can find a listing of exactly which CPT codes had attributes changed, and what they changed to.

Is there a database that tracks such changes? I presume the concept attributes might change multiple times over many years. If so, a data model that lets you search for all changes to concepts, relationships, and ancestors over time (flagging vocabulary release date) would enable easier searching and understanding of what changed - plus the ability to do robust impact analysis across versions (both for automating ETL updates and also for automating changes to concept sets or cohorts to account for those changes).

Does such a database already exist?
Are there web-based tools to navigate the history of changes to concept attributes or relationships?
Are there plans to augment Athena to let people navigate the history of changes of concepts?

Lastly, now that the E&M CPT codes are in the Visit domain, where are records about those CPT codes supposed to land in the OMOP data model? There is no place for them in the visit_occurrence table (since those codes are not valid values for visit_concept_id). Should the CPT E&M codes continue to generate records in the Procedure table? The Observation table? Other?

That sort of information would also be helpful in the release notes - especially when the domain for certain codes changes and there might be confusion about what target table they should land in.

This theme has been debated for years now and there certainly are signs of improvement in the situation (‘What’s new’ section in release notes, numbers of changed concepts, extended descriptions, etc.). The changes in vocabularies like CPT4 and HCPCS are not usually big, because the vocabularies themselves are small, but for SNOMED and drug vocabularies such as RxNorm and RxNorm Extension, the changes are enormous, often more than 10K concepts. Therefore, it is not easy to store them in the GitHub Release Notes section.

Within the Vocabulary team, we use an audit package to track changes within the database. It writes all changes to concepts to a log table. Unfortunately, it only records changes made by scripts (INSERT, UPDATE) and is not a perfect fit for your use case, where the vocabularies are downloaded and updated manually. However, you could try to adapt it.

For your use case, scripts that show the differences between vocabulary versions may help. For example, before downloading new vocabularies, you can back up the tables of the previous version in a separate schema and then compare the two versions table by table, using custom-made scripts or our scripts.
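A version-to-version comparison of this kind could look like the sketch below. It is a custom-made illustration, not the Vocabulary team's scripts. It assumes each release's concept table has been loaded into a dictionary keyed by concept_id; the function name and the choice of compared fields are hypothetical. The 2022 values for 2414392 come from the grep output earlier in the thread, and the 2023 values reflect the changes described in this thread (domain moved to Visit, concept destandardized).

```python
def diff_concepts(old, new, fields=("domain_id", "standard_concept")):
    """List (concept_id, field, before, after) for concepts present in both versions."""
    changes = []
    for concept_id in sorted(old.keys() & new.keys()):
        for field in fields:
            before = old[concept_id].get(field)
            after = new[concept_id].get(field)
            if before != after:
                changes.append((concept_id, field, before, after))
    return changes


# Backup of the previous release (values from the 22-JUN-22 CONCEPT_CPT4.csv row).
concept_2022 = {
    2414392: {"concept_code": "99203", "domain_id": "Procedure", "standard_concept": "S"},
}
# New release, as described in this thread; None stands for a non-standard concept.
concept_2023 = {
    2414392: {"concept_code": "99203", "domain_id": "Visit", "standard_concept": None},
}

for change in diff_concepts(concept_2022, concept_2023):
    print(change)
```

Storing the resulting tuples together with the release date would give the kind of searchable change history asked for earlier in the thread.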

Regarding CPT4 codes and Visit domain. I am not sure that I understand the problem. Why not Visit table? If you have visits constructed from some other codes, you could do deduplication during ETL.

I think Tom is talking about a definitive mapping of CPT4 and HCPCS codes to Visit concepts, if they mention them. Once that is available, the ETL indeed could do the deduplication (can’t have the same visit twice a day, can you?)
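As a rough illustration of that deduplication step, the sketch below keeps one visit per person, day, and visit concept, preferring a visit built from encounter data over one derived from a CPT4 code. The record layout, the "source" labels, the priority order, and the use of concept ID 9202 for an outpatient visit are all assumptions for illustration; exceptions for specialties with multiple visits per day are not handled.

```python
from datetime import date

# Lower number wins when two candidate visits collide (assumed preference).
SOURCE_PRIORITY = {"encounter": 0, "cpt4": 1}


def deduplicate_visits(candidates):
    """Keep one visit per (person_id, visit_date, visit_concept_id)."""
    best = {}
    for visit in candidates:
        key = (visit["person_id"], visit["visit_date"], visit["visit_concept_id"])
        current = best.get(key)
        if current is None or SOURCE_PRIORITY[visit["source"]] < SOURCE_PRIORITY[current["source"]]:
            best[key] = visit
    return [best[key] for key in sorted(best)]


OUTPATIENT_VISIT = 9202  # assumed concept ID; verify against your vocabulary

candidates = [
    {"person_id": 1, "visit_date": date(2023, 1, 5), "visit_concept_id": OUTPATIENT_VISIT, "source": "cpt4"},
    {"person_id": 1, "visit_date": date(2023, 1, 5), "visit_concept_id": OUTPATIENT_VISIT, "source": "encounter"},
    {"person_id": 1, "visit_date": date(2023, 1, 6), "visit_concept_id": OUTPATIENT_VISIT, "source": "cpt4"},
]

for visit in deduplicate_visits(candidates):
    print(visit["visit_date"], visit["source"])
```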

@Christian_Reich , I think I stated my question poorly, so let me restate.

Now that certain CPT4 and HCPCS codes are officially part of the Visit domain instead of the Procedure domain (since the January 2023 release), where should ETL land those data? For example, for a 99202 CPT4 code (new patient office visit), we used to create a record in the procedure_occurrence table when that CPT4 code was part of the Procedure domain. That way we could build cohorts and do analyses about specific CPT codes as needed (such as when they are part of a quality measure definition).

If the recommendation is to no longer store those CPT codes in the procedure table, I’m not sure where else they could be stored and also be accessible via Atlas. They are not valid codes for visit_concept_id. And, if they were added as visit_detail, Atlas doesn’t enable direct search of visit_detail.

So, I’d advocate for continuing to have those CPT codes generate records in the procedure_occurrence table. However, that could lead to additional confusion for both ETL-ers and end-users as long as those CPT codes live in the Visit domain.

So, the bottom line is that I want to ensure I can use Atlas to define cohorts to query for specific CPT codes. This was possible when they were procedures and we had access to the standard CPT hierarchy. Now that selected CPT codes have been moved to the Visit domain, it is not clear where those CPT codes should land in CDM tables so that we retain provenance plus the ability to query them via Atlas.

I hope that makes more sense.

Lots of US EHR data holders:

And regarding deduplication, unfortunately, it’s not so easy and next to impossible for some EHR data sources. The source visits/encounters are not always linked to the CPT4 “billing code”.

And now these CPT4 codes are no longer standard :frowning: So, they can’t be used for network queries :frowning: And many map to a generic “office visit”, which doesn’t give the level of detail necessary to meet @Thomas_White’s use case and others’ use cases, as we discussed this morning on the HSIG call.

Looking at these CPT4 ‘visit’ codes. They do identify a visit, however, it is the attributes located in the description of the code which are most useful to the use cases described. And I would argue these attributes, “new patient office visit”, “treatment variability”, etc. are “Observations” and also belong in the Observation table.

Hm. Interesting debate. Just to make it clear upfront: It happens not because our model is flawed, but because the frigging CPT4 and HCPCS codes are a mess of anything that could be used to justify payment. And lots of folks have become addicted to using them and interpreting them very narrowly. Of course, none of that makes sense from outside the US. Not even having them as standard concepts.

Which is obviously a violation of the CDM, as “new patient office visit” is not a procedure, but - an office visit. But I would like to understand better your use case problems:

If they are mapped to Visit, do you care whether the information originated as CPT4 or from other information in the source? Why?

That needs to be fixed. However, VISIT_DETAIL is mostly relevant to inpatient visits, is it not? An office or ER visit, which is what most of these concepts represent, should be in VISIT_OCCURRENCE, no?

Why not? If you have an EHR that indicates an outpatient office visit how is that different from using the CPT4 code? And why can’t you go through day by day in the life of the patient and make sure there is only one per day? With some exception for certain specialties?

This is the way the EHR is designed. We have “billing” data (CPT4 codes) and “encounter” data (giant, unwieldy, ambiguous tables to run the business of seeing patients). Two separate things that aren’t linked. It’s messy stuff. And we do our best to de-duplicate or merge all that we can, but sometimes it’s not possible. Visits are especially hard because the EHR contains many visits which are not patient-provider interactions and there is not a reliable flag. Every time someone at the healthcare system enters something into a person’s electronic chart, it must be linked to an encounter in the person’s chart. And if there isn’t an appropriate encounter to link, then an encounter is created. A person fills out a document, encounter record created. MRI is faxed over from another institution, another encounter. Labs reviewed by the RN, another encounter. Different domains within the EHR differ in granularity and ability to establish a link between billing and encounter data. Messy stuff we can discuss over a beer, or two, possibly 3 beers because it’s a long conversation :slight_smile:

Messy EHR data aside, I’m still arguing for inclusion of these CPT4 codes in the Observation table. Since these are observations:


I take the beer offer. :slight_smile:

If there is more than visit information, why not. I just want to prevent people from using the OBSERVATION table instead of the VISIT_OCCURRENCE table to look for visits.


In case you missed the discussion on this topic in the CDM WG this morning, it is located here.

If you add them to the Observation domain, do you use the visit concept ID in the observation_concept_id field?

Dear Community,

During the past months, we had a series of debates regarding CPT4/HCPCS concepts that were deStandardized, mapped to the standard Visits, and moved to the respective Domain according to their mapping. During the CDM WG call on May 16th, the WG members came up with a recommendation to roll back all the changes we implemented.

Considering all the pros and cons previously discussed, the vocabulary team proposes the following:

  1. We will revert the domain changes, so these concepts will be assigned to their original domains (Observation, Procedure, etc.). It’s clear to us that the ETL logic of creating or modifying the existing visits from these concepts would be complex and is not needed in most cases because the visits are already well-shaped from different data sources in both EHR and claims data. Therefore, we need a default landing Domain for these concepts which cannot be a Visit.
  2. Currently, these concepts are mapped to the standard concepts in the Visit domain, and we shall preserve these mappings:
  • They don’t affect the ETL process. If ETLs do not create visit records using CPT4/HCPCS codes they could just ignore them - if corresponding rules are not created it doesn’t hurt. Reversing domains should help keep the Visit table intact.
  • We have at least one use case for using these mappings. The source dataset accommodated data from more than 1000 clinics. It was very chaotic and contained a lot of duplicates and field mismatches. Due to such a structure, it was impossible to use POS-codes for visit type identification, so ETLers used CPT4/HCPCS codes for this purpose.
  3. We will keep these concepts non-standard. Standard concepts must represent valuable clinical entities and may serve as targets for mappings during ETL activities. This is a general rule of the OHDSI Vocabularies. Unfortunately, the concepts in question do not meet these criteria. To the best of our knowledge, they are also not used in studies, and we see no reason why they should be standard. E.g.:
concept_code | concept_name | vocabulary_id
99471 | Initial inpatient pediatric critical care, per day, for the evaluation and management of a critically ill infant or young child, 29 days through 24 months of age | CPT4
99281 | Emergency department visit for the evaluation and management of a patient, which requires these 3 key components: A problem focused history; A problem focused examination; and Straightforward medical decision making. Counseling and/or coordination of c... | CPT4

The above-mentioned changes will not affect the concepts that carry additional semantics (i.e. Procedure, Observation, etc.), such as Home visit for hemodialysis. The domain of these concepts has already been assigned according to their semantics - it could be <Procedure/Observation/Drug/Condition> - and it will be preserved. We shall also preserve their mapping to themselves (if standard) or to <Procedure/Observation/Drug/Condition> + Visit (if non-standard).

We hope this decision will satisfy everyone involved in the discussion.

Masha and the Vocabulary team.

@MPhilofsky @Christian_Reich @clairblacketer @aostropolets @zhuk @Alexdavv

Thanks for the great updates, Vocabulary team. To update your evidence base, so to speak, I wanted to note that we have used these in studies with some frequency. Specifically, there are 2+1 pieces of information the terms encode beyond the existence and setting of the encounter:

  • Presence of an E/M code indicates that a clinician actually interacted with the patient at the visit; this bit of metadata changes interpretation of things like the accuracy of condition_occurrence records and other facts reflective of clinical decision making.

  • The final digit indicates the “complexity” of the visit, a CPT-ish term for how sick the patient was, allowing the study to distinguish coarsely between complications and routine follow-up.

  • (As an added bonus, the critical care codes such as 9947x and 9929x are in administrative datasets often the only way to ascertain that a patient was receiving ICU-level care. As you might guess, this has been a topic of particular relevance in studying COVID-19.)
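The "final digit" convention described above can be sketched as a small lookup. This is an illustration only: the two code families and their ranges are the ones mentioned in this thread (99202 - 99205 for new patient office visits, 99281 - 99285 for emergency department visits), the function name is hypothetical, and other E/M families would need to be added explicitly rather than inferred.

```python
# Code families named in this thread; other E/M families are deliberately omitted.
EM_FAMILIES = {
    "new patient office visit": range(99202, 99206),
    "emergency department visit": range(99281, 99286),
}


def em_level(concept_code):
    """Return (family, level) for a known E/M code, where level is the final digit.

    Returns None for codes outside the listed families, including
    non-numeric codes such as HCPCS codes.
    """
    try:
        number = int(concept_code)
    except ValueError:
        return None
    for family, codes in EM_FAMILIES.items():
        if number in codes:
            return family, number % 10
    return None


for code in ("99203", "99285", "99471", "G2066"):
    print(code, em_level(code))
```

A study could use the returned level to separate low-complexity from high-complexity encounters, which is information a generic visit concept does not carry.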

I know I’ve encountered these practices outside PEDSnet in EHR-oriented networks like PCORnet and NIH-RECOVER. I’d be curious to hear whether anyone else does something similar.

Charles Bailey


Hi Vocabulary Team,

I wanted to add another example of how these CPT codes contain useful information that isn’t found elsewhere - some researchers at Stanford use these codes to differentiate office/outpatient visits for new vs. established patients. We would appreciate being able to easily access the specific CPT codes in the non-visit tables.


I really like the idea of grouping concepts by services provided or visits, and the effort of the OHDSI Vocabulary team. The problem lies in the variety of meanings of the affected CPT and HCPCS codes, which can’t simply be replaced by visit codes.
Even the aforementioned example of “non-sense” CPT4 code

might be used in some cohort definitions to determine the severity of the patient’s condition: if you look at codes 99281 - 99285, 99281 stands for the case when a physician is not required (a very simple case), 99285 stands for “high level of medical decision making” (a complicated case), and 99282 - 99284 are in between.

And as I mentioned in Proposed changes in SNOMED domains - #17 by Dymshyts, the problem is the subjectivity of the decision.

How was it decided that the information is not significant?

Please see the attached table with CPT and HCPCS concepts mapped to visits (if they are mapped to something else, that mapping is shown as well), ordered by number of occurrences in our network.
The overall problem is that potentially important information is lost. See rows in yellow and comments.
I didn’t review the full list though, I believe there will be more of such cases.
mapping_to_visit.xlsx (79.4 KB)

Proposed solution: instead of a ‘Maps to’ relationship, which makes the source codes non-standard and not properly usable in the OMOP CDM, create, let’s say, a ‘Has related visit’ relationship, so that the ETL can create visits out of these CPT/HCPCS concepts while preserving the original concept as standard; and replace ‘Maps to’ relationships to non-visit concepts with ‘Is a’ relationships.
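To make the difference concrete, here is a minimal sketch of how an ETL might resolve a source concept under each approach. ‘Has related visit’ is the relationship proposed above and does not exist in the vocabularies; the function name, the return layout, and the use of concept ID 9202 as the visit target are likewise assumptions for illustration.

```python
def resolve_targets(source_concept_id, relationships):
    """Split a source concept's targets into event concepts and derived visits.

    'Maps to' targets replace the source concept as the event concept.
    'Has related visit' (hypothetical, as proposed in this post) only adds
    a visit to derive and leaves the source concept as the event concept.
    """
    maps_to = sorted(
        target for source, target, relationship_id in relationships
        if source == source_concept_id and relationship_id == "Maps to"
    )
    related_visits = sorted(
        target for source, target, relationship_id in relationships
        if source == source_concept_id and relationship_id == "Has related visit"
    )
    return {
        "event_concept_ids": maps_to or [source_concept_id],
        "visit_concept_ids": related_visits,
    }


VISIT_TARGET = 9202  # assumed visit concept ID, for illustration only

current_approach = [(2414392, VISIT_TARGET, "Maps to")]
proposed_approach = [(2414392, VISIT_TARGET, "Has related visit")]

print(resolve_targets(2414392, current_approach))
print(resolve_targets(2414392, proposed_approach))
```

Under the current approach the CPT4 concept disappears from the event record, whereas under the proposed one it is kept and the visit is derived alongside it.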

And I think this is a very good way for the OHDSI vocabulary to mature: an obvious improvement (let’s derive visit information) meets some obstacles, and a more well-rounded solution has to be created.

Here’s another example of an important concept used in our cohort definitions that is now mapped just to ‘Telehealth’:
Interrogation device evaluation(s), (remote) up to 30 days; implantable cardiovascular physiologic monitor system, implantable loop recorder system, or subcutaneous cardiac rhythm monitor system, remote data acquisition(s), receipt of tran… (Deprecated) | HCPCS | G2066