
CPT Hierarchy errors - lost children in 2023 and changed domains

The QR code from the abstract leads to https://gist.github.com/fdefalco/2aca8656804cd1b3618f4a64c5900c88.

@zhuk - thanks for the link to the release notes. Somehow I had never seen them before.

What would really help is an easily searchable lineage that shows the pre and post values. I can see from the Release notes v20230116_major that 1164 CPT codes were deStandardized and mapped over to the Standard concepts in the respective domains. However, I don’t see where I can find a listing of exactly which CPT codes had attributes changed, and what they changed to.

Is there a database that tracks such changes? I presume the concept attributes might change multiple times over many years. If so, a data model that lets you search for all changes to concepts, relationships, and ancestors over time (flagging the vocabulary release date) would make it easier to search for and understand what changed, plus enable robust impact analysis across versions (both for automating ETL updates and for automating changes to concept sets or cohorts to account for those changes).
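To make the request concrete, here is a minimal Python sketch of what such a change log could look like. The `ConceptChange` record and the `history` and `affected_by_release` helpers are hypothetical; this is not an existing OHDSI table or tool. Each row records one attribute change to one concept in one release, which is enough to list a concept's history and to flag which members of a concept set changed in a given release.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class ConceptChange:
    """One attribute change to one concept in one vocabulary release (hypothetical schema)."""
    release: str               # e.g. "v20230116_major"; vYYYYMMDD prefixes sort chronologically as strings
    concept_id: int
    attribute: str             # e.g. "domain_id" or "standard_concept"
    old_value: Optional[str]
    new_value: Optional[str]


def history(log: Iterable[ConceptChange], concept_id: int) -> List[ConceptChange]:
    """All recorded changes to one concept, oldest release first."""
    return sorted((c for c in log if c.concept_id == concept_id),
                  key=lambda c: (c.release, c.attribute))


def affected_by_release(log: Iterable[ConceptChange], concept_set: Set[int],
                        release: str) -> List[int]:
    """Concept IDs from a concept set that changed in the given release (impact analysis)."""
    return sorted({c.concept_id for c in log
                   if c.release == release and c.concept_id in concept_set})
```

With a log like this, a concept set used in a quality-measure cohort could be checked against a new release before the cohort is rerun.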

Does such a database already exist?
Are there web-based tools to navigate the history of changes to concept attributes or relationships?
Are there plans to augment Athena to let people navigate the history of changes of concepts?

Lastly, now that the E&M CPT codes are in the Visit domain, where are records about those CPT codes supposed to land in the OMOP data model? There is no place for them in the visit_occurrence table (since those codes are not valid values for visit_concept_id). Should the CPT E&M codes continue to generate records in the Procedure table? The Observation table? Other?

That sort of information would also be helpful in the release notes, especially when the domain for certain codes changes and there might be confusion about which target table they should land in.

This theme has been debated for years now, and there certainly are signs of improvement (the ‘What’s new’ section in the release notes, counts of changed concepts, extended descriptions, etc.). The changes in vocabularies like CPT4 and HCPCS are usually not big, because the vocabularies themselves are small, but for SNOMED and for drug vocabularies such as RxNorm and RxNorm Extension the changes are huge, often more than 10K concepts. Therefore it is not easy to store them in the GitHub Release Notes section.

Within the Vocabulary team, we use an audit package to track changes within the database. It writes all changes to concepts into a log table. Unfortunately, it only records changes made by scripts (INSERT, UPDATE), so it is not well suited to your use case, where the vocabularies are downloaded and updated manually. However, you can try to adapt it.

For your use case, you could use scripts that show the differences between vocabulary versions. For example, before downloading new vocabularies, you create backup tables with the previous version in a separate schema and then compare the 2 versions table by table with the help of custom-made scripts or our scripts.
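A minimal sketch of the table-by-table comparison, assuming the backup of the previous release and the new release are reachable from one database connection. Here SQLite in-memory tables named `concept_prev` and `concept` stand in for the backup schema and the live schema (hypothetical names), and only three CONCEPT columns are compared. This illustrates the idea; it is not the Vocabulary team's scripts.

```python
import sqlite3

# Columns compared between releases: an illustrative subset of the CDM CONCEPT table.
COMPARED_COLUMNS = ("domain_id", "standard_concept", "invalid_reason")


def make_concept_table(conn, name, rows):
    """Create a minimal CONCEPT-like table (stand-in for a backed-up or live schema)."""
    conn.execute(f"CREATE TABLE {name} (concept_id INTEGER PRIMARY KEY, concept_code TEXT, "
                 "domain_id TEXT, standard_concept TEXT, invalid_reason TEXT)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?, ?)", rows)


def changed_attributes(conn, old="concept_prev", new="concept"):
    """(concept_id, column, old_value, new_value) for every compared column that differs.

    Table names are interpolated directly, so only pass trusted names.
    """
    changes = []
    for col in COMPARED_COLUMNS:
        # IS NOT is SQLite's NULL-safe inequality, so 'S' -> NULL counts as a change.
        rows = conn.execute(
            f"SELECT n.concept_id, o.{col}, n.{col} FROM {new} AS n "
            f"JOIN {old} AS o ON o.concept_id = n.concept_id WHERE o.{col} IS NOT n.{col}"
        ).fetchall()
        changes.extend((cid, col, old_v, new_v) for cid, old_v, new_v in rows)
    return sorted(changes, key=lambda r: (r[0], r[1]))


def added_concepts(conn, old="concept_prev", new="concept"):
    """Concept IDs present in the new release but not in the backup."""
    return [r[0] for r in conn.execute(
        f"SELECT concept_id FROM {new} "
        f"WHERE concept_id NOT IN (SELECT concept_id FROM {old}) ORDER BY concept_id")]
```

On PostgreSQL the NULL-safe comparison would be written `IS DISTINCT FROM`. The same pattern extends to CONCEPT_RELATIONSHIP and CONCEPT_ANCESTOR by joining on their key columns instead of `concept_id`.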

Regarding CPT4 codes and the Visit domain: I am not sure that I understand the problem. Why not the Visit table? If you have visits constructed from some other codes, you could do deduplication during the ETL.

I think Tom is talking about a definitive mapping of CPT4 and HCPCS codes to Visit concepts, if they mention them. Once that is available, the ETL indeed could do the deduplication (can’t have the same visit twice a day, can you?)

@Christian_Reich, I think I stated my question poorly, so let me restate it.

Now that certain CPT4 and HCPCS codes are officially part of the Visit domain instead of the Procedure domain (since the January 2023 release), where should the ETL land those data? For example, for the 99202 CPT4 code (new patient office visit), we used to create a record in the procedure_occurrence table when that CPT4 code was part of the Procedure domain. That way we could build cohorts and do analyses about specific CPT codes as needed (such as when they are part of a quality measure definition).

If the recommendation is to no longer store those CPT codes in the procedure table, I’m not sure where else they could be stored and also be accessible via Atlas. They are not valid codes for visit_concept_id. And, if they were added as visit_detail, Atlas doesn’t enable direct search of visit_detail.

So, I’d advocate for continuing to have those CPT codes generate records in the procedure_occurrence table. However, that could lead to additional confusion for both ETL-ers and end-users as long as those CPT codes live in the Visit domain.

So, the bottom line is that I want to ensure I can use Atlas to define cohorts to query for specific CPT codes. This was possible when they were procedures and we had access to the standard CPT hierarchy. Now that selected CPT codes have been moved to the Visit domain, it is not clear where those CPT codes should land in CDM tables so that we retain provenance plus the ability to query them via Atlas.

I hope that makes more sense.

Lots of US EHR data holders:

And regarding deduplication, unfortunately, it’s not easy, and it is next to impossible for some EHR data sources. The source visits/encounters are not always linked to the CPT4 “billing code”.

And now these CPT4 codes are no longer standard :frowning: So, they can’t be used for network queries :frowning: And many map to a generic “office visit”, which doesn’t give the level of detail necessary to meet @Thomas_White’s use case and others’ use cases, as we discussed this morning on the HSIG call.

Looking at these CPT4 ‘visit’ codes: they do identify a visit; however, it is the attributes in the description of the code that are most useful for the use cases described. And I would argue these attributes, “new patient office visit”, “treatment variability”, etc., are “Observations” and also belong in the Observation table.

Hm. Interesting debate. Just to make it clear upfront: it happens not because our model is flawed, but because the frigging CPT4 and HCPCS codes are a mess of anything that could be used to justify payment. And lots of folks have become addicted to using them and interpreting them very narrowly. Of course, none of that makes sense from outside the US, not even having them as standard concepts.

Which is obviously a violation of the CDM, as “new patient office visit” is not a procedure but an office visit. But I would like to better understand the problems in your use case:

If they are mapped to Visit, do you care whether the information originated as CPT4 or from other information in the source? Why?

That needs to be fixed. However, VISIT_DETAIL is mostly relevant to inpatient visits, is it not? An office or ER visit, which is what most of these concepts represent, should be in VISIT_OCCURRENCE, no?

Why not? If you have an EHR that indicates an outpatient office visit how is that different from using the CPT4 code? And why can’t you go through day by day in the life of the patient and make sure there is only one per day? With some exception for certain specialties?
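The one-visit-per-day rule described above could be sketched as follows in Python, assuming every candidate visit (from an EHR encounter or a CPT4 code) already carries a person, a date, and a specialty, which is exactly what some EHR sources cannot guarantee. The exempt specialty and the source priority are hypothetical placeholders that an ETL team would have to define.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class VisitCandidate:
    person_id: int
    visit_date: date
    specialty: str
    source: str  # illustrative labels: "ehr_encounter" or "cpt4"


# Hypothetical ETL rules: specialties allowed a separate same-day visit,
# and which source wins when two candidates fall into the same slot.
EXEMPT_SPECIALTIES = {"dialysis"}
SOURCE_PRIORITY = {"ehr_encounter": 0, "cpt4": 1}


def dedupe_visits(candidates: Iterable[VisitCandidate]) -> List[VisitCandidate]:
    """Keep one visit per person per day; exempt specialties get their own same-day slot."""
    kept = {}
    # sorted() is stable, so ties keep their input order; unknown sources sort last.
    for c in sorted(candidates, key=lambda c: SOURCE_PRIORITY.get(c.source, 99)):
        slot = (c.person_id, c.visit_date)
        if c.specialty in EXEMPT_SPECIALTIES:
            slot += (c.specialty,)
        kept.setdefault(slot, c)  # first (highest-priority) candidate per slot wins
    return list(kept.values())
```

Preferring the EHR encounter over the CPT4-derived candidate is only one possible choice; a source where billing data are more reliable would reverse the priority.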

This is the way the EHR is designed. We have “billing” data (CPT4 codes) and “encounter” data (giant, unwieldy, ambiguous tables to run the business of seeing patients). Two separate things that aren’t linked. It’s messy stuff. And we do our best to de-duplicate or merge all that we can, but sometimes it’s not possible. Visits are especially hard because the EHR contains many visits which are not patient-provider interactions and there is not a reliable flag. Every time someone at the healthcare system enters something into a person’s electronic chart, it must be linked to an encounter in the person’s chart. And if there isn’t an appropriate encounter to link, then an encounter is created. A person fills out a document, encounter record created. MRI is faxed over from another institution, another encounter. Labs reviewed by the RN, another encounter. Different domains within the EHR differ in granularity and ability to establish a link between billing and encounter data. Messy stuff we can discuss over a beer, or two, possibly 3 beers because it’s a long conversation :slight_smile:

Messy EHR data aside, I’m still arguing for inclusion of these CPT4 codes in the Observation table, since these are observations:


I take the beer offer. :slight_smile:

If there is more than visit information, why not. I just want to prevent people from using the OBSERVATION table instead of the VISIT_OCCURRENCE table to look for visits.


In case you missed the discussion on this topic in the CDM WG this morning, it is located here.

If you add them to the Observation domain, do you use the visit concept ID in the observation_concept_id field?

Dear Community,

Over the past months, we have had a series of debates regarding the CPT4/HCPCS concepts that were deStandardized, mapped to the standard Visits, and moved to the respective Domain according to their mapping. During the CDM WG call on May 16th, the WG members came up with a recommendation to roll back all the changes we implemented.

Considering all the pros and cons previously discussed, the vocabulary team proposes the following:

  1. We will revert the domain changes, so these concepts will be assigned to their original domains (Observation, Procedure, etc.). It’s clear to us that the ETL logic of creating or modifying the existing visits from these concepts would be complex and is not needed in most cases because the visits are already well-shaped from different data sources in both EHR and claims data. Therefore, we need a default landing Domain for these concepts which cannot be a Visit.
  2. Currently, these concepts are mapped to the standard concepts in the Visit domain, and we shall preserve these mappings:
  • They don’t affect the ETL process. If ETLs do not create visit records from CPT4/HCPCS codes, they can simply ignore these mappings; where no corresponding rules are created, the mappings do no harm. Reverting the domains should help keep the Visit table intact.
  • We have at least one use case for using these mappings. The source dataset accommodated data from more than 1000 clinics. It was very chaotic and contained a lot of duplicates and field mismatches. Due to such a structure, it was impossible to use POS-codes for visit type identification, so ETLers used CPT4/HCPCS codes for this purpose.
  3. We will keep these concepts non-standard. Standard concepts must represent valuable clinical entities and may serve as targets for mappings during ETL activities. This is a general rule of the OHDSI Vocabularies. Unfortunately, the concepts in question do not meet these criteria. To the best of our knowledge, they are also not used in studies, and we see no reason why they should be standard. E.g.:
concept_code | concept_name | vocabulary_id
99471 | Initial inpatient pediatric critical care, per day, for the evaluation and management of a critically ill infant or young child, 29 days through 24 months of age | CPT4
99281 | Emergency department visit for the evaluation and management of a patient, which requires these 3 key components: A problem focused history; A problem focused examination; and Straightforward medical decision making. Counseling and/or coordination of c... | CPT4

The above-mentioned changes will not affect the concepts that carry additional semantics (e.g. Procedure, Observation, etc.), such as Home visit for hemodialysis. The domain of these concepts has already been assigned according to their semantics (it could be <Procedure/Observation/Drug/Condition>), and it will be preserved. We shall also preserve their mapping to themselves (if standard) or to <Procedure/Observation/Drug/Condition> + Visit (if non-standard).
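Under this proposal, an ETL decides per target domain what to do with each ‘Maps to’ target of a source CPT4/HCPCS code. A minimal Python sketch, assuming the targets have already been looked up as (concept_id, domain_id) pairs; the function name, the flag, and the concept IDs in the usage are hypothetical, while the domain-to-table pairs follow the usual CDM convention.

```python
from typing import Iterable, List, Tuple

# Usual CDM landing table per standard domain.
DOMAIN_TO_TABLE = {
    "Condition": "condition_occurrence",
    "Procedure": "procedure_occurrence",
    "Observation": "observation",
    "Drug": "drug_exposure",
    "Measurement": "measurement",
    "Device": "device_exposure",
    "Visit": "visit_occurrence",
}


def route_mappings(targets: Iterable[Tuple[int, str]],
                   build_visits_from_codes: bool = False) -> List[Tuple[str, int]]:
    """Return (cdm_table, target_concept_id) for each 'Maps to' target of one source code."""
    routed = []
    for concept_id, domain_id in targets:
        if domain_id == "Visit" and not build_visits_from_codes:
            continue  # ETLs that don't derive visits from CPT4/HCPCS just ignore these mappings
        table = DOMAIN_TO_TABLE.get(domain_id)
        if table is None:
            raise ValueError(f"no landing table configured for domain {domain_id!r}")
        routed.append((table, concept_id))
    return routed
```

For a code mapped to Procedure + Visit, `route_mappings([(2001, "Procedure"), (3001, "Visit")])` returns only the procedure_occurrence row; passing `build_visits_from_codes=True` also returns the visit_occurrence row, which corresponds to the use case above where visits had to be derived from CPT4/HCPCS codes.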

We hope this decision will satisfy everyone involved in the discussion.

Masha and the Vocabulary team.

@MPhilofsky @Christian_Reich @clairblacketer @aostropolets @zhuk @Alexdavv

Thanks for the great updates, Vocabulary team. To update your evidence base, so to speak, I wanted to note that we have used these in studies with some frequency. Specifically, there are 2+1 pieces of information the terms encode beyond the existence and setting of the encounter:

  • Presence of an E/M code indicates that a clinician actually interacted with the patient at the visit; this bit of metadata changes interpretation of things like the accuracy of condition_occurrence records and other facts reflective of clinical decision making.

  • The final digit indicates the “complexity” of the visit, a CPT-ish term for how sick the patient was, allowing the study to distinguish coarsely between complications and routine follow-up.

  • (As an added bonus, the critical care codes such as 9947x and 9929x are in administrative datasets often the only way to ascertain that a patient was receiving ICU-level care. As you might guess, this has been a topic of particular relevance in studying COVID-19.)

I know I’ve encountered these practices outside PEDSnet in EHR-oriented networks like PCORnet and NIH-RECOVER. I’d be curious to hear whether anyone else does something similar.

Charles Bailey


Hi Vocabulary Team,

I wanted to add another example of how these CPT codes contain useful information that isn’t found elsewhere - some researchers at Stanford use these codes to differentiate office/outpatient visits for new vs. established patients. We would appreciate being able to easily access the specific CPT codes in the non-visit tables.


I really like the idea of grouping concepts by services provided or visits, and the effort of the OHDSI Vocabulary team. The problem lies in the variety of meanings of the affected CPT and HCPCS codes, which can’t simply be replaced by visit codes.
Even the aforementioned example of “non-sense” CPT4 code

might be used in some cohort definitions to determine the severity of the patient: if you look at codes 99281 - 99285, 99281 stands for a case where a physician is not required (a very simple case),
and 99285 stands for a “high level of medical decision making” (a complicated case), with 99282 - 99284 in between.
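A cohort definition along these lines might bucket ED visits by E/M level. The Python sketch below is hypothetical and only encodes the 99281 - 99285 ordering described above; note that a later reply in this thread argues the level reflects the work of the visit rather than the severity of the patient.

```python
from typing import Iterable, Optional

# Hypothetical helper: CPT4 ED E/M codes ordered from the lowest to the highest level.
ED_EM_CODES = ("99281", "99282", "99283", "99284", "99285")


def ed_em_level(concept_code: str) -> Optional[int]:
    """Return 1-5 for 99281-99285 and None for any other code."""
    if concept_code in ED_EM_CODES:
        return ED_EM_CODES.index(concept_code) + 1
    return None


def highest_ed_level(concept_codes: Iterable[str]) -> Optional[int]:
    """Highest ED E/M level among a visit's codes, or None if there is none."""
    levels = [lvl for lvl in map(ed_em_level, concept_codes) if lvl is not None]
    return max(levels, default=None)
```

For example, `highest_ed_level(["99282", "99211", "99284"])` gives 4; codes outside the ED range are ignored.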

And as I mentioned in the Proposed changes in SNOMED domains - #17 by Dymshyts, the problem is the subjectivity of the decision.

How was it decided that this information is not significant?

Please see the attached table with the CPT and HCPCS concepts mapped to visits (if they are mapped to something else, that mapping is shown as well), ordered by number of occurrences in our network.
The overall problem is that potentially important information is lost. See rows in yellow and comments.
I didn’t review the full list, though; I believe there will be more such cases.
mapping_to_visit.xlsx (79.4 KB)

Proposed solution: instead of a ‘Maps to’ relationship, which makes the source codes non-standard and not properly usable in the OMOP CDM, create, let’s say, a ‘Has related visit’ relationship, so the ETL can create visits out of these CPT/HCPCS concepts while preserving the original concept as standard; and replace the ‘Maps to’ relationships to non-visit concepts with ‘Is a’ relationships.

And I think this is a very good example of the OHDSI vocabulary maturing: an obvious improvement (let’s derive visit information) meets some obstacles, and a more well-rounded solution has to be created.

Here’s another example of an important concept used in our cohort definitions that is now mapped just to ‘Telehealth’:
Interrogation device evaluation(s), (remote) up to 30 days; implantable cardiovascular physiologic monitor system, implantable loop recorder system, or subcutaneous cardiac rhythm monitor system, remote data acquisition(s), receipt of tran… (Deprecated) | HCPCS | G2066

Hi, @Dymshyts et al!

I’m really sorry that we kept all these CPT4 concepts in the Standard area for so long that the cost of the change became this big.

Nice to hear it, Dima!

The problem lies much more on the surface: these codes have highly variable meanings even within a code. And how the given user interprets them depends on many factors including the specific knowledge of the vocabulary and coding rules, locally established practices, and the ability for mental gymnastics with multiple AND/OR statements and logical reasoning.

Users can’t do that. We don’t want users to do that. And this is not what the Standard concepts are supposed to be.

We keep working on the vocabulary improvement project and we’ll release vocabulary principles later this year (link1, link2). In line with the principles we keep improving the vocabulary cleanliness, and this case (among others, like negative information, Survey data) is one of the most characteristic examples to look at.

But right, if users find this information useful, we need to find a proper solution to Standardize it.

How about the following?

  1. We ask folks who know and use CPT4/HCPCS in research to boil them down and extract meaningful information from them. Not the entire CPT4/HCPCS pool, only those that don’t meet the criteria for Standard concepts (those mapped to Visits and other deStandardized concepts). Dima already started this work by identifying those that are used in cohorts or otherwise characterize the patients.
  2. We look at the meaningful information together and decide what the proper Domains should be. Could be the Visit, Observation, Modifier? or something else.
  3. Then we find a way to record this information in the proper Domain, or make an exclusion. The solution could trigger the creation of new more specific concepts within the existing hierarchies of Visit, or the Visits of new dimensions, or new Observation concepts in the OMOP Extension.
  4. We come up with a guide for ETL, including the approach for merging the Visits coming from different sources, if needed.
  5. Since CPT4/HCPCS is on the roadmap, the vocabulary team can help with 2, 3, and partially 4, but we still need your help with 1 and 4.
  6. In the meantime, given that the problem has persisted for more than 1.5 years (the 4 latest vocabulary releases) and the release cycle is 6 months, we will try to focus on the solution space and get a proper approach implemented by the nearest release in August, rather than making a rollback/shortcut solution in the nearest release and postponing the proper solution.


Some of the discussion above about the “ambiguity” of a CPT code is based on faulty information about what a CPT code represents. E&M codes describe the “complexity” of the “procedure”, in this case the visit, NOT the complexity of the patient. So the use case described above, using the higher E&M codes to infer patient complexity, is not really sound. On the other hand, the idea that the description of a “complex ED visit” is ambiguous is not correct either. The code indicates that the work involved in caring for this patient reached a “complex” level relative to other visits. It has nothing to do with the underlying “complexity” of the patient.

I agree with the various arguments above about the differences between an “encounter” in the EHR and a “visit” in CPT: they overlap but impart different information. But there are many use cases where understanding the information contained in “visit” data is very helpful: the ICU use case is one example, preventive services are another, and an actual clinical visit, as has been mentioned, is yet another.

Moving them into the “visit” table has not worked well, as many have noted above. Putting them in the polyglot observation table bypasses the visit problem, but it continues to deposit CPT codes across the CDM and is an attempt to extract secondary meaning from a terminology designed to record the work and actions being conducted with and to a person. Moving vaccine administration to the drug table is an example of this secondary interpretation. CPT codes should be used for what they are, a procedure that has been completed, not a way to impute other information.

Catching up on this convo, and not sure if it was true in Feb 2023, but at least as of 2.14, you can query visit detail:

They are actually very well defined. We have to get those definitions. Not that expensive.

That is true. But the point isn’t to characterize the patient. The point is to project the complexity onto the primary diagnosis of that visit and infer severity. Sounds reasonable to me. The question is how to properly represent the “lots of work up” in the OMOP CDM. Taking the CPT4 as is and shoving it into an Observation is the poor man’s solution, putting the onus on the analyst to make all those connections.

Except in this case there is no procedure. “Lots of paperwork, working the phone and spending time on the patient” is not a procedure by OMOP standards. Actually, less than 50% of CPTs are procedures.

Great. Thanks, @Chris_Knoll.