Understanding How cpt4.jar Updates OMOP Vocabulary Files

banjawd · June 19, 2025, 3:09am

I’m new to working with OMOP and CPT4 vocabularies, and I’m currently using the cpt4.jar file to download the vocabulary data.

Could someone help me understand how this process works behind the scenes? Specifically:

How does the tool update the CONCEPT.csv file?
Which columns are affected during the update?
Is the update based on CONCEPT_ID, or is there another mechanism involved?

Any insight into the update logic would be greatly appreciated. Thank you!

zhuk · June 19, 2025, 11:16am

Hi @banjawd, welcome to the Community!

The tool updates concept_name in the CONCEPT.csv file. You can see the effect by comparing the file before and after running the tool. Names are matched on concept_code, not concept_id.

The tool checks that you have a valid API key and full access to CPT4, then updates the names accordingly. The tool is needed because of legal technicalities - we can’t distribute CPT4, so we have to check that you can access it.
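As a rough illustration of the before/after comparison suggested above, here is a minimal Python sketch. It is not part of the tool. It assumes you kept a copy of CONCEPT.csv from before the run, and that the file is tab-delimited with a header row, as Athena vocabulary downloads normally are. The sample rows, codes, and names are made up.

```python
import csv
import io


def load_names(stream):
    """Map (vocabulary_id, concept_code) -> concept_name from a tab-delimited CONCEPT file."""
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {
        (row["vocabulary_id"], row["concept_code"]): row["concept_name"]
        for row in reader
    }


def changed_names(before, after):
    """Return {key: (old_name, new_name)} for concepts in both snapshots whose name differs."""
    return {
        key: (before[key], after[key])
        for key in before.keys() & after.keys()
        if before[key] != after[key]
    }


# Made-up sample data standing in for the real files (which have more columns).
HEADER = "concept_id\tconcept_name\tvocabulary_id\tconcept_code\n"
BEFORE = HEADER + "1\tplaceholder\tCPT4\tX0001\n" + "2\tSome other concept\tSNOMED\tX0001\n"
AFTER = HEADER + "1\tFull licensed name\tCPT4\tX0001\n" + "2\tSome other concept\tSNOMED\tX0001\n"

diff = changed_names(load_names(io.StringIO(BEFORE)), load_names(io.StringIO(AFTER)))
for (vocabulary_id, concept_code), (old, new) in sorted(diff.items()):
    print(vocabulary_id, concept_code, repr(old), "->", repr(new))
```

With real files, replace the `io.StringIO(...)` objects with `open(path, newline="", encoding="utf-8")`. Keying on both vocabulary_id and concept_code avoids mixing up codes that happen to repeat across vocabularies.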

banjawd · June 19, 2025, 1:41pm

Thank you, Zhuk!

Just to confirm my understanding:

The JAR file compares the concept_code from concept.csv with the one in concept_cpt4.csv. If there’s a match, it updates the concept_name in concept.csv, correct?
Also, in concept.csv, does the vocabulary_id need to be 'CPT4' for the update to happen?
And finally, do the concept_id values need to be the same in both files for the update to work?

Is there any official documentation that explains this process in detail?

zhuk · June 26, 2025, 5:09pm

Almost. CPT4 concepts are first isolated in the concept_cpt4 file. The application processes each record directly and updates the name of the concept. Finally, the records are appended to the concept file.

Now to your questions:

The application updates names within concept_cpt4. There is no direct comparison between files.
The vocabulary_id has to be CPT4, but those records are processed in a separate file (concept_cpt4) rather than in the concept file itself.
Unfortunately, there is only a Readme file that comes with the bundle.
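The three steps described above (isolate the CPT4 records, update their names, append them back) can be sketched in Python as follows. This is only an illustration of the described flow, not the tool's actual code. The `names_by_code` lookup is a hypothetical stand-in for whatever licensed source the tool queries, and the rows, codes, and names are made up.

```python
def split_cpt4(rows):
    """Step 1: isolate CPT4 records from everything else."""
    cpt4 = [row for row in rows if row["vocabulary_id"] == "CPT4"]
    other = [row for row in rows if row["vocabulary_id"] != "CPT4"]
    return other, cpt4


def apply_names(cpt4_rows, names_by_code):
    """Step 2: update concept_name per record, keyed on concept_code.

    Records with no entry in the lookup keep their existing name.
    concept_id and all other columns are left untouched.
    """
    return [
        {**row, "concept_name": names_by_code.get(row["concept_code"], row["concept_name"])}
        for row in cpt4_rows
    ]


# Made-up rows; a real CONCEPT file has more columns.
ROWS = [
    {"concept_id": "1", "concept_name": "placeholder", "vocabulary_id": "CPT4", "concept_code": "X0001"},
    {"concept_id": "2", "concept_name": "Some other concept", "vocabulary_id": "SNOMED", "concept_code": "X0001"},
    {"concept_id": "3", "concept_name": "placeholder", "vocabulary_id": "CPT4", "concept_code": "X0002"},
]
# Hypothetical lookup standing in for the licensed name source.
NAMES = {"X0001": "Full licensed name"}

other, cpt4 = split_cpt4(ROWS)
updated = apply_names(cpt4, NAMES)
result = other + updated  # Step 3: append the updated CPT4 records to the rest

for row in result:
    print(row["concept_id"], row["vocabulary_id"], row["concept_code"], row["concept_name"])
```

In this sketch the non-CPT4 row that shares the same concept_code is never touched, because the update only ever sees the isolated CPT4 records.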

Why do you need this detailed logic?