Athena CPT script failure

Greetings,

When I run the CPT.BAT file against the expanded Athena CSV files I’m suddenly getting thousands of failures in the log file. Here is one such failed update that I ran on a remote workstation.

[INFO ] 2022-12-20 14:32:26.155 [pool-2-thread-12] ConceptUpdater - Update status:FAIL. The code value for id 2102761 is []

When I look for that concept_id in the resulting concept table it is not present.

select count(1) from vocab.concept where concept_id = 2102761;

count|
-----+
0|

Lines such as the following that do update successfully are, of course, present…

[INFO ] 2022-12-20 14:32:26.186 [pool-2-thread-4] ConceptUpdater - Update status:SUCCESS. The code value for id 42733540 is [Cryosurgery of rectal tumor; malignant (Deprecated)]

select count(1) from vocab.concept where concept_id = 42733540;

count|
-----+
1|
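For anyone who wants to check more than one id at a time, here is a minimal sketch that splits the ConceptUpdater log lines into succeeded and failed concept ids. It assumes every status line follows the same pattern as the two quoted above; the function name `split_by_status` is just illustrative and not part of the Athena tooling.

```python
import re

# Pattern taken from the two log lines quoted in this post (assumed to be stable).
STATUS_LINE = re.compile(
    r"Update status:(SUCCESS|FAIL)\. The code value for id (\d+) is \[(.*)\]"
)

def split_by_status(log_lines):
    """Group concept ids by the update status reported in the log."""
    ids_by_status = {"SUCCESS": [], "FAIL": []}
    for line in log_lines:
        match = STATUS_LINE.search(line)
        if match:  # ignore lines that are not update-status lines
            ids_by_status[match.group(1)].append(int(match.group(2)))
    return ids_by_status
```

The failed ids can then be pasted into a `where concept_id in (...)` query against vocab.concept to confirm which ones are missing.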

Does anyone know why some are failing and some are not?

I ran this again on my local machine and still get 12392 “Not Processed” concepts.

[INFO ] 2022-12-20 16:14:45.392 [main] ConceptService - Writing updated data to CONCEPT.csv
Updated CPT4 records: 16632/16632
[INFO ] 2022-12-20 17:05:35.063 [main] ConceptService - Not processed cpt4 concepts: 12392. See logs/not-processed-concepts-12-20-2022-16-14-44.out, file. You can find more information about errors in the logs/logf

What would cause all these to fail?

Thanks,
Jeff

Other forum posts said to simply re-run the script until it works. Kinda pathetic, but that finally worked. Really defeats any attempt to automate the process though. We need a REAL fix!

Yes, not good, but I believe the core of the problem is that there are self-referencing foreign keys in the CONCEPT table, so the order of inserts matters. I am guessing that when a partial load succeeds on the first run, the concepts it inserted allow the rows that failed earlier to pass on subsequent runs.

I think the process of loading vocabulary should be to drop constraints (or ‘suspend them’ if your DBMS supports that) and do the load. After loading, re-enable constraints.
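A rough sketch of that suspend/load/re-enable idea, using SQLite from Python purely as a stand-in for whatever DBMS you actually run. The `parent_id` column is hypothetical and only there to create a self-referencing key; it is not the real CONCEPT schema, and the statements for suspending constraints differ between database systems.

```python
import sqlite3

def load_with_suspended_fk(rows):
    """Load rows in arbitrary order with FK enforcement off, then verify."""
    # isolation_level=None gives autocommit; SQLite ignores changes to
    # PRAGMA foreign_keys while a transaction is open.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE concept ("
        " concept_id INTEGER PRIMARY KEY,"
        " parent_id INTEGER REFERENCES concept(concept_id))"  # hypothetical column
    )
    conn.execute("PRAGMA foreign_keys = OFF")   # suspend enforcement
    conn.executemany("INSERT INTO concept VALUES (?, ?)", rows)
    conn.execute("PRAGMA foreign_keys = ON")    # re-enable enforcement
    # Enforcement only applies to new writes, so check existing rows explicitly.
    violations = conn.execute("PRAGMA foreign_key_check").fetchall()
    loaded = conn.execute("SELECT count(*) FROM concept").fetchone()[0]
    conn.close()
    return loaded, violations
```

With enforcement off, a child row can be inserted before its parent, and the check afterwards reports any rows that still point at nothing.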

@Jeff_Stroup: As @mik pointed out in this issue, we are running up against the UMLS API. UMLS is a service of the NLM, which is part of the NIH. The domain name has a “.gov” extension, meaning we are the tail and they are the dog.

We will fix it. Until then we need to cope with the frustration. Sorry.

Bumping this up because, with the new vocabulary release, people will probably run into this issue again. The current workaround is to run the CPT4 command repeatedly until it has resolved all concepts. Not a great workaround, but it is what you can do right now until we have adjusted the tool to work better with the UMLS API.
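If you need to automate the re-running, something along these lines might do as a stopgap. It is only a sketch: it assumes the tool keeps printing the "Not processed cpt4 concepts: N" summary line seen earlier in this thread, and it treats a missing line or a count of 0 as "done", which may not hold for every version of the tool. `run_until_resolved` and `make_runner` are made-up helper names, and the command you pass in (script path plus any arguments it needs) is up to your setup.

```python
import re
import subprocess

# Summary line as quoted earlier in this thread (assumed to be stable).
NOT_PROCESSED = re.compile(r"Not processed cpt4 concepts:\s*(\d+)")

def parse_not_processed(output):
    """Return the reported count of unprocessed concepts, or None if absent."""
    match = NOT_PROCESSED.search(output)
    return int(match.group(1)) if match else None

def run_until_resolved(run_once, max_attempts=5):
    """Call run_once() until nothing is left unprocessed or attempts run out."""
    history = []
    for _ in range(max_attempts):
        remaining = parse_not_processed(run_once())
        history.append(remaining)
        if not remaining:  # assumption: 0 or no summary line means resolved
            return True, history
    return False, history

def make_runner(command):
    """Wrap a command list (e.g. your CPT script call) as a run_once callable."""
    def run_once():
        result = subprocess.run(command, capture_output=True, text=True)
        return result.stdout + result.stderr
    return run_once
```

Capping the attempts keeps a scheduled job from looping forever if the API never returns the remaining codes.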

We may have another problem… it appears that the API is again not providing all codes to us (388 missing). We will investigate and figure out a fix.

We have a patched cpt4.jar file in the download now. If you have trouble reconstituting CPT4 after your vocabulary download, please create a new package and download it. This is not the final solution, but a preliminary patch. We hope to be able to build a better solution with support from the UMLS API team. Please be aware that the reconstitution process is currently pretty slow and can take up to 2 hours to run.

@mik
We were having the same issue; in fact, we dropped a few thousand CPT codes. What does the current fix do?

@PriyaDesai, we have yet another fix in the works. It slows down the API calls and is currently being tested. It might require a fair bit of running time (maybe up to 4.5 hours), but unless UMLS provides some guidance on a better solution…
Bear with us a little. You will hopefully find an update in this post soon.
