
Vocab V5 cpt4.jar truncates last line in CONCEPT.csv?

I downloaded the V5 vocabulary yesterday, ran the cpt4 utility to augment CONCEPT.csv, then tried to load it to a postgres database, and got this error:

psql:vocab_load_local.sql:51: ERROR:  missing data for column "domain_id"
CONTEXT:  COPY concept, line 2079654: "42740582	Statin therapy, prescr"

I copied all but the last short line to another file and it loaded fine.
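A pre-load check along these lines can flag a truncated row before COPY does. This is only a minimal sketch, assuming CONCEPT.csv is tab-delimited with a header row (which the COPY context line suggests); the function name `find_short_rows` is hypothetical and not part of any OHDSI tooling.

```python
def find_short_rows(lines, delimiter="\t"):
    """Return (line_number, field_count) for rows whose field count
    differs from the header's. Line numbers are 1-based."""
    rows = iter(lines)
    try:
        header = next(rows)
    except StopIteration:
        return []  # empty input: nothing to check
    # Strip only the line terminator so a trailing empty column still counts.
    expected = header.rstrip("\r\n").count(delimiter) + 1
    bad = []
    for lineno, line in enumerate(rows, start=2):
        fields = line.rstrip("\r\n").count(delimiter) + 1
        if fields != expected:
            bad.append((lineno, fields))
    return bad

# Hypothetical usage (the file name is an assumption):
# with open("CONCEPT.csv", encoding="utf-8", newline="") as f:
#     print(find_short_rows(f))
```

An empty result means every row has as many fields as the header. It does not prove that rows are not missing altogether.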

Thanks,
Don

I’ve had the same problem. It appears to affect the last N lines written to the output file, so I was able to work around it by copying the top 50 or so lines from concept_cpt4.csv down to the bottom to pad it out, then deleting the duplicate lines (and the short line, of course) from the output.

@bailey, @donohara and Friends:

We added full error reporting to the CPT4 utility. Could you be so kind as to download and run it again, and post the messages it produces?

Downloaded this AM. Here’s what I see:

baileyc@Lapchop (0) zips/test $ uname -a
Darwin 985aeb8a0efe 14.4.0 Darwin Kernel Version 14.4.0: Thu May 28 11:35:04 PDT 2015; root:xnu-2782.30.5~1/RELEASE_X86_64 x86_64
baileyc@Lapchop (0) zips/test $ java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
baileyc@Lapchop (0) zips/test $ java -jar cpt4.jar
java.io.FileNotFoundException: ./concept_cpt4.csv (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at java.io.FileReader.<init>(FileReader.java:58)
at org.odhsi.Main.main(Main.java:32)

CPT successfully updated.
baileyc@Lapchop (0) zips/test $ perl -e 'rename($_, lc $_) for @ARGV' [A-Z]*.csv
baileyc@Lapchop (0) zips/test $ java -jar cpt4.jar
Records imported: 14202
CPT successfully updated.
baileyc@Lapchop (0) zips/test $ wc -l concept.csv concept_cpt4.csv
15187 concept.csv
14203 concept_cpt4.csv
29390 total
baileyc@Lapchop (0) zips/test $ tail -2 concept_cpt4.csv
2101858 Procedure CPT4 CPT4 0H 19700101 20130508 D
42741111 Procedure CPT4 CPT4 0150T 20070703 20100403 D
baileyc@Lapchop (0) zips/test $ tail -2 concept.csv
2102317 Acellular dermal replacement, face, scalp, eyelids, mouth, neck, ears, orbits, genitalia, hands, feet, and/or multiple digits; first 100 sq cm or less, or one percent of body area of infants and children Procedure CPT4 CPT4 15175 20060213 20120507 D
42734828 Percutaneous placement of gastrostomy tube, radiological supervision and interpretation Procedure CPT4 CPT4 74350 19700101 20080401
baileyc@Lapchop (0) zips/test $ grep 42741111 concept.csv
baileyc@Lapchop (0) zips/test $ grep -c CPT4 concept.csv
14182

So looks like the last chunk of output is still vanishing silently. Happy to run additional diagnostics as needed.
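The grep check above can be generalized to list every ID that went missing. The following is a rough sketch, assuming both files are tab-delimited with the concept ID in the first field; `missing_ids` is a hypothetical helper, not something shipped with cpt4.jar.

```python
def missing_ids(source_lines, output_lines, delimiter="\t"):
    """Return first-field IDs found in source_lines but not in
    output_lines, preserving source order. Blank lines are ignored."""
    def first_field(line):
        return line.rstrip("\r\n").split(delimiter, 1)[0]

    seen = {first_field(line) for line in output_lines if line.strip()}
    return [
        first_field(line)
        for line in source_lines
        if line.strip() and first_field(line) not in seen
    ]

# Hypothetical usage (file names taken from the transcript above):
# with open("concept_cpt4.csv", encoding="utf-8") as src, \
#      open("concept.csv", encoding="utf-8") as out:
#     print(missing_ids(src, out))
```

A row that was cut off mid-line still carries its ID, so it would not show up here; this only catches rows that vanished entirely.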

Thanks!

The issue is fixed, and a new version is available for download. Also, big thanks to Charles Bailey for the crucial info; it really helped to track down this sneaky bug.

Tested here, and at least writes a complete output file. Thanks!

What turned out to be the bug?

Is it possible for us to have access to the source for cpt4.jar?
