I am loading concept data into an Oracle database from csv files downloaded from Athena. Most of the data has loaded successfully but there are about 10k records that failed to load. Most of the data that failed to load contains quotes embedded in the records
As you can see from the data, the concept_id is also formatted incorrectly but again, just for these 10k records. Is this a known issue or any recommendations on the easiest way to work around it? Is the data available as a txt file. Note, I am working in a windows environment.
Hello,
thank you for reporting - we will check the same case and inform you about workaround or created/fixed task.
Also anyway I don’t see problem to open a .csv file as a text file in a text editor.
@maria.pozhidaeva, thanks for your response. Yes, I could find an editor that can deal with such large files and update the quotes (") issue but a significant portion of the same data has concept_ids in exponential format (e.g.) 4.0491E7. Slightly more difficult to replace because these IDs are used in other tables.
@ColinOrr, I’ve checked all steps for loading concepts to Oracle db and all looks fine.
My actions:
Downloading from http://athena.ohdsi.org/vocabulary/list SNOMED vocabulary (including concept with concept_id = 40491863)
In CONCEPT.csv I see line (without “;” as separator and exponential notation):
40491863 “Linked to” reference set attribute Metadata SNOMED Model Comp 447257003 20120131 2
@maria.pozhidaeva, thanks for providing those details. It appears that the tool I am using (SQL Developer) is the source of my problems. I was letting the tool create the control file and it was inserting
OPTIONALLY ENCLOSED BY ‘"’ AND ‘"’
this cause the problem both with the embedded quotes and the scientific notation when it reported the problem. I have update the control file removing this option and the data is now loading.