OHDSI Home | Forums | Wiki | Github

What is the recommended way to load vocabulary into Postgres?

Hi folks,

This may be very basic question but what is the recommended way to load vocabulary data from the csv files downloaded from athena.ohdsi.org into a postgres database?

I know there was a sql script created by Lee Evans using ‘copy from’ in 2014. Is there any other/better /recommended way to do it?

Thanks,
Ray Zohdsi

You can use the Data Description Language (DDL) scrips to create the tables correctly (https://github.com/OHDSI/CommonDataModel/tree/main#readme). From there, loading the vocabularies from Athena should be a mere matter of inserting rows on the tables… although the CSV files can be big.

The ETL-Synthea repo (https://github.com/OHDSI/ETL-Synthea) has a function to load those CSV files (LoadVocabFromCsv()), but I wonder if the same could be accomplished with import features from general-purpose SQL clients, e.g. Import/Export Data Dialog — pgAdmin 4 6.7 documentation.

Kind regards.

Thank you so much, Fabrício! This is what I have been looking for.

Ray

I use ETL-Synthea to import and got the warning message:

Warning messages:
1: In data.table::fread(file = paste0(vocabFileLoc, "/", csv), stringsAsFactors = FALSE,  :
  Found and resolved improper quoting out-of-sample. First healed line 853636: <<2614984        "y set" tubing for peritoneal dialysis  Device  HCPCS   HCPCS   S       A4719   20020101        20991231        >>. If the fields are not quoted (e.g. field separator does not appear within any field), try quote="" to avoid this warning.
2: In data.table::fread(file = paste0(vocabFileLoc, "/", csv), stringsAsFactors = FALSE,  :
  Found and resolved improper quoting out-of-sample. First healed line 115727: <<44833612       "ventilation" pneumonit 4180186>>. If the fields are not quoted (e.g. field separator does not appear within any field), try quote="" to avoid this warning.
>

The function data.table::fread() is having trouble with quotes. Disabling them (quote = “”) should fix. The vocabulary files do not use quoting.

Sorry Fabricio,
I thought I knew how to use it but I don’t. How to pass the quote="" to disable them?

Thanks,
Ray Z.

t