
Newbie in OMOP: How to import Vocabularies in PostgreSQL

Hello everyone, sorry to bother you. I am a university student, and for my Master’s thesis I am working on implementing an ETL process into OMOP for the management of biomedical ECG signal records.

Unfortunately I am having some difficulty loading the vocabularies into a PostgreSQL database. I have followed the directions given here, but some steps are still unclear to me, in particular the management of the devV5 and prodV5 schemas.

I am currently stuck at the creation of the UMLS schema: I need to use the sources.load_input_table() function, which refers to the devv5 schema and to the devv5.config$ relation, which I am unable to find.

If anyone would be so kind as to give me some suggestions on this or if you could point me to another guide to follow, I would be very grateful.

Hello, @AlessandroCarotenuto

You picked a great topic for your Master’s thesis. We would really like to hear about it as soon as it’s done, because biomedical ECG signal records are not easy to tackle :slight_smile:. Check this and this topic (there are more).

Regarding UMLS: you need UMLS credentials to work with the CPT4 vocabulary. Most vocabularies are available for free without any license, so in your case maybe you don’t even need UMLS. Once again: maybe.

You don’t need to repeat the whole process of vocabulary creation on your side. Vocabularies can be downloaded from Athena. If I were you, I would start with the Book of OHDSI. I put a link to the ETL chapter, but please don’t skip the first chapters; they are very important and interesting to read.

Hello, @zhuk
Thank you very much for your reply and your advice. I had already checked the two threads you pointed me to in order to contextualise the scenario; they were very helpful for understanding the current progress regarding ECG biosignals.

Concerning the implementation, I have read chapter 6 of The Book of OHDSI and downloaded the .csv files from Athena. I was trying to figure out how to import them into a Postgres-based RDBMS so that I can write queries and mappings; unfortunately, this point is still not clear to me.

I will read the materials you pointed me to more carefully and try to follow the ETL-Synthea example. If you happen to have other suggestions on how to load the vocabularies into the DB, that would be great.

Check EHDEN Academy

And I think you are looking for this link


Thank you again. I will post future updates here and let you know whether the process was successful.

@AlessandroCarotenuto,

The part that’s missing from the CommonDataModel GitHub repo is the script with COPY commands to load the data. The first step is to create the vocabulary tables using these scripts; the second step is to run COPY commands based on this example, which used to be available in the repository but was removed before the 5.4.0 release.
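Since that example script is no longer in the repo, here is a rough sketch of what those COPY commands look like on PostgreSQL. The cdm schema name and the file paths are placeholders you will need to adjust; the QUOTE E'\b' option is just a common trick to effectively disable CSV quoting so stray double quotes inside concept names don’t break the load (the Athena files are tab-delimited with a header row):

```sql
-- Assumes the empty vocabulary tables were already created by the DDL scripts
-- and that the .csv files are readable by the PostgreSQL server process.
COPY cdm.concept              FROM '/path/to/CONCEPT.csv'              WITH (FORMAT csv, DELIMITER E'\t', HEADER, QUOTE E'\b');
COPY cdm.concept_relationship FROM '/path/to/CONCEPT_RELATIONSHIP.csv' WITH (FORMAT csv, DELIMITER E'\t', HEADER, QUOTE E'\b');
COPY cdm.concept_ancestor     FROM '/path/to/CONCEPT_ANCESTOR.csv'     WITH (FORMAT csv, DELIMITER E'\t', HEADER, QUOTE E'\b');
COPY cdm.concept_synonym      FROM '/path/to/CONCEPT_SYNONYM.csv'      WITH (FORMAT csv, DELIMITER E'\t', HEADER, QUOTE E'\b');
COPY cdm.concept_class        FROM '/path/to/CONCEPT_CLASS.csv'        WITH (FORMAT csv, DELIMITER E'\t', HEADER, QUOTE E'\b');
COPY cdm.vocabulary           FROM '/path/to/VOCABULARY.csv'           WITH (FORMAT csv, DELIMITER E'\t', HEADER, QUOTE E'\b');
COPY cdm.domain               FROM '/path/to/DOMAIN.csv'               WITH (FORMAT csv, DELIMITER E'\t', HEADER, QUOTE E'\b');
COPY cdm.relationship         FROM '/path/to/RELATIONSHIP.csv'         WITH (FORMAT csv, DELIMITER E'\t', HEADER, QUOTE E'\b');
COPY cdm.drug_strength        FROM '/path/to/DRUG_STRENGTH.csv'        WITH (FORMAT csv, DELIMITER E'\t', HEADER, QUOTE E'\b');
```

Note that server-side COPY needs the files to be readable by the database server itself; if that is a problem, psql’s client-side \copy variant accepts the same options and reads the files from your machine instead.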

Hi @rookie_crewkie,
Thanks for the help. Searching through the different repos and versions, I had also found these scripts; however, I ran into some drawbacks due to the referential integrity constraints between the tables, and the .csv files have many rows and columns, so the import process is not so smooth.

Also, the files use [tab] as a delimiter, which is not always recognised. I will make further attempts and let you know whether I manage to load the vocabularies.
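In case it helps anyone with the same problem, this is the kind of psql command I am experimenting with to spell out the tab delimiter explicitly; the schema name and path are just placeholders from my tests:

```sql
-- Client-side \copy in psql: the file is read on my machine, so the
-- Postgres server does not need permission to access the path, and
-- E'\t' names the tab delimiter without pasting a literal tab.
\copy cdm.concept FROM '/path/to/CONCEPT.csv' WITH (FORMAT csv, DELIMITER E'\t', HEADER, QUOTE E'\b')
```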

Thanks again

In the end I followed both methods to try to load the CDM with the vocabularies into Postgres.

With the method that @rookie_crewkie suggested, I was able to do everything although it took longer and I had to do everything “by hand”.

Following @zhuk’s advice and the ETL-Synthea example, on the other hand, I was able to create the connection in RStudio, create the tables, and load the vocabularies, but I noticed that the script for indexes and referential integrity constraints is missing. I tried running one I found by searching through the repo folders, but it is not updated to version 5.4.1; there are only 5.3.1 and 6.0.0. In any case, using the last two scripts in the repo that @rookie_crewkie suggested can get around this.
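For anyone who lands here later, the statements I ran by hand after the load had roughly this shape, adapted from the 5.3.1/6.0.0 scripts; the constraint and index names below are only illustrative, the actual scripts define their own:

```sql
-- Added only after all vocabulary files are loaded, so the circular
-- references between the vocabulary tables do not block the COPY step.
ALTER TABLE cdm.concept ADD CONSTRAINT xpk_concept PRIMARY KEY (concept_id);
CREATE INDEX idx_concept_code ON cdm.concept (concept_code);

ALTER TABLE cdm.concept
  ADD CONSTRAINT fpk_concept_domain FOREIGN KEY (domain_id) REFERENCES cdm.domain (domain_id);
ALTER TABLE cdm.concept_relationship
  ADD CONSTRAINT fpk_cr_concept_1 FOREIGN KEY (concept_id_1) REFERENCES cdm.concept (concept_id);
```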

Thanks again for the help
