Atlas not finding the vocabulary to initiate

RaphaelNogueira · August 23, 2023, 2:53pm

Agreed. It is better to start from a clean sheet in those cases. Now I understand the flow, but as I wrote above, it is working. I wrote a Python function that collects each table's concepts, builds the vocabulary, and inserts it into the schema I choose, which, judging from other questions in the forum, looks more like a custom vocabulary.

Chris_Knoll · August 23, 2023, 2:56pm

Again, I think you’re going down the road of ‘doing it my way’ instead of following standard procedures. Downloading and installing a published version of the vocabulary is a standard part of setting up your CDM, so you should start there. Once you have those tables, you are free to table-copy them as necessary, but you’re going ‘out of the lane’ if you try to implement your own process for building those tables.

RaphaelNogueira · August 23, 2023, 2:59pm

How would you suggest setting up the vocabulary, then? This part was not very clear to me, which is why I went rogue, judging by your words.

Chris_Knoll · August 23, 2023, 3:25pm

The CDM FAQ has a question related to this:

12. Do I have to map my source codes to Standard Concepts myself? Are there vocabulary mappings that already exist for me to leverage?

If your data use any of the 55 source vocabularies that are currently supported, the mappings have been done for you. The full list is available from the open-source ATHENA tool under the download tab (see below). You can choose to download the ten vocabulary tables from there as well – you will need a copy in your environment if you plan on building a CDM.

It can also be found on the OHDSI website under software tools.

Admittedly, I depend on our infrastructure team to insert this data into our own environment, so I haven’t run through this process directly. As I looked around to try to answer your question on this specific step, it wasn’t straightforward, so maybe we need an article posted somewhere (maybe on the CDM github.io site, where there is an article on how the vocabulary is built).

@clairblacketer (or other CDM repo maintainers), do you think it would be possible to add a section to the ‘How the Vocab is Built’ article that describes how to download the Vocabulary tables? It seems the article assumes you’ve already gone through that process, since it starts off with how concepts are mapped.

RaphaelNogueira · August 23, 2023, 3:28pm

I can share my Python function if that would be any help; I think it is a decent piece of code. It will work with any OMOP.

Chris_Knoll · August 23, 2023, 3:37pm

Sounds interesting. I’m not a maintainer/contributor of the vocabulary team, but someone should be able to engage with you to check it out. There are also Vocabulary working group meetings (you should be able to find the schedule on the OHDSI website).

Christian_Reich · August 24, 2023, 5:03am

Not following, @Chris_Knoll: You want an article to be written, but then refer to the (written) article. What is missing, and where is it missing? What question do we need to answer better?

RaphaelNogueira · August 24, 2023, 8:13am

A better guideline on how to create the vocabularies. Should I download zip files from Atlas and insert them directly into the vocabulary tables? I could not find this info, just people explaining the depths of it, how many levels, and so on; nothing straightforward. If you have a link, please share.
A lot is said about how to filter the concepts, but not much about how to populate the vocabulary and set it up.
Personal opinion, I might be wrong.

Chris_Knoll · August 24, 2023, 1:05pm

Sorry, @Christian_Reich, I went through the article and didn’t see where it goes through the steps of going to Athena, downloading the file, getting the program that unpacks the files into the txt, or any example scripts that would import the csvs into various DBMS, that sort of thing. The article I referenced starts with “Mapping of Concepts”, presupposing that you already found the right vocabulary download and imported it into your DB. Even if it said ‘refer to your DB documentation on loading files into tables’, that would be something, but there’s no reference anywhere (that I could find) that describes the process of getting the vocabulary.

Shouldn’t this exist? If so, where is it? And if it does exist, shouldn’t the article above reference that information in some way?

katy-sadowski · August 28, 2023, 11:12am

I just went through a local DB+ETL setup - one place (maybe the only one?) where the vocab setup steps are documented is in the email you get from Athena with your vocabulary zip file:

Please download and load the Standardized Vocabularies as following:

Click on this link to download the zip file. Typical file sizes, depending on the number of vocabularies selected, are between 30 and 250 MB.

Unpack.

Reconstitute CPT-4. See below for details.

If needed, create the tables.

Load the unpacked files into the tables.

Important: All vocabularies are fully represented in the downloaded files with the exception of CPT-4: OHDSI does not have a distribution license to ship CPT-4 codes together with the descriptions. Therefore, we provide you with a utility that downloads the descriptions separately and merges them together with everything else. After unpacking, simply open a command line in the directory you unpacked all the files into and run “java -Dumls-apikey=xxx -jar cpt4.jar 5”. Please replace “xxx” with UMLS API KEY.

Scripts for importing the vocabulary csv files into your OMOP CDM vocabulary tables can be found here. They are provided in the respective folders, e.g. Oracle/, PostgreSQL/ and SQL Server/ for supported SQL dialects. The loading scripts are inside the subfolder VocabImport/.

It’d be easy enough to copy/paste this, along with a bit of preamble around how to use Athena to get the vocabs you need, into existing documentation. I’m happy to do this if folks agree and can confirm the best place for these instructions to live. @Chris_Knoll @Christian_Reich - thoughts?

In addition, we could also reference resources like ETL-Synthea and Broadsea, which have scripts for various parts of this process.
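For readers who want to see the shape of the "Unpack" and "Load the unpacked files into the tables" steps from the email quoted above, here is a minimal Python sketch. It rests on several assumptions that are not stated in this thread: that the unpacked files are tab-delimited with a header row despite the .csv extension, that SQLite is an acceptable stand-in database for illustration, and that untyped columns are good enough for a first look. The function names are hypothetical. For a real CDM, the DDL and the VocabImport scripts mentioned in the email remain the place to start.

```python
import csv
import sqlite3
import zipfile
from pathlib import Path


def unpack(zip_path, target_dir):
    """Step 'Unpack': extract the Athena download into target_dir."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    return Path(target_dir)


def load_vocab_file(conn, path):
    """Load one unpacked file into a table named after it (CONCEPT.csv -> concept)."""
    path = Path(path)
    table = path.stem.lower()
    with path.open(newline="", encoding="utf-8") as handle:
        # Assumption: tab-delimited with a header row. QUOTE_NONE keeps quote
        # characters inside concept names as literal text.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader)
        columns = ", ".join(f'"{name}"' for name in header)
        marks = ", ".join("?" for _ in header)
        # Untyped columns: fine for a sketch, not a substitute for the CDM DDL.
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
        # Skip blank lines so every inserted row matches the header width.
        rows = (row for row in reader if row)
        conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', rows)
    conn.commit()
    # Report how many rows the table holds after the load.
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


def load_vocab_dir(conn, directory):
    """Step 'Load the unpacked files': one table per *.csv file in the directory."""
    files = sorted(Path(directory).glob("*.csv"))
    return {path.name: load_vocab_file(conn, path) for path in files}
```

A call such as load_vocab_dir(sqlite3.connect("vocab.db"), unpack("vocabulary_download.zip", "vocab")) would return a mapping from file name to row count; both file names here are placeholders. The CPT-4 reconstitution step from the email sits between unpacking and loading and is not covered by this sketch.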

Chris_Knoll · August 28, 2023, 2:23pm

I think it would be very helpful! Thanks! Hopefully that clarifies to @Christian_Reich what I felt was missing from our vocabulary documentation.

katy-sadowski · August 29, 2023, 11:56am

OK! Where do you think this should be added? Book of OHDSI (maybe Implement the ETL section)?

Chris_Knoll · August 29, 2023, 4:53pm

Personally, I think a new article in the CDM github.io site would be good, maybe under the ‘How To’ nav (ie: How to download vocabulary). If that’s not good, then maybe to the README of the vocabulary github repo.

This content could be also incorporated into the Book of OHDSI, but I’m a fan of finding this content online.

katy-sadowski · August 29, 2023, 8:06pm

Ahh, yes, I like these ideas. Perhaps we can do the best of both worlds: this is a page in the Vocabulary repo docs, and the CDM docs link out to that page. I think it makes sense for it to live in the Vocab repo so that the vocab team can keep it up to date as their processes evolve. I’ll put up a PR.

katy-sadowski · September 13, 2023, 7:01pm

We’ve now updated the Vocab docs to include download, reconstitution, and loading instructions here: General Structure, Download and Use · OHDSI/Vocabulary-v5.0 Wiki · GitHub
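The reconstitution step mentioned here corresponds to the CPT-4 command quoted from the Athena email earlier in the thread. If you want to script it, a small wrapper might look like the sketch below. Only the command line itself comes from the email; the function names and the UMLS_API_KEY environment variable are hypothetical choices made for this example.

```python
import os
import subprocess


def cpt4_command(api_key, jar="cpt4.jar"):
    """Build the argument list for the command quoted in the Athena email."""
    if not api_key:
        raise ValueError("a UMLS API key is required")
    # Mirrors: java -Dumls-apikey=xxx -jar cpt4.jar 5
    return ["java", f"-Dumls-apikey={api_key}", "-jar", jar, "5"]


def reconstitute_cpt4(vocab_dir, api_key=None):
    """Run the utility from the directory the vocabulary files were unpacked into."""
    # UMLS_API_KEY is a hypothetical variable name, not something Athena defines.
    key = api_key or os.environ.get("UMLS_API_KEY", "")
    subprocess.run(cpt4_command(key), cwd=vocab_dir, check=True)
```

Reading the key from the environment keeps it out of shell history and scripts; passing it explicitly works just as well.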

RaphaelNogueira · September 13, 2023, 7:04pm

@katy-sadowski thanks. I did insert the vocabulary properly after Chris explained and shared the instructions I had missed. Surely this info will prevent other people from facing the same problem I did.