Agreed. It is better to start from a clean sheet in those cases. Now I understand the flow, but as I wrote above, it is working. I wrote a function in Python that collects each table's concepts, creates the vocabulary, and inserts it into the schema I choose, which, judging from other questions in the forum, seems more like a custom vocabulary.
Again, I think you're going down the road of "doing it my way" instead of following standard procedures. Downloading and installing a published version of the vocabulary is a standard part of setting up your CDM, so you should start there. Once you have those tables, you are free to table-copy them as necessary, but you're going "out of the lane" if you try to implement your own process for building those tables.
How would you suggest doing the vocabulary, then? This part was not very clear to me, which is why I went rogue, judging by your words.
The CDM FAQ has a question related to this:
12. Do I have to map my source codes to Standard Concepts myself? Are there vocabulary mappings that already exist for me to leverage?
If your data use any of the 55 source vocabularies that are currently supported, the mappings have been done for you. The full list is available from the open-source ATHENA tool under the download tab (see below). You can choose to download the ten vocabulary tables from there as well; you will need a copy in your environment if you plan on building a CDM.
This can also be found on the OHDSI website under Software Tools.
Admittedly, I depend on our infrastructure team to insert this data into our environment, so I haven't run through this process directly... As I looked around to answer your question about this specific step, it wasn't straightforward, so maybe we need an article posted somewhere (maybe on the CDM github.io site, where there is an article on how the vocabulary is built).
@clairblacketer (or other CDM repo maintainers), do you think it would be possible to add a section to the "How the Vocab is Built" article that describes how to download the Vocabulary tables? The article seems to assume you've already gone through that process, since it starts with how concepts are mapped.
I can share my Python function if it would be of any help; I found it a decent piece of code. It will work with any OMOP CDM.
Sounds interesting. I'm not a maintainer or contributor on the vocabulary team, but someone should be able to engage with you to check it out. There are also Vocabulary working group meetings (you should be able to find the schedule on the OHDSI website).
Not following, @Chris_Knoll: you want an article to be written, but then refer to the (already written) article. What is missing, and where is it missing? What question do we need to answer better?
A better guideline on how to create the vocabularies: should I download zip files from Athena and insert them directly into the vocabulary tables? I could not find this information, only people explaining its depths (how many levels, and so on), nothing straightforward. If you have a link, please share it.
A lot is said about how to filter the concepts, but not much about how to populate the tables and their settings.
Personal opinion, I might be wrong.
Sorry, @Christian_Reich, I went through the article and didn't see where it goes through the steps of going to Athena, downloading the file, getting the program that unpacks the files into the txt, or any example scripts that would import the CSVs into various DBMSs... that sort of thing. The article I referenced starts with "Mapping of Concepts", presupposing that you have already found the right vocabulary download and imported it into your DB. Even if it said "refer to your DB documentation on loading files into tables", that would be something, but there's no reference anywhere (that I could find) that describes the process of getting the vocabulary.
Shouldn't this exist? If so, where is it? And if it does exist, shouldn't the article above reference that information in some way?
I just went through a local DB+ETL setup - one place (maybe the only one?) where the vocab setup steps are documented is in the email you get from Athena with your vocabulary zip file:
Please download and load the Standardized Vocabularies as follows:
- Click on this link to download the zip file. Typical file sizes, depending on the number of vocabularies selected, are between 30 and 250 MB.
- Unpack.
- Reconstitute CPT-4. See below for details.
- If needed, create the tables.
- Load the unpacked files into the tables.
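If you want to script the unpack step, here is a minimal sketch using only the Python standard library. It is not an official OHDSI tool: the helper names (unpack_vocabulary, missing_files) are hypothetical, and the file list in EXPECTED_FILES is an assumption based on a typical Athena download (CONCEPT_CPT4.csv is left out because it only appears when CPT-4 is selected), so adjust it to what you actually selected.

```python
import zipfile
from pathlib import Path

# Assumption: file names typically found in an Athena vocabulary download.
EXPECTED_FILES = {
    "CONCEPT.csv", "CONCEPT_ANCESTOR.csv", "CONCEPT_CLASS.csv",
    "CONCEPT_RELATIONSHIP.csv", "CONCEPT_SYNONYM.csv", "DOMAIN.csv",
    "DRUG_STRENGTH.csv", "RELATIONSHIP.csv", "VOCABULARY.csv",
}


def missing_files(names):
    """Return the expected vocabulary files absent from `names` (paths allowed), sorted."""
    present = {Path(name).name for name in names}
    return sorted(EXPECTED_FILES - present)


def unpack_vocabulary(zip_path, target_dir):
    """Hypothetical helper: unpack the Athena zip and report any missing table files."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target)
        return missing_files(zf.namelist())
```

An empty list from unpack_vocabulary means every file in EXPECTED_FILES was in the archive; anything else is worth checking before moving on to the CPT-4 and load steps.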
Important: All vocabularies are fully represented in the downloaded files with the exception of CPT-4: OHDSI does not have a distribution license to ship CPT-4 codes together with the descriptions. Therefore, we provide you with a utility that downloads the descriptions separately and merges them with everything else. After unpacking, simply open a command line in the directory you unpacked all the files into and run "java -Dumls-apikey=xxx -jar cpt4.jar 5". Please replace "xxx" with your UMLS API key.
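If the setup is scripted, the CPT-4 step can be wrapped as in this sketch. Only the command line itself comes from the Athena email; the rest is assumed: the helper names are hypothetical, the key is read from a hypothetical UMLS_API_KEY environment variable, and java must be on your PATH.

```python
import os
import subprocess


def cpt4_command(api_key):
    """Build the command from the Athena email: java -Dumls-apikey=xxx -jar cpt4.jar 5."""
    return ["java", f"-Dumls-apikey={api_key}", "-jar", "cpt4.jar", "5"]


def reconstitute_cpt4(vocab_dir, api_key=None):
    """Hypothetical helper: run cpt4.jar from the directory the files were unpacked into."""
    # Assumption: the key lives in a UMLS_API_KEY environment variable if not passed in.
    key = api_key or os.environ["UMLS_API_KEY"]
    # check=True raises CalledProcessError if the utility exits with an error.
    subprocess.run(cpt4_command(key), cwd=vocab_dir, check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems if the key contains special characters.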
Scripts for importing the vocabulary csv files into your OMOP CDM vocabulary tables can be found here. They are provided in the respective folders, e.g. Oracle/, PostgreSQL/ and SQL Server/ for supported SQL dialects. The loading scripts are inside the subfolder VocabImport/.
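For a quick local experiment, the load step can be sketched with Python's built-in sqlite3 module standing in for your real DBMS. This is a hedged sketch, not a replacement for the VocabImport scripts: it assumes the files are tab-delimited with a header row and no quote characters (check your own download), it creates untyped columns straight from the header instead of using the official CDM DDL, and load_vocab_file and load_vocab_dir are hypothetical names.

```python
import csv
import sqlite3
from pathlib import Path


def load_vocab_file(conn, table, lines):
    """Load one tab-delimited vocabulary file into `table`; return rows inserted."""
    # QUOTE_NONE keeps literal quote characters that appear in concept names.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    cols = ", ".join(header)  # trusted input: column names come from the file header
    placeholders = ", ".join("?" for _ in header)
    # Sketch only: real deployments create the tables from the CDM DDL first.
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")
    # executemany consumes the reader lazily, so large files are streamed.
    cur = conn.executemany(f"INSERT INTO {table} ({cols}) VALUES ({placeholders})", reader)
    conn.commit()
    return cur.rowcount


def load_vocab_dir(conn, directory, tables):
    """Load <TABLE>.csv for each name in `tables` into a lowercase table of that name."""
    counts = {}
    for table in tables:
        path = Path(directory) / f"{table.upper()}.csv"
        with path.open(encoding="utf-8", newline="") as f:
            counts[table.lower()] = load_vocab_file(conn, table.lower(), f)
    return counts
```

Passing an open file object streams rows through executemany instead of reading the whole file into memory, which matters for large files such as CONCEPT_ANCESTOR.csv.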
It'd be easy enough to copy/paste this, along with a bit of preamble about how to use Athena to get the vocabs you need, into existing documentation. I'm happy to do this if folks agree and can confirm the best place for these instructions to live. @Chris_Knoll @Christian_Reich - thoughts?
In addition, we could also reference resources like ETL-Synthea and Broadsea, which have scripts for various parts of this process.
I think it would be very helpful! Thanks! Hopefully that clarifies to @Christian_Reich what I felt was missing from our vocabulary documentation.
Personally, I think a new article on the CDM github.io site would be good, maybe under the "How To" nav (i.e., How to download vocabulary). If that's not suitable, then maybe the README of the vocabulary GitHub repo.
This content could also be incorporated into the Book of OHDSI, but I'm a fan of finding this content online.
Ahh, yes, I like these ideas. Perhaps we can get the best of both worlds: this becomes a page in the Vocabulary repo docs, and the CDM docs link out to that page. I think it makes sense for it to live in the Vocab repo so that the vocab team can keep it up to date as their processes evolve. I'll put up a PR.
We've now updated the Vocab docs to include download, reconstitution, and loading instructions here: General Structure, Download and Use · OHDSI/Vocabulary-v5.0 Wiki · GitHub
@katy-sadowski thanks. I inserted the vocabulary properly after Chris explained and shared the instructions I had missed. This info will surely prevent other people from facing the same problem I did.