
Release of new Vocabulary and Download Page

Friends:

Please take a look at the new vocabulary we built from scratch. The building process has a new name: Athena, which according to Patrick stands for Automated Terminology Harmonization, Extraction and Normalization for Analytics. The vocabulary was built in Version 5.0 from the sources and back-converted to V4.5.

You can now download the vocabulary in both Version 5.0 and Version 4.5 from the new download page at http://ohdsi.org/web/athena.

Please note: the rules for building the vocabulary used to be hard-wired and were not documented very explicitly in V4 (nobody to blame but ourselves). We have now rebuilt everything properly in V5, but we might have missed something. Therefore, I am only sending this announcement to this group. After a little more scrubbing, and maybe some "user testing" by you, we will alert everybody. So don't be surprised if we announce a few updates in the next couple of days. I will keep you up to date.

We also did a whole bunch of fixes and changes. Here is a list of the most relevant ones:

  • Refactoring of all the Drug vocabularies and relationships between them: RxNorm, NDC, SPL, VA Product and Class, NDF-RT and ATC
  • Addition of PCORNet concepts and mappings
  • Addition of LOINC, HCPCS and CPT4 relationships
  • For Observations and Measurements, we now map from Source Concepts to Standard Concepts (Maps to) and to Value Concepts (Maps to value). For example, ICD9CM codes for personal history are now mapped to the Observation for personal history, and the actual condition goes into the field value_as_concept_id. Note that this won't work in V4.
  • Revised mapping from ICD9CM and ICD10.
  • Revised mapping from Read according to the NHS
  • Revised domain assignment rules for LOINC, Read, ICD9CM, ICD10, SNOMED, HCPCS and CPT4.
  • Tons of little additions to Type Concepts
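To make the "Maps to" / "Maps to value" scheme for Observations and Measurements concrete, here is a minimal Python sketch. The concept IDs are made up, and an in-memory list stands in for the CONCEPT_RELATIONSHIP table; only the two relationship names and the OBSERVATION column names (observation_concept_id, observation_source_concept_id, value_as_concept_id) come from the vocabulary and CDM V5.

```python
from typing import List, NamedTuple, Optional, Tuple


class ConceptRelationship(NamedTuple):
    concept_id_1: int
    concept_id_2: int
    relationship_id: str


# Hypothetical concept IDs, for illustration only:
#   1001 = ICD9CM source concept for a "personal history of <condition>" code
#   2001 = Standard Observation concept for personal history
#   3001 = Standard Condition concept for the condition itself
CONCEPT_RELATIONSHIP = [
    ConceptRelationship(1001, 2001, "Maps to"),
    ConceptRelationship(1001, 3001, "Maps to value"),
]


def map_source_concept(
    source_concept_id: int, rows: List[ConceptRelationship]
) -> Tuple[Optional[int], Optional[int]]:
    """Return (target via 'Maps to', value via 'Maps to value') for one source concept."""
    target: Optional[int] = None
    value: Optional[int] = None
    for row in rows:
        if row.concept_id_1 != source_concept_id:
            continue
        if row.relationship_id == "Maps to":
            target = row.concept_id_2
        elif row.relationship_id == "Maps to value":
            value = row.concept_id_2
    return target, value


def build_observation(source_concept_id: int, rows: List[ConceptRelationship]) -> dict:
    """Sketch of the OBSERVATION fields filled in from one source code."""
    target, value = map_source_concept(source_concept_id, rows)
    return {
        "observation_concept_id": target,
        "observation_source_concept_id": source_concept_id,
        "value_as_concept_id": value,
    }


print(build_observation(1001, CONCEPT_RELATIONSHIP))
```

In a real ETL this would be a join against CONCEPT_RELATIONSHIP, typically restricted to valid relationships; the sketch only shows which relationship feeds which field.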

We haven't quite finished all the homework for CPRD, which is mostly Gemscript. It is happening now. We also want to work with Parsa on a proper information model, which will help quality, and to compare and potentially incorporate CIEL (OpenMRS) and Med (George's shop).

Let me know what you think.

The release that Mark Khayter announced yesterday is not related to this. It is an October V5 release on the IMEDS website. All vocabulary releases will happen from here going forward. I apologize for the confusion.

@Christian_Reich and the Vocabulary Team

Ha!!! First glitch in the download page. Embarrassing! I'll let you know in a minute.


All set. We are in production. Phew.

@Christian_Reich - Is this a replacement or an update to V5? Was the "October V5" the first official V5, or a draft? I just want to keep my datasets in order.

Thanks,
Don

Argh. The bugs are flying.

Fixed. You can re-download, or just fix yourself:

Version 5.0:
update vocabulary set vocabulary_name='OMOP Vocabulary v4.5 21-Mar-2015' where vocabulary_id='None';
commit;

Version 4.5:
update vocabulary set vocabulary_name='OMOP Vocabulary v4.5 21-Mar-2015' where vocabulary_id=0;
commit;

I recommend re-downloading. I found a couple of additional issues with the files. The latest files on the vocabulary download website have now been corrected.

VOCABULARY and CONCEPT fail to load using the provided load file. I tested on Postgres, but I assume these issues affect other databases as well.

VOCABULARY.csv

  • Extra column, "latest_update", on several entries (SNOMED, LOINC…) that is not present in the OMOP V5 DDL. This causes the COPY command to fail.

CONCEPT.csv

  • 45877004: extra tab within CONCEPT_NAME
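Both failures reported here come down to rows whose field count does not match the table definition: an extra column in VOCABULARY.csv and a stray tab inside CONCEPT_NAME. A quick pre-load check can find such rows before COPY rejects them. This Python sketch assumes tab-delimited files with a header row and the column counts of the V5 DDL (5 for VOCABULARY, 10 for CONCEPT); adjust the counts if your DDL differs. The sample rows are made up.

```python
import io

# Column counts assumed from the V5 DDL; verify against the DDL you actually load into.
EXPECTED_FIELDS = {
    "VOCABULARY.csv": 5,
    "CONCEPT.csv": 10,
}


def find_bad_rows(stream, expected_fields, delimiter="\t"):
    """Return (line_number, field_count, line) for every row with the wrong field count."""
    bad = []
    for line_number, line in enumerate(stream, start=1):
        line = line.rstrip("\r\n")  # strip only the line ending, keep trailing tabs
        if not line:
            continue
        fields = line.split(delimiter)
        if len(fields) != expected_fields:
            bad.append((line_number, len(fields), line))
    return bad


# Made-up CONCEPT rows: row 2 has a tab inside its concept name.
SAMPLE_CONCEPT = (
    "concept_id\tconcept_name\tdomain_id\tvocabulary_id\tconcept_class_id\t"
    "standard_concept\tconcept_code\tvalid_start_date\tvalid_end_date\tinvalid_reason\n"
    "1\tGood name\tMeasurement\tLOINC\tLab Test\tS\t0000-0\t19700101\t20991231\t\n"
    "2\tBad\tname\tMeasurement\tLOINC\tLab Test\tS\t0000-1\t19700101\t20991231\t\n"
)

for line_number, count, _ in find_bad_rows(io.StringIO(SAMPLE_CONCEPT), EXPECTED_FIELDS["CONCEPT.csv"]):
    print(f"line {line_number}: {count} fields")
```

To check a real download, pass an open file instead, e.g. `open("CONCEPT.csv", encoding="utf-8")` (the UTF-8 encoding is an assumption about the files).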

Bill

@wstephens

Please re-download the file and retry the load.

These were the issues I found and they should have been corrected over the weekend.

Lee

Thanks @lee_evans. The fix is verified.

I want to thank everyone for doing the vocabulary work!
It is a great resource.

In case you have the .csv files and want to look at them in R without a database, here is the code I was using:

The quote = "" parameter is very useful: the data did not load all concepts until I added it.

# read.delim reads tab-delimited files; quote = "" turns off quote handling
concept <- read.delim('inst/extdata/concept.csv', as.is = TRUE, quote = "")
vocabulary <- read.delim('inst/extdata/vocabulary.csv', as.is = TRUE, quote = "")
crel <- read.delim('inst/extdata/concept_relationship.csv', as.is = TRUE, quote = "")
relationship <- read.delim('inst/extdata/relationship.csv', as.is = TRUE, quote = "")

library(dplyr)
# Show the 'None' vocabulary row
print(filter(vocabulary, VOCABULARY_ID == 'None'))

# Number of concepts per vocabulary, largest first
concept %>% group_by(VOCABULARY_ID) %>% summarise(count = n()) %>% arrange(-count)
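A likely reason quote = "" matters, assuming some concept names contain an unmatched double-quote character: with quote handling on, the parser treats that quote as the start of a quoted field and keeps reading across line breaks until it meets another quote, silently merging rows. (read.delim's default is quote = "\"".) The same effect can be reproduced with Python's csv module, where quoting=csv.QUOTE_NONE plays the role of quote = "". The sample data is made up.

```python
import csv
import io

# Made-up tab-delimited sample: row 1 has an unmatched opening quote,
# and row 2 happens to contain a closing quote.
SAMPLE = (
    "concept_id\tconcept_name\n"
    "1\t\"Broken name\n"
    "2\tOther name\"\n"
    "3\tThird\n"
)


def read_rows(text, quoting):
    """Parse tab-delimited text with the given csv quoting mode."""
    return list(csv.reader(io.StringIO(text), delimiter="\t", quoting=quoting))


default_rows = read_rows(SAMPLE, csv.QUOTE_MINIMAL)  # quote characters are interpreted
no_quote_rows = read_rows(SAMPLE, csv.QUOTE_NONE)    # quote characters are plain text

print(len(default_rows), len(no_quote_rows))  # prints: 3 4
```

With quoting on, rows 1 and 2 collapse into a single record whose concept_name spans two lines, which is exactly the kind of silent loss that shows up as "not all concepts loaded".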

I loaded version 4.5 into an Oracle instance and found two things:

  1. I didn't find a DDL file for V4 in the VocabImport subfolder that came with the zip file, so I used the DDL from http://omop.org/Vocabularies. However, that DDL includes the Drug Approval table, which does not seem to be provided. Is this table no longer maintained?

  2. The v4 DDL that I downloaded ran into problems during the creation of table constraints on my Oracle 11g instance. I fixed the problem by re-ordering the CREATE statements so that the dependencies in the constraint definitions were satisfied. Here is a link to the DDL that worked: http://dbmi-icode-01.dbmi.pitt.edu/dikb-evidence/ohdsi-related/Standard_Vocab_v4_Table_DDL_Oracle_format.DDL
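The re-ordering fix in point 2 amounts to a topological sort: each table has to be created after every table its foreign keys reference. Below is a minimal Python sketch using the standard library's graphlib (Python 3.9+). The dependency map is hypothetical and only illustrative; the real edges should be read from the REFERENCES clauses in the DDL you are running.

```python
from graphlib import TopologicalSorter

# Hypothetical foreign-key dependencies: table -> tables it references.
# Replace with the constraints actually declared in your DDL.
DEPENDS_ON = {
    "VOCABULARY": set(),
    "RELATIONSHIP": set(),
    "CONCEPT": {"VOCABULARY"},
    "CONCEPT_RELATIONSHIP": {"CONCEPT", "RELATIONSHIP"},
    "CONCEPT_ANCESTOR": {"CONCEPT"},
    "CONCEPT_SYNONYM": {"CONCEPT"},
    "SOURCE_TO_CONCEPT_MAP": {"CONCEPT", "VOCABULARY"},
    "DRUG_STRENGTH": {"CONCEPT"},
}


def creation_order(depends_on):
    """Order tables so each comes after the tables it references.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


order = creation_order(DEPENDS_ON)
print(order)
```

An alternative that sidesteps ordering altogether is to create all tables without constraints first and add the foreign keys afterwards with ALTER TABLE ... ADD CONSTRAINT.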

@rkboyce:

  1. Weird. They are all there: https://github.com/OHDSI/CommonDataModel/tree/master/Version4/Oracle/VocabImport. Drug Approval is currently not updated, that's correct. We will fix that with the next release (which will have a revamp of NDC and SPLs, also).

  2. Well, if you want, try out the "official" ones and let us know whether they work.

@Christian_Reich

I believe @rkboyce was referring to the Oracle DDL to create the V4.5 tables. The vocabimport directory contains the Oracle CTL statements to load the vocabulary tables.

The Oracle DDL for the 4.5 Vocabulary tables is in the parent directory here:
https://github.com/OHDSI/CommonDataModel/tree/master/Version4/Oracle
It is called CDM V4 DDL.sql

Note: that DDL file only contains the DDL for the V4.5 Vocabulary tables, not the complete DDL for all V4.5 CDM tables.

Currently http://ohdsi.org/web/athena is not working. I have been having this issue for a while.

@Sean_C:

Apologies, we are fighting with several problems. We will be back on the grid tomorrow with the fix. Please follow the debate here


@Christian_Reich - I downloaded the V5 vocabulary. The LOINC concepts with concept_id 45877004 and 45885233 have extra tabs within their concept names.

Thanks,
Hira
