
Release of ATHENA: New Standardized Vocabularies and Download Page

OHDSI Friends (I am not going to say “OHDSI crew”, because traditionally they haven’t fared so well):

Please take a look at the new Standardized Vocabularies Release dated 3-Apr-2014. The building process has a new name: ATHENA, which stands for “Automated Terminology Harmonization, Extraction and Normalization for Analytics”.

You can now download the vocabularies in both Version 5.0 and Version 4.5 from the new download page at http://ohdsi.org/web/athena. The source code for ATHENA is here.

Version 5.0 of the Standardized Vocabularies was built from the sources, and Version 4.5 was back-converted from it for those of you who are still dwelling in the last CDM version. When downloading from the download page, make sure to select the right version and the vocabularies you need based on your source data. Many of the vocabularies cannot be unselected because they are mandatory for the OMOP CDM; for others, you will need to contact us to facilitate a license negotiation.

Note: This is a new resource. We might have missed something or made a mistake. Please help improve it by sending us error reports, either as a topic here on the Forum or as a GitHub issue.

We also did a whole bunch of fixes and changes. Here is a list of the most relevant ones:

  • Refactoring of all the Drug vocabularies and relationships between them: RxNorm, NDC, SPL, VA Product and Class, NDF-RT and ATC
  • Addition of PCORNet concepts and mappings
  • Addition of LOINC, HCPCS and CPT4 relationships
  • For Observations and Measurements, we now map precoordinated Source Concepts to Standard Concepts (Maps to) and to Value Concepts (Maps to value). For example, ICD9CM codes for personal history are now mapped to the Observation for personal history, and the actual condition goes into the field value_as_concept. Note that this will not work in V4.5.
  • Revised WHO ICD10 and mapping from ICD9CM and ICD10. Note: The US-version ICD10CM is not yet implemented.
  • Revised mapping from Read according to the NHS
  • Revised domain assignment rules for LOINC, Read, ICD9CM, ICD10, SNOMED, HCPCS and CPT4.
  • Additions to Type Concepts
  • Fixed the DRG and MS-DRG classes
  • Fixed the ATC and MedDRA classes
  • Introduced synonyms for all databases that have them
  • Lots of fixes of small issues you reported (see github issues)
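The "Maps to" / "Maps to value" pairing for precoordinated source codes can be pictured as two lookups on the same source concept. Below is a minimal Python sketch, assuming CONCEPT_RELATIONSHIP rows are available as (concept_id_1, relationship_id, concept_id_2) tuples; the concept IDs are made-up placeholders, and `build_index` / `map_source_concept` are hypothetical helpers, not part of ATHENA.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical CONCEPT_RELATIONSHIP rows: (concept_id_1, relationship_id, concept_id_2).
# All IDs are placeholders for illustration only.
ROWS = [
    (1001, "Maps to", 2001),        # "personal history of X" -> personal-history Observation
    (1001, "Maps to value", 3001),  # same source code -> the condition X itself
    (1002, "Maps to", 3002),        # ordinary source code with a single mapping
]


def build_index(rows) -> Dict[Tuple[int, str], List[int]]:
    """Group target concept IDs by (source concept ID, relationship ID)."""
    index: Dict[Tuple[int, str], List[int]] = defaultdict(list)
    for source_id, relationship_id, target_id in rows:
        index[(source_id, relationship_id)].append(target_id)
    return index


def map_source_concept(index, source_id: int) -> Tuple[Optional[int], Optional[int]]:
    """Return (observation_concept_id, value_as_concept_id) for one source concept.

    Only the first target of each relationship is used; real data may have several.
    """
    maps_to = index.get((source_id, "Maps to"), [])
    maps_to_value = index.get((source_id, "Maps to value"), [])
    return (maps_to[0] if maps_to else None,
            maps_to_value[0] if maps_to_value else None)


if __name__ == "__main__":
    idx = build_index(ROWS)
    print(map_source_concept(idx, 1001))  # (2001, 3001)
    print(map_source_concept(idx, 1002))  # (3002, None)
```

A source code with only a "Maps to" row simply gets no value concept, which is why the second call returns None in the value slot.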

Have fun.

@Christian_Reich and the Odysseus Vocabulary Team


@Christian_Reich

I’ve downloaded a vocab file set and am attempting to load it into the schema from 10/2014, which seems to be the latest version.

Observations:

  • Files are not CSV, but actually tab delimited. Perhaps change to ‘.tab’ extension?
  • The VOCABULARY file appears to contain a “LATEST_UPDATE” column that is not present in the 10/2014 schema, causing loading to fail. The CDM schema appears unchanged leading me to think that there is an extra column in the file.
  • The number of concepts for some terminologies has changed significantly since October. For example, ICD10 (which I used for mapping EU concepts to SNOMED) dropped from 77201 to 12855 concepts. Where did the other 64,000 concepts go?
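For anyone loading the files by hand rather than with the provided loading scripts, the first two points can be handled by reading the file as tab-delimited and keeping only the columns the target table has. A minimal Python sketch, assuming a header row and a VOCABULARY table with the columns in `TARGET_COLUMNS` (an assumption; check it against your own DDL). `read_vocab_rows` is a hypothetical helper and the sample data is made up.

```python
import csv
import io

# Assumed column list for the target VOCABULARY table; check it against your own DDL.
TARGET_COLUMNS = [
    "VOCABULARY_ID",
    "VOCABULARY_NAME",
    "VOCABULARY_REFERENCE",
    "VOCABULARY_VERSION",
    "VOCABULARY_CONCEPT_ID",
]


def read_vocab_rows(text_stream, target_columns):
    """Yield rows from a tab-delimited file, keeping only the target table's columns.

    Columns present in the file but not in target_columns (e.g. LATEST_UPDATE)
    are dropped. A target column missing from the file raises KeyError.
    """
    # QUOTE_NONE: treat double quotes as ordinary characters, assuming the
    # tab-delimited files do not use a text qualifier.
    reader = csv.DictReader(text_stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    wanted = [c.upper() for c in target_columns]
    for row in reader:
        # Normalize header case; skip the None key DictReader uses for surplus fields.
        normalized = {k.upper(): v for k, v in row.items() if k is not None}
        yield [normalized[c] for c in wanted]


# Made-up sample with an extra LATEST_UPDATE column at the end.
SAMPLE = (
    "VOCABULARY_ID\tVOCABULARY_NAME\tVOCABULARY_REFERENCE\tVOCABULARY_VERSION"
    "\tVOCABULARY_CONCEPT_ID\tLATEST_UPDATE\n"
    "ICD10\tExample name, with a comma\tref\tv1\t0\t2015-01-01\n"
)

if __name__ == "__main__":
    for values in read_vocab_rows(io.StringIO(SAMPLE), TARGET_COLUMNS):
        print(values)  # ['ICD10', 'Example name, with a comma', 'ref', 'v1', '0']
```

Selecting columns by header name rather than by position means an added convenience column does not shift the remaining fields.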

Thanks,
Bill

@wstephens:

Comments:

  • Yes, we might change the naming. They used to be comma-delimited at some point. But we are providing the loading scripts, so you shouldn’t have to figure this out yourself.
  • Let me look into where latest_update breaks the load. Yes, it was added. It is not necessary for the CDM; it’s just a convenience field.
  • The 64k concepts were never ICD10 concepts. They were ICD10CM concepts, the American extension. Well, it was one ugly porridge of one and the other. ICD10 is very clean now. You can rely on the 12855 concepts. BTW: There are deprecated duplicates from a bad previous release among them, so the true number of concepts is 12500ish. As we come closer to October, we will add ICD10CM concepts as a separate vocabulary, since many codes that appear in both have slightly different meanings in each.
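To see the gap between the raw count and the "true" count described above, one can filter out concepts whose INVALID_REASON is set and look for codes that occur more than once. A minimal Python sketch, assuming CONCEPT rows are already in memory; `summarize_vocabulary` is a hypothetical helper, and the sample IDs and codes are illustrative placeholders, not real vocabulary content.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Concept(NamedTuple):
    concept_id: int
    vocabulary_id: str
    concept_code: str
    invalid_reason: Optional[str]  # None or "" = valid; "D" = deprecated; "U" = updated


def summarize_vocabulary(concepts, vocabulary_id):
    """Return total, valid, and duplicated-code counts for one vocabulary."""
    in_vocab = [c for c in concepts if c.vocabulary_id == vocabulary_id]
    # Treat both None (SQL NULL) and "" (empty field in a flat file) as valid.
    valid = [c for c in in_vocab if not c.invalid_reason]
    code_counts = Counter(c.concept_code for c in in_vocab)
    duplicated = sorted(code for code, n in code_counts.items() if n > 1)
    return {"total": len(in_vocab), "valid": len(valid), "duplicated_codes": duplicated}


# Illustrative rows only; IDs and codes are placeholders.
SAMPLE = [
    Concept(1, "ICD10", "A00", None),
    Concept(2, "ICD10", "A00", "D"),       # deprecated duplicate of the same code
    Concept(3, "ICD10", "A01", ""),
    Concept(4, "ICD10CM", "A00.0", None),  # different vocabulary, not counted
]

if __name__ == "__main__":
    print(summarize_vocabulary(SAMPLE, "ICD10"))
    # {'total': 3, 'valid': 2, 'duplicated_codes': ['A00']}
```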

Keep it coming.

Hello,

I am encountering a few issues while uploading the latest vocab files (Jan 2020). Posting here since this thread fits the purpose.

It appears the error is mostly due to having commas in the data. This mixes up the parsing. I don’t know how you are uploading the csv to your database, but you need to somehow indicate a text qualifier (").
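Indicating the text qualifier means telling the parser that a comma inside double quotes is data, not a delimiter. A minimal Python sketch using the standard `csv` module, assuming a comma-delimited file whose comma-containing fields are actually wrapped in double quotes (if the file does not quote them, no parser setting can recover the columns). `parse_quoted_csv` is a hypothetical helper and the sample line is made up.

```python
import csv
import io


def parse_quoted_csv(text):
    """Parse comma-delimited text, treating double quotes as the text qualifier."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=",", quotechar='"')
    return list(reader)


# Made-up sample: the second field contains a comma inside quotes.
SAMPLE = 'concept_id,concept_name,domain_id\n1,"Example name, with a comma",Condition\n'

if __name__ == "__main__":
    naive = [line.split(",") for line in SAMPLE.splitlines()]
    print(len(naive[1]))  # 4: the embedded comma created an extra field
    print(parse_quoted_csv(SAMPLE)[1])  # ['1', 'Example name, with a comma', 'Condition']
```

Most database bulk loaders expose an equivalent quote or text-qualifier option; the naive split shows what goes wrong without it.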

The issue was resolved by splitting the file into multiple chunks and saving each one as a CSV again.
