How to add a custom vocabulary to the OMOP vocabulary table?

(Tim Quinn) #1

We want to create a custom vocabulary for loading a source_concept_id column. This action is straightforward: we create new rows in the concept table with IDs above the 2 billion threshold.

The foreign-key column concept.vocabulary_id is required, so we also create a new row in the vocabulary table.

But now we see that foreign-key column vocabulary.vocabulary_concept_id is also required. The ATHENA vocabularies all have concept rows with Vocabulary = “Vocabulary” and Domain = “Metadata”.

My question: Is it acceptable for us to add a concept to this “Vocabulary” vocabulary, as long as its concept_id is above 2 billion? Or should we create our own “Source Vocabulary” vocabulary to keep our vocab concepts separated from the OHDSI ones?

@clairblacketer – Perhaps some additional guidance in the new OMOP CDM data dictionary is warranted?

(Melanie Philofsky) #2

Yes, do the above, which is essentially creating your own source vocabulary. You will need to give the record a unique concept_id > 2 billion, a code, and a name. All other attributes will be the same.

In Colorado we have a few different source vocabularies for all our custom concept_ids. For example, having a dedicated vocabulary makes the lookup for Social History data in the Observation table a little easier. Just join the observation_source_concept_id to the concept.concept_id WHERE vocabulary_id = ‘social_hx’.
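The lookup described above can be sketched as follows. This is only an illustration, not the Colorado implementation: it uses an in-memory SQLite database as a stand-in for a real CDM instance, the tables are cut down to the few columns the join needs, and the concept IDs, names, source values, and the second vocabulary ‘living_hx’ are all hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the CDM database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced versions of the CDM tables: only the columns used in the join.
CREATE TABLE concept (
    concept_id    INTEGER PRIMARY KEY,
    concept_name  TEXT NOT NULL,
    vocabulary_id TEXT NOT NULL
);
CREATE TABLE observation (
    observation_id                INTEGER PRIMARY KEY,
    observation_source_value      TEXT,
    observation_source_concept_id INTEGER
);

-- Hypothetical custom concepts, all above the 2 billion threshold.
INSERT INTO concept VALUES (2000000101, 'Smokes daily', 'social_hx');
INSERT INTO concept VALUES (2000000201, 'Lives alone',  'living_hx');

-- Hypothetical source rows pointing at those custom concepts.
INSERT INTO observation VALUES (1, 'SMOKE_DAILY', 2000000101);
INSERT INTO observation VALUES (2, 'LIVES_ALONE', 2000000201);
""")

# Join the source concept column to concept and keep one custom vocabulary.
social_hx_rows = conn.execute("""
    SELECT o.observation_id, o.observation_source_value, c.concept_name
    FROM observation AS o
    JOIN concept AS c
      ON c.concept_id = o.observation_source_concept_id
    WHERE c.vocabulary_id = 'social_hx'
    ORDER BY o.observation_id
""").fetchall()

print(social_hx_rows)
```

Only the first observation is returned, because the second one points at a concept in a different custom vocabulary.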

(Tim Quinn) #3

Thanks @MPhilofsky for confirming that we’re on the right track in creating custom vocabularies for the various *_source_concept_id columns in the OMOP CDM.

However, I was asking a slightly different question. Let’s say I propose to add my own custom concept to the ICD10CM vocabulary. I imagine there would be an outcry from the OHDSI community because I am not the authoring organization for the ICD10CM vocabulary. (Or would there be an outcry if its concept_id were over 2 billion?)

But I am proposing to add my own concepts to the vocabulary called “Vocabulary” (https://athena.ohdsi.org/search-terms/terms?vocabulary=Vocabulary&page=1&pageSize=500). Again, I am not the authoring organization for this vocabulary; the keepers of ATHENA are. But perhaps there is no objection here because this vocabulary is not a healthcare industry standard. It exists only for the OMOP CDM’s vocabulary tables.

For your custom vocabulary called “social_hx”, how did you populate the column vocabulary.vocabulary_concept_id?

(Oleg Zhuk) #4

Hello, Tim @quinnt

Yes, there would still be an outcry, not mainly because you are not the authoring organization for the ICD10CM vocabulary, but because the OHDSI goal is to maintain vocabulary consistency and uniformity across the globe. ICD10CM should be the same in the USA, China, all over Europe, etc. Thanks to this uniformity, you can easily expand your research and collaborate with the community.

What you want to do: create your own vocabulary, then either map its terms to Standard concepts (the preferred way) or create them as Standard concepts (still possible), so that you can build cohorts later. To build your own vocabulary, you need to populate:

  • vocabulary: all the information on your custom vocabulary, done similarly to the other records, according to the CDM specifications
  • concept: try to see everything in the CDM as a concept, much as everything is an object in object-oriented programming
  • concept_relationship: link your concepts to Standard concepts using the ‘Maps to’ relationship. Remember, if you can’t find a corresponding concept in a Standard vocabulary and create your concept as Standard, you still need to create a ‘Maps to’ relationship from the concept to itself.

As for vocabulary.vocabulary_concept_id: it would be 2 billion something, the concept_id of the concept you would create to represent your vocabulary ‘social_hx’.
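A minimal sketch of populating the three tables might look like the following. It makes several assumptions: an in-memory SQLite database stands in for the CDM, the tables are reduced to a handful of columns with no foreign keys or date columns, and every ID, name, code, and URL is hypothetical. The custom concept is created as Standard here only so that the self-referencing ‘Maps to’ row can be shown. Mapping to an existing Standard concept remains the preferred route.

```python
import sqlite3

CUSTOM_ID_THRESHOLD = 2_000_000_000

# Hypothetical IDs in the custom range (above 2 billion).
VOCAB_CONCEPT_ID = 2_000_000_001   # concept that represents the vocabulary itself
SMOKER_CONCEPT_ID = 2_000_000_101  # one concept inside the custom vocabulary

conn = sqlite3.connect(":memory:")  # stand-in for the CDM database
conn.executescript("""
-- Reduced tables: constraints and several required CDM columns are omitted.
CREATE TABLE vocabulary (
    vocabulary_id         TEXT PRIMARY KEY,
    vocabulary_name       TEXT NOT NULL,
    vocabulary_reference  TEXT,
    vocabulary_concept_id INTEGER NOT NULL
);
CREATE TABLE concept (
    concept_id       INTEGER PRIMARY KEY,
    concept_name     TEXT NOT NULL,
    domain_id        TEXT NOT NULL,
    vocabulary_id    TEXT NOT NULL,
    standard_concept TEXT,
    concept_code     TEXT NOT NULL
);
CREATE TABLE concept_relationship (
    concept_id_1    INTEGER NOT NULL,
    concept_id_2    INTEGER NOT NULL,
    relationship_id TEXT NOT NULL
);
""")

# Guard against colliding with ATHENA-issued concept IDs.
assert all(cid > CUSTOM_ID_THRESHOLD for cid in (VOCAB_CONCEPT_ID, SMOKER_CONCEPT_ID))

with conn:  # one transaction for all inserts
    # 1. The concept that represents the vocabulary, filed like the ATHENA
    #    ones: vocabulary 'Vocabulary', domain 'Metadata'.
    conn.execute(
        "INSERT INTO concept VALUES (?, ?, ?, ?, ?, ?)",
        (VOCAB_CONCEPT_ID, "Social History", "Metadata", "Vocabulary", None, "social_hx"),
    )
    # 2. The vocabulary row, pointing at that concept.
    conn.execute(
        "INSERT INTO vocabulary VALUES (?, ?, ?, ?)",
        ("social_hx", "Social History", "https://wiki.example.org/social_hx", VOCAB_CONCEPT_ID),
    )
    # 3. A concept inside the custom vocabulary, created as Standard ('S').
    conn.execute(
        "INSERT INTO concept VALUES (?, ?, ?, ?, ?, ?)",
        (SMOKER_CONCEPT_ID, "Smokes daily", "Observation", "social_hx", "S", "SMOKE_DAILY"),
    )
    # 4. A Standard custom concept still needs 'Maps to' itself.
    conn.execute(
        "INSERT INTO concept_relationship VALUES (?, ?, ?)",
        (SMOKER_CONCEPT_ID, SMOKER_CONCEPT_ID, "Maps to"),
    )

# Check that the vocabulary row resolves to its representing concept.
vocab_check = conn.execute("""
    SELECT v.vocabulary_id, c.concept_name, c.vocabulary_id, c.domain_id
    FROM vocabulary AS v
    JOIN concept AS c ON c.concept_id = v.vocabulary_concept_id
    WHERE v.vocabulary_id = 'social_hx'
""").fetchone()
print(vocab_check)
```

In a real CDM database the foreign keys between vocabulary and concept point at each other, so the insert order (or deferred constraint checking) depends on the database in use.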

To illustrate what I’m talking about, look at the concepts behind other vocabularies (ICD10CM is a perfect example):

SELECT * FROM vocabulary WHERE vocabulary_id = 'ICD10CM';
SELECT * FROM concept WHERE concept_id = 44819098;

The Standardized Vocabularies chapter of The Book of OHDSI may be very helpful.

(Tim Quinn) #5

I would like to add my own custom vocabulary to the OMOP vocabulary table.

Why would I want to do this? Because the column concept.vocabulary_id in the OMOP concept table is required to be not NULL, so I need one to load concepts from my custom vocabulary.

Okay, let’s add a new row to the OMOP vocabulary table. I give it a vocabulary_id = “social_hx” and a vocabulary_name = “Social History”. Hmm, vocabulary_reference is also required, so I put in a URL to our internal wiki documentation. The column vocabulary_version is not required, so I skip it.

Now I get to the column vocabulary_concept_id. What is this? Seems a bit confusing. Let’s check the OMOP data dictionary: “A Concept that represents the Vocabulary the VOCABULARY concept belongs to.” Still confusing.

Okay, let’s see how the smart people of OHDSI actually use this: SELECT * FROM vocabulary; Okay, I see 96 rows, all of which have a concept ID populated in the vocabulary_concept_id column.

Let’s look at the one for ICD10CM. It’s concept_id = 44819098. Another query: SELECT * FROM concept WHERE concept_id = 44819098;

Aha! Now I see that each row in the OMOP vocabulary table also has a corresponding row in the OMOP concept table. The keepers of ATHENA have defined a “vocabulary catalog” of sorts, where the list of vocabularies is itself a vocabulary. This is the source of the confusion, because a “vocabulary of vocabularies” is hard to wrap your brain around.
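One way to see this “vocabulary of vocabularies” is to join each vocabulary row to its representing concept and list any vocabulary that lacks one. The sketch below assumes an in-memory SQLite stand-in for the CDM and tables reduced to the columns involved. The ICD10CM ID is the one from the queries above. The ‘social_hx’ row and its ID are hypothetical, and its concept is deliberately left out to show what the check reports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the CDM database
conn.executescript("""
-- Reduced tables: only the columns needed to follow the catalog link.
CREATE TABLE vocabulary (
    vocabulary_id         TEXT PRIMARY KEY,
    vocabulary_concept_id INTEGER
);
CREATE TABLE concept (
    concept_id    INTEGER PRIMARY KEY,
    vocabulary_id TEXT,
    domain_id     TEXT
);

-- ICD10CM is represented by concept 44819098 in the 'Vocabulary' vocabulary.
INSERT INTO vocabulary VALUES ('ICD10CM', 44819098);
INSERT INTO concept    VALUES (44819098, 'Vocabulary', 'Metadata');

-- Hypothetical custom vocabulary whose representing concept is not loaded yet.
INSERT INTO vocabulary VALUES ('social_hx', 2000000001);
""")

# Vocabularies whose vocabulary_concept_id does not resolve to a concept
# in the 'Vocabulary' vocabulary.
missing = [
    row[0]
    for row in conn.execute("""
        SELECT v.vocabulary_id
        FROM vocabulary AS v
        LEFT JOIN concept AS c
          ON c.concept_id = v.vocabulary_concept_id
         AND c.vocabulary_id = 'Vocabulary'
        WHERE c.concept_id IS NULL
        ORDER BY v.vocabulary_id
    """)
]
print(missing)
```

Here only ‘social_hx’ is reported, since ICD10CM already has its catalog entry in the concept table.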

Okay, so I want to add my custom vocabulary to the ATHENA vocabulary catalog. Can I simply add my custom vocabulary as a new concept in this “Vocabulary” vocabulary (which is in the “Metadata” domain)?

Or should I create a new catalog for my list of custom vocabularies called “Custom Vocabularies”?

(Oleg Zhuk) #6

Yes, you can do it and keep all your vocabulary together.