
How to add a custom vocabulary to the OMOP vocabulary table?

We want to create a custom vocabulary for loading a source_concept_id column. This action is straightforward: we create new rows in the concept table with IDs above the 2 billion threshold.

The foreign-key column concept.vocabulary_id is required, so we also create a new row in the vocabulary table.

But now we see that foreign-key column vocabulary.vocabulary_concept_id is also required. The ATHENA vocabularies all have concept rows with Vocabulary = “Vocabulary” and Domain = “Metadata”.

My question: Is it acceptable for us to add a concept to this “Vocabulary” vocabulary, as long as its concept_id is above 2 billion? Or should we create our own “Source Vocabulary” vocabulary to keep our vocab concepts separated from the OHDSI ones?

@clairblacketer – Perhaps some additional guidance in the new OMOP CDM data dictionary is warranted?

Yes, do the above, which is essentially creating your own source vocabulary. You will need to give the record a unique concept_id > 2 billion, a code, and a name. All other attributes will be the same as on the existing “Vocabulary” concepts.
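To make that concrete, here is a minimal sketch in Python with an in-memory SQLite database. It is an illustration, not an official procedure. The table definitions are simplified, the seeded row only stands in for the ATHENA concept that represents ICD10CM (its attribute values are assumptions, so check them against your own download), and the id 2000000001, the name, the code, and the wiki URL are hypothetical. The new concept copies every other attribute from the existing row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    concept_id       INTEGER PRIMARY KEY,
    concept_name     TEXT NOT NULL,
    domain_id        TEXT NOT NULL,
    vocabulary_id    TEXT NOT NULL,
    concept_class_id TEXT NOT NULL,
    standard_concept TEXT,
    concept_code     TEXT NOT NULL,
    valid_start_date TEXT NOT NULL,
    valid_end_date   TEXT NOT NULL,
    invalid_reason   TEXT
);
CREATE TABLE vocabulary (
    vocabulary_id         TEXT PRIMARY KEY,
    vocabulary_name       TEXT NOT NULL,
    vocabulary_reference  TEXT,
    vocabulary_version    TEXT,
    vocabulary_concept_id INTEGER NOT NULL
);
-- Stand-in for the ATHENA concept that represents ICD10CM (values assumed).
INSERT INTO concept VALUES
    (44819098, 'ICD10CM (stand-in name)', 'Metadata', 'Vocabulary',
     'Vocabulary', NULL, 'OMOP generated', '1970-01-01', '2099-12-31', NULL);
""")

NEW_VOCAB_CONCEPT_ID = 2_000_000_001  # hypothetical id above 2 billion

# New concept: own id, name and code; everything else copied from the
# existing vocabulary concept.
conn.execute(
    """
    INSERT INTO concept
    SELECT ?, ?, domain_id, vocabulary_id, concept_class_id, standard_concept,
           ?, valid_start_date, valid_end_date, invalid_reason
    FROM concept
    WHERE concept_id = 44819098
    """,
    (NEW_VOCAB_CONCEPT_ID, "Social History", "social_hx"),
)

# Vocabulary row pointing at the concept created above.
conn.execute(
    "INSERT INTO vocabulary VALUES (?, ?, ?, ?, ?)",
    ("social_hx", "Social History", "https://wiki.example.org/social_hx",
     None, NEW_VOCAB_CONCEPT_ID),
)

row = conn.execute(
    """
    SELECT c.concept_id, c.domain_id, c.vocabulary_id
    FROM vocabulary v
    JOIN concept c ON c.concept_id = v.vocabulary_concept_id
    WHERE v.vocabulary_id = 'social_hx'
    """
).fetchone()
print(row)
```

Copying from an existing row keeps the new record aligned with whatever your vocabulary release uses for the shared attributes, rather than hard-coding them.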

In Colorado we have a few different source vocabularies for all our custom concept_ids. As an example, it makes the lookup for Social History data in the Observation table a little easier. Just join observation.observation_source_concept_id to concept.concept_id WHERE vocabulary_id = 'social_hx'.
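A rough sketch of that lookup, again using Python with in-memory SQLite. The tables are cut down to the columns the join needs, and the concept ids, concept names, and the second vocabulary 'local_lab' are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    concept_id    INTEGER PRIMARY KEY,
    concept_name  TEXT,
    vocabulary_id TEXT
);
CREATE TABLE observation (
    observation_id                INTEGER PRIMARY KEY,
    person_id                     INTEGER,
    observation_source_concept_id INTEGER
);
-- Hypothetical custom concepts in two custom vocabularies.
INSERT INTO concept VALUES
    (2000000101, 'Current every day smoker', 'social_hx'),
    (2000000201, 'Local lab code', 'local_lab');
INSERT INTO observation VALUES
    (1, 10, 2000000101),
    (2, 11, 2000000201),
    (3, 12, 2000000101);
""")

# Keep only the observations whose source concept is in 'social_hx'.
social_history = conn.execute(
    """
    SELECT o.observation_id, o.person_id, c.concept_name
    FROM observation o
    JOIN concept c ON c.concept_id = o.observation_source_concept_id
    WHERE c.vocabulary_id = 'social_hx'
    ORDER BY o.observation_id
    """
).fetchall()
print(social_history)
```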

Thanks @MPhilofsky for confirming that we’re on the right track in creating custom vocabularies for the various *_source_concept_id columns in the OMOP CDM.

However, I was asking a slightly different question. Let’s say I propose to add my own custom concept to the ICD10CM vocabulary. I imagine there would be an outcry from the OHDSI community because I am not the authoring organization for the ICD10CM vocabulary. (Or would there be an outcry if its concept_id were over 2 billion?)

But I am proposing to add my own concepts to the vocabulary called “Vocabulary” (https://athena.ohdsi.org/search-terms/terms?vocabulary=Vocabulary&page=1&pageSize=500). Again, I am not the authoring organization for this vocabulary; the keepers of ATHENA are. But perhaps there is no objection here because this vocabulary is not a healthcare industry standard. It exists only for the OMOP CDM’s vocabulary tables.

For your custom vocabulary called “social_hx”, how did you populate the column vocabulary.vocabulary_concept_id?

Hello, Tim @quinnt

Yes, there would still be an outcry, not mainly because you are not the authoring organization for the ICD10CM vocabulary, but because the OHDSI goal is to maintain vocabulary consistency and uniformity across the globe. ICD10CM should be the same in the USA, China, all over Europe, etc. Thanks to this uniformity, you can easily expand your research and collaborate with the community.

What you want to do is create your own vocabulary and either map its terms to Standard concepts (the preferred way) or create them as standard concepts (still possible), so that you can build cohorts later. To build your own vocabulary, you need to populate:

  • vocabulary: all the information on your custom vocabulary, entered like the other records, according to the CDM specifications
  • concept: try to see everything in the CDM as a concept, much as everything is an object in object-oriented programming
  • concept_relationship: the relationships from your concepts to standard ones, using the ‘Maps to’ relationship. Remember: if you can’t find a corresponding concept in a Standard vocabulary and create your concept as standard, you still need to create a ‘Maps to’ relationship from the concept to itself.
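As a hedged illustration of the concept_relationship part, the sketch below (Python with in-memory SQLite) writes one ‘Maps to’ row from a custom source concept to a standard concept, and one self-referencing ‘Maps to’ row for a custom concept that was created as standard. The helper add_maps_to, all concept ids, and the validity dates are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE concept_relationship (
    concept_id_1     INTEGER NOT NULL,
    concept_id_2     INTEGER NOT NULL,
    relationship_id  TEXT NOT NULL,
    valid_start_date TEXT NOT NULL,
    valid_end_date   TEXT NOT NULL,
    invalid_reason   TEXT,
    PRIMARY KEY (concept_id_1, concept_id_2, relationship_id)
)
""")


def add_maps_to(conn, source_concept_id, target_concept_id):
    """Insert one 'Maps to' row from a source concept to its target."""
    conn.execute(
        "INSERT INTO concept_relationship "
        "VALUES (?, ?, 'Maps to', '1970-01-01', '2099-12-31', NULL)",
        (source_concept_id, target_concept_id),
    )


CUSTOM_SOURCE = 2_000_000_101    # hypothetical non-standard custom concept
STANDARD_TARGET = 1234567        # hypothetical id of a standard concept
CUSTOM_STANDARD = 2_000_000_102  # hypothetical custom concept created as standard

add_maps_to(conn, CUSTOM_SOURCE, STANDARD_TARGET)    # preferred: map to standard
add_maps_to(conn, CUSTOM_STANDARD, CUSTOM_STANDARD)  # standard concept maps to itself

maps_to = conn.execute(
    "SELECT concept_id_1, concept_id_2 FROM concept_relationship "
    "WHERE relationship_id = 'Maps to' ORDER BY concept_id_1"
).fetchall()
print(maps_to)
```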

For ‘social_hx’, vocabulary.vocabulary_concept_id would be 2 billion something: the concept_id of the concept you create to represent your vocabulary.

To illustrate what I’m talking about, check the concepts of other vocabularies (ICD10CM is a perfect example):

SELECT * FROM vocabulary WHERE vocabulary_id = 'ICD10CM';
SELECT * FROM concept WHERE concept_id = 44819098;

The Standardized Vocabularies chapter of The Book of OHDSI may be very helpful.

I would like to add my own custom vocabulary to the OMOP vocabulary table.

Why would I want to do this? Because the column concept.vocabulary_id in the OMOP concept table is required to be not NULL, so I need one to load concepts from my custom vocabulary.

Okay, let’s add a new row to the OMOP vocabulary table. I give it a vocabulary_id = “social_hx” and a vocabulary_name = “Social History”. Hmm, vocabulary_reference is also required, so I put in a URL to our internal wiki documentation. The column vocabulary_version is not required, so I skip it.

Now I get to the column vocabulary_concept_id. What is this? Seems a bit confusing. Let’s check the OMOP data dictionary: “A Concept that represents the Vocabulary the VOCABULARY concept belongs to.” Still confusing.

Okay, let’s see how the smart people of OHDSI actually use this: SELECT * FROM vocabulary; Okay, I see 96 rows, all of which have a concept ID populated in the vocabulary_concept_id column.

Let’s look at the one for ICD10CM. It’s concept_id = 44819098. Another query: SELECT * FROM concept WHERE concept_id = 44819098;

Aha! Now I see that each row in the OMOP vocabulary table also has a corresponding row in the OMOP concept table. The keepers of ATHENA have defined a “vocabulary catalog” of sorts, where the list of vocabularies is itself a vocabulary. This is the source of the confusion, because a “vocabulary of vocabularies” is hard to wrap your brain around.

Okay, so I want to add my custom vocabulary to the ATHENA vocabulary catalog. Can I simply add my custom vocabulary as a new concept in this “Vocabulary” vocabulary (which is in the “Metadata” domain)?

Or should I create a new catalog for my list of custom vocabularies called “Custom Vocabularies”?

Yes, you can do it and keep all your vocabulary together.

A quick side-note as this topic might show up for a lot of people searching how to deal with custom concepts.

In my view, custom concepts should NEVER be marked as standard concepts, and they should only be used in the _source_concept_id fields. The source concept can still be used (in the local Atlas instance) to create cohorts.

If someone disagrees (which is fine), please reach out as this is an important principle that we need to agree on. We might need to start a separate topic or take this up with the CDM WG.

I agree with your stance 1,000% and it matches exactly what we’re doing.

In our OMOP instance, we are loading our EHR’s master files (diagnoses, medications, procedures, lab tests, etc.) and category lists/lookup tables to the CONCEPT table with concept_id values above the 2 billion threshold.

We do this for two reasons:

  1. The ability to construct cohorts using any combination of “standard” and/or “source” concepts
  2. Transparent data lineage from our OMOP instance back to our EHR system

As you said, we NEVER mark these “source” CONCEPT records as “standard” and we ONLY use them within the _source_concept_id columns.

Additionally, we map them to “standard” concepts in the CONCEPT_RELATIONSHIP table with relationship_id = 'Maps to' (and 'Mapped from').
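For anyone wondering how those mappings get used during ETL, here is a small sketch in Python with in-memory SQLite. It looks up the standard concept for each source concept through ‘Maps to’ and falls back to 0, the usual OMOP placeholder for an unmapped concept. The staging table staged_observation and every id are hypothetical, and a real lookup would also need to handle one source concept mapping to several standard concepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept_relationship (
    concept_id_1    INTEGER,
    concept_id_2    INTEGER,
    relationship_id TEXT
);
-- One mapped custom concept, stored in both directions.
INSERT INTO concept_relationship VALUES
    (2000000101, 1234567, 'Maps to'),
    (1234567, 2000000101, 'Mapped from');

-- Hypothetical staging table holding source concepts from the EHR.
CREATE TABLE staged_observation (
    row_id                        INTEGER PRIMARY KEY,
    observation_source_concept_id INTEGER
);
INSERT INTO staged_observation VALUES
    (1, 2000000101),   -- has a 'Maps to' row
    (2, 2000000999);   -- no mapping yet
""")

# The source concept always goes into *_source_concept_id; the standard
# concept comes from 'Maps to', or 0 when no mapping exists.
resolved = conn.execute(
    """
    SELECT s.row_id,
           s.observation_source_concept_id,
           COALESCE(cr.concept_id_2, 0) AS observation_concept_id
    FROM staged_observation s
    LEFT JOIN concept_relationship cr
           ON cr.concept_id_1 = s.observation_source_concept_id
          AND cr.relationship_id = 'Maps to'
    ORDER BY s.row_id
    """
).fetchall()
print(resolved)
```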

This thread was prompted by my questions about whether we could add a “source vocabulary” to ATHENA’s “Vocabulary Catalog”. What is ATHENA’s Vocabulary Catalog?

It exists in 2 places:

  1. All records in the VOCABULARY table
  2. All records in CONCEPT where domain_id = 'Metadata' and vocabulary_id = 'Vocabulary'.

We created our own “source” vocabularies and added them to ATHENA’s “Vocabulary Catalog”.

We also created our own “source vocabularies” concept class so that we could differentiate between our “source vocabularies” and ATHENA’s “standard” vocabularies within the “Vocabulary Catalog”.
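Roughly, the catalog lookup could be sketched like this in Python with in-memory SQLite. The tables are trimmed to the relevant columns, the ICD10CM row is a stand-in with an assumed name, and the concept class name 'Source Vocabulary' and the id 2000000001 are hypothetical. In a full CDM a new concept_class_id value would also need its own row in the CONCEPT_CLASS table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    concept_id       INTEGER PRIMARY KEY,
    concept_name     TEXT,
    domain_id        TEXT,
    vocabulary_id    TEXT,
    concept_class_id TEXT
);
CREATE TABLE vocabulary (
    vocabulary_id         TEXT PRIMARY KEY,
    vocabulary_name       TEXT,
    vocabulary_concept_id INTEGER
);
INSERT INTO concept VALUES
    (44819098, 'ICD10CM (stand-in name)', 'Metadata', 'Vocabulary', 'Vocabulary'),
    (2000000001, 'Social History', 'Metadata', 'Vocabulary', 'Source Vocabulary');
INSERT INTO vocabulary VALUES
    ('ICD10CM', 'ICD10CM (stand-in name)', 44819098),
    ('social_hx', 'Social History', 2000000001);
""")

# The catalog seen from both places: VOCABULARY rows joined to their
# 'Metadata' / 'Vocabulary' concepts.
catalog = conn.execute(
    """
    SELECT v.vocabulary_id, c.concept_id, c.concept_class_id
    FROM vocabulary v
    JOIN concept c ON c.concept_id = v.vocabulary_concept_id
    WHERE c.domain_id = 'Metadata' AND c.vocabulary_id = 'Vocabulary'
    ORDER BY c.concept_id
    """
).fetchall()

# The custom concept class separates local vocabularies from ATHENA's.
custom_only = [vocab for vocab, _, cls in catalog if cls == "Source Vocabulary"]
print(catalog)
print(custom_only)
```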

Tangential to this topic, is there any standard guidance for using the Concept table for local codes? I created a separate topic: “Move local mappings from Source_to_concept_map to Concepts” (CDM Builders, OHDSI Forums).

In my view, there is little added value in adding a concept for your custom vocabulary to the concept table. Simply adding a record in the ‘vocabulary’ table with vocabulary_concept_id set to 0 works fine.
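A minimal sketch of this lighter-weight variant, in Python with in-memory SQLite. The schema is simplified and only the concept-to-vocabulary foreign key is declared; the ids, names, and the 'undeclared_vocab' value are hypothetical. It also shows why the vocabulary row is needed at all: a concept that points to an undeclared vocabulary is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE vocabulary (
    vocabulary_id         TEXT PRIMARY KEY,
    vocabulary_name       TEXT NOT NULL,
    vocabulary_reference  TEXT,
    vocabulary_version    TEXT,
    vocabulary_concept_id INTEGER NOT NULL
);
CREATE TABLE concept (
    concept_id    INTEGER PRIMARY KEY,
    concept_name  TEXT NOT NULL,
    vocabulary_id TEXT NOT NULL REFERENCES vocabulary (vocabulary_id)
);
""")

# Vocabulary row only; vocabulary_concept_id is 0, so no extra concept row.
conn.execute(
    "INSERT INTO vocabulary VALUES ('social_hx', 'Social History', NULL, NULL, 0)"
)

# Custom concepts can now reference the new vocabulary_id.
conn.execute(
    "INSERT INTO concept VALUES (2000000101, 'Current every day smoker', 'social_hx')"
)

# Without a vocabulary row, the same kind of insert fails.
try:
    conn.execute(
        "INSERT INTO concept VALUES (2000000201, 'Orphan concept', 'undeclared_vocab')"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

vocab_concept_id = conn.execute(
    "SELECT vocabulary_concept_id FROM vocabulary WHERE vocabulary_id = 'social_hx'"
).fetchone()[0]
print(vocab_concept_id, orphan_rejected)
```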

The process explanation (attached link) is in line with what you posted in your tangential topic post:

Hi Maxim,

Can you provide a brief explanation on how setting vocabulary_concept_id to 0 allows Achilles to characterize these custom concepts?


@Sanjay_Udoshi As far as I know, Achilles does not characterize the vocabulary_concept_id. But maybe I misunderstood the question.

To add a custom vocabulary to the OMOP Vocabulary table, you can follow these general steps:

  1. Prepare your custom vocabulary: Define the concepts, codes, and relationships for your custom vocabulary. Make sure you have the necessary information, such as concept IDs, concept names, domain IDs, relationships, and any additional metadata.
  2. Populate the concept table: The concept table is the central table of the OMOP vocabulary tables. It holds all concepts, including your custom ones, so insert your custom concepts into it. Make sure you fill the required columns, such as concept_id, concept_name, domain_id, vocabulary_id, etc.
  3. Define relationships: If your custom vocabulary has relationships with existing concepts in the OMOP vocabularies, you’ll need to establish them. Determine which existing concepts your custom concepts relate to (e.g., an ‘Is a’ or ‘Maps to’ relationship), and insert the appropriate rows into the relationship tables (e.g., the concept_relationship table).
  4. Load the data: Once you have prepared the concept and relationship information, load it into the OMOP vocabulary tables. This can be done with SQL statements or by importing data from a file, whichever you prefer.
  5. Validate and test: After loading the data, validate the entries and make sure they align with the OMOP vocabulary specifications. Check for consistency, accuracy, and adherence to the OMOP conventions.
  6. Update the metadata: Add or update the row for your custom vocabulary in the metadata tables, such as the vocabulary table, to describe it. Because concept.vocabulary_id references the vocabulary table, this row has to exist before the concepts are loaded if foreign keys are enforced.
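The validation step can be partly automated. Below is a rough sketch in Python with in-memory SQLite of three checks suggested by this thread: custom concept_ids at or below 2 billion, custom concepts marked as standard, and custom concepts with no ‘Maps to’ row. The function validate_custom_vocabulary, the check names, and all sample data are hypothetical, and a flagged row is something to review rather than necessarily an error.

```python
import sqlite3

CUSTOM_ID_FLOOR = 2_000_000_000  # custom concept_ids are expected above this

CHECKS = {
    "id_not_above_2_billion": """
        SELECT concept_id FROM concept
        WHERE vocabulary_id = :vocab AND concept_id <= :floor
        ORDER BY concept_id""",
    "marked_standard": """
        SELECT concept_id FROM concept
        WHERE vocabulary_id = :vocab AND standard_concept IS NOT NULL
        ORDER BY concept_id""",
    "no_maps_to": """
        SELECT c.concept_id FROM concept c
        WHERE c.vocabulary_id = :vocab
          AND NOT EXISTS (SELECT 1 FROM concept_relationship cr
                          WHERE cr.concept_id_1 = c.concept_id
                            AND cr.relationship_id = 'Maps to')
        ORDER BY c.concept_id""",
}


def validate_custom_vocabulary(conn, vocabulary_id):
    """Return {check name: [offending concept_ids]} for one vocabulary."""
    params = {"vocab": vocabulary_id, "floor": CUSTOM_ID_FLOOR}
    return {
        name: [row[0] for row in conn.execute(sql, params)]
        for name, sql in CHECKS.items()
    }


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    concept_id       INTEGER PRIMARY KEY,
    concept_name     TEXT,
    vocabulary_id    TEXT,
    standard_concept TEXT
);
CREATE TABLE concept_relationship (
    concept_id_1    INTEGER,
    concept_id_2    INTEGER,
    relationship_id TEXT
);
-- Hypothetical sample data with one problem per check.
INSERT INTO concept VALUES
    (2000000101, 'Current every day smoker', 'social_hx', NULL),
    (1500,       'Former smoker',            'social_hx', NULL),
    (2000000102, 'Never smoker',             'social_hx', 'S'),
    (2000000103, 'Smokeless tobacco user',   'social_hx', NULL);
INSERT INTO concept_relationship VALUES
    (2000000101, 1234567,    'Maps to'),
    (1500,       1234567,    'Maps to'),
    (2000000102, 2000000102, 'Maps to');
""")

report = validate_custom_vocabulary(conn, "social_hx")
print(report)
```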

Hope this helps.