
How to add a custom vocabulary to the OMOP vocabulary table?

We want to create a custom vocabulary for loading a source_concept_id column. This action is straightforward: we create new rows in the concept table with IDs above the 2 billion threshold.

The foreign-key column concept.vocabulary_id is required, so we also create a new row in the vocabulary table.

But now we see that foreign-key column vocabulary.vocabulary_concept_id is also required. The ATHENA vocabularies all have concept rows with Vocabulary = “Vocabulary” and Domain = “Metadata”.

My question: Is it acceptable for us to add a concept to this “Vocabulary” vocabulary, as long as its concept_id is above 2 billion? Or should we create our own “Source Vocabulary” vocabulary to keep our vocab concepts separated from the OHDSI ones?

@clairblacketer – Perhaps some additional guidance in the new OMOP CDM data dictionary is warranted?

Yes, do the above, which is essentially creating your own source vocabulary. You will need to give the record a unique concept_id above 2 billion, a code, and a name. All other attributes will be the same.

In Colorado we have a few different source vocabularies for all our custom concept_ids. As an example, it makes the lookup for Social History data in the Observation table a little easier. Just join observation.observation_source_concept_id to concept.concept_id WHERE vocabulary_id = 'social_hx'.
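A minimal sketch of that lookup, using Python's built-in sqlite3 module with cut-down stand-ins for the CONCEPT and OBSERVATION tables. The concept IDs, names, and the second vocabulary ('local_lab') are made up for illustration; only 'social_hx' comes from the post above.

```python
import sqlite3

# In-memory database with simplified tables (only the columns the join needs).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    concept_id    INTEGER PRIMARY KEY,
    concept_name  TEXT NOT NULL,
    vocabulary_id TEXT NOT NULL
);
CREATE TABLE observation (
    observation_id                INTEGER PRIMARY KEY,
    person_id                     INTEGER NOT NULL,
    observation_source_concept_id INTEGER
);
-- Hypothetical custom concepts and rows, for illustration only.
INSERT INTO concept VALUES (2000000101, 'Lives alone', 'social_hx');
INSERT INTO concept VALUES (2000000201, 'Local lab code', 'local_lab');
INSERT INTO observation VALUES (1, 10, 2000000101);
INSERT INTO observation VALUES (2, 11, 2000000201);
""")

# Restrict observations to those whose source concept is in 'social_hx'.
social_hx_rows = conn.execute("""
    SELECT o.observation_id, o.person_id, c.concept_name
    FROM observation AS o
    JOIN concept AS c
      ON c.concept_id = o.observation_source_concept_id
    WHERE c.vocabulary_id = 'social_hx'
    ORDER BY o.observation_id
""").fetchall()
print(social_hx_rows)
```

Because each source vocabulary has its own vocabulary_id, the filter needs no knowledge of which concept_id ranges were assigned to which source.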

Thanks @MPhilofsky for confirming that we’re on the right track in creating custom vocabularies for the various *_source_concept_id columns in the OMOP CDM.

However, I was asking a slightly different question. Let’s say I propose to add my own custom concept to the ICD10CM vocabulary. I imagine there would be an outcry from the OHDSI community because I am not the authoring organization for the ICD10CM vocabulary. (Or would there be an outcry if its concept_id were over 2 billion?)

But I am proposing to add my own concepts to the vocabulary called “Vocabulary” (https://athena.ohdsi.org/search-terms/terms?vocabulary=Vocabulary&page=1&pageSize=500). Again, I am not the authoring organization for this vocabulary; the keepers of ATHENA are. But perhaps there is no objection here because this vocabulary is not a healthcare industry standard. It exists only for the OMOP CDM’s vocabulary tables.

For your custom vocabulary called “social_hx”, how did you populate the column vocabulary.vocabulary_concept_id?

Hello, Tim @quinnt

Yes, there would still be an outcry, not mainly because you are not the authoring organization for the ICD10CM vocabulary, but because the OHDSI goal is to maintain vocabulary consistency and uniformity across the globe. ICD10CM should be the same in the USA, China, all over Europe, etc. Thanks to this uniformity, you can easily expand your research and collaborate with the community.

What you want to do: create your own vocabulary and either map its terms to Standard concepts (the preferred way) or create them as standard concepts (still possible) so that you can build cohorts later. To build your own vocabulary, you need to populate

  • vocabulary: all the information on your custom vocabulary, filled in like the other records, according to the CDM specifications
  • concept: treat everything as a concept in the CDM, much as everything is an object in object-oriented programming
  • concept_relationship: add relationships from your concepts to Standard concepts using the ‘Maps to’ relationship. Remember, if you can’t find a corresponding concept in a Standard vocabulary and create your concept as standard, you still need to create a ‘Maps to’ relationship from the concept to itself.
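The three tables in the list above can be sketched end to end as follows. This is an illustration under stated assumptions, not the official DDL: the tables are reduced to a few columns, all IDs, codes, and the wiki URL are hypothetical, the standard concept is a placeholder rather than a real ATHENA concept, and using 'Vocabulary' as the concept class of the vocabulary concept is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced versions of the three tables; the real CDM DDL has more columns.
conn.executescript("""
CREATE TABLE vocabulary (
    vocabulary_id         TEXT PRIMARY KEY,
    vocabulary_name       TEXT NOT NULL,
    vocabulary_reference  TEXT NOT NULL,
    vocabulary_version    TEXT,
    vocabulary_concept_id INTEGER NOT NULL
);
CREATE TABLE concept (
    concept_id       INTEGER PRIMARY KEY,
    concept_name     TEXT NOT NULL,
    domain_id        TEXT NOT NULL,
    vocabulary_id    TEXT NOT NULL,
    concept_class_id TEXT NOT NULL,
    standard_concept TEXT,
    concept_code     TEXT NOT NULL
);
CREATE TABLE concept_relationship (
    concept_id_1    INTEGER NOT NULL,
    concept_id_2    INTEGER NOT NULL,
    relationship_id TEXT NOT NULL
);
""")

VOCAB_CONCEPT_ID = 2_000_000_001     # hypothetical ID for the vocabulary itself
SOURCE_CONCEPT_ID = 2_000_000_101    # hypothetical ID for one local code
STANDARD_CONCEPT_ID = 1001           # placeholder, not a real ATHENA concept_id

# 1. vocabulary: one row describing the custom vocabulary.
conn.execute(
    "INSERT INTO vocabulary VALUES (?, ?, ?, ?, ?)",
    ("social_hx", "Social History", "https://wiki.example.org/social_hx",
     None, VOCAB_CONCEPT_ID),
)

# 2. concept: the vocabulary's own concept, one source concept, and a
#    stand-in for the standard concept it maps to.
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (VOCAB_CONCEPT_ID, "Social History", "Metadata", "Vocabulary",
         "Vocabulary", None, "social_hx"),
        (SOURCE_CONCEPT_ID, "Lives alone", "Observation", "social_hx",
         "Social History", None, "SH001"),
        (STANDARD_CONCEPT_ID, "Placeholder standard concept", "Observation",
         "SNOMED", "Clinical Finding", "S", "PLACEHOLDER"),
    ],
)

# 3. concept_relationship: 'Maps to' from the source concept to the standard one.
conn.execute(
    "INSERT INTO concept_relationship VALUES (?, ?, 'Maps to')",
    (SOURCE_CONCEPT_ID, STANDARD_CONCEPT_ID),
)

mapped = conn.execute("""
    SELECT src.concept_name, std.concept_id
    FROM concept AS src
    JOIN concept_relationship AS cr
      ON cr.concept_id_1 = src.concept_id
     AND cr.relationship_id = 'Maps to'
    JOIN concept AS std
      ON std.concept_id = cr.concept_id_2
    WHERE src.vocabulary_id = 'social_hx'
""").fetchall()
print(mapped)
```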

With a value above 2 billion: the concept_id of the concept you create for your vocabulary ‘social_hx’.

To illustrate what I’m talking about, look at the concepts of other vocabularies (ICD10CM is a perfect example):

SELECT * FROM vocabulary WHERE vocabulary_id = 'ICD10CM';
SELECT * FROM concept WHERE concept_id = 44819098;

The Standardized Vocabularies chapter of The Book of OHDSI may be very helpful.

I would like to add my own custom vocabulary to the OMOP vocabulary table.

Why would I want to do this? Because the column concept.vocabulary_id in the OMOP concept table is required to be not NULL, so I need one to load concepts from my custom vocabulary.

Okay, let’s add a new row to the OMOP vocabulary table. I give it a vocabulary_id = “social_hx” and a vocabulary_name = “Social History”. Hmm, vocabulary_reference is also required, so I put in a URL to our internal wiki documentation. The column vocabulary_version is not required, so I skip it.

Now I get to the column vocabulary_concept_id. What is this? Seems a bit confusing. Let’s check the OMOP data dictionary: “A Concept that represents the Vocabulary the VOCABULARY concept belongs to.” Still confusing.

Okay, let’s see how the smart people of OHDSI actually use this: SELECT * FROM vocabulary; Okay, I see 96 rows, all of which have a concept ID populated in the vocabulary_concept_id column.

Let’s look at the one for ICD10CM. Its concept_id = 44819098. Another query: SELECT * FROM concept WHERE concept_id = 44819098;

Aha! Now I see that each row in the OMOP vocabulary table also has a corresponding row in the OMOP concept table. The keepers of ATHENA have defined a “vocabulary catalog” of sorts, where the list of vocabularies is itself a vocabulary. This is the source of the confusion, because a “vocabulary of vocabularies” is hard to wrap your brain around.
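The “vocabulary of vocabularies” idea is easier to see as a single join. Here is a small sketch with Python's sqlite3 and two reduced tables; the concept_id 44819098 and the 'Vocabulary'/'Metadata' values come from this thread, while the name string is only an illustrative stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced tables: only the columns that link VOCABULARY to CONCEPT.
conn.executescript("""
CREATE TABLE vocabulary (
    vocabulary_id         TEXT PRIMARY KEY,
    vocabulary_name       TEXT NOT NULL,
    vocabulary_concept_id INTEGER NOT NULL
);
CREATE TABLE concept (
    concept_id    INTEGER PRIMARY KEY,
    concept_name  TEXT NOT NULL,
    domain_id     TEXT NOT NULL,
    vocabulary_id TEXT NOT NULL
);
-- The name below is illustrative, not copied from ATHENA.
INSERT INTO vocabulary VALUES ('ICD10CM', 'ICD10CM (illustrative name)', 44819098);
INSERT INTO concept VALUES (44819098, 'ICD10CM (illustrative name)', 'Metadata', 'Vocabulary');
""")

# Each vocabulary row points at a concept that itself lives in the
# 'Vocabulary' vocabulary and the 'Metadata' domain.
catalog = conn.execute("""
    SELECT v.vocabulary_id, c.concept_id, c.domain_id, c.vocabulary_id
    FROM vocabulary AS v
    JOIN concept AS c
      ON c.concept_id = v.vocabulary_concept_id
""").fetchall()
print(catalog)
```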

Okay, so I want to add my custom vocabulary to the ATHENA vocabulary catalog. Can I simply add my custom vocabulary as a new concept in this “Vocabulary” vocabulary (which is in the “Metadata” domain)?

Or should I create a new catalog for my list of custom vocabularies called “Custom Vocabularies”?

Yes, you can do it and keep all your vocabulary together.


A quick side-note as this topic might show up for a lot of people searching how to deal with custom concepts.

In my view, custom concepts should NEVER be marked as standard concepts. They should only be used in the _source_concept_id fields. The source concept can still be used (in the local Atlas instance) to create cohorts.

If someone disagrees (which is fine), please reach out as this is an important principle that we need to agree on. We might need to start a separate topic or take this up with the CDM WG.


I agree with your stance 1,000% and it matches exactly what we’re doing.

In our OMOP instance, we are loading our EHR’s master files (diagnoses, medications, procedures, lab tests, etc.) and category lists/lookup tables to the CONCEPT table with concept_id values above the 2 billion threshold.

We do this for two reasons:

  1. The ability to construct cohorts using any combination of “standard” and/or “source” concepts
  2. Transparent data lineage from our OMOP instance back to our EHR system

As you said, we NEVER mark these “source” CONCEPT records as “standard” and we ONLY use them within the _source_concept_id columns.

Additionally, we map them to “standard” concepts in the CONCEPT_RELATIONSHIP table with relationship_id = 'Maps to' (and 'Mapped from').
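As a rough sketch of writing both directions at once, the helper below inserts the 'Maps to' row together with its reverse 'Mapped from' row. The function name add_mapping and both concept IDs are hypothetical, and the table omits the date and invalid_reason columns of the real CONCEPT_RELATIONSHIP table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified CONCEPT_RELATIONSHIP; the real table has additional columns.
conn.execute("""
CREATE TABLE concept_relationship (
    concept_id_1    INTEGER NOT NULL,
    concept_id_2    INTEGER NOT NULL,
    relationship_id TEXT NOT NULL,
    PRIMARY KEY (concept_id_1, concept_id_2, relationship_id)
)
""")


def add_mapping(conn, source_id, standard_id):
    """Insert the 'Maps to' row and its reverse 'Mapped from' row."""
    conn.executemany(
        "INSERT INTO concept_relationship VALUES (?, ?, ?)",
        [
            (source_id, standard_id, "Maps to"),
            (standard_id, source_id, "Mapped from"),
        ],
    )


# Hypothetical IDs: a custom source concept and a placeholder standard concept.
add_mapping(conn, 2_000_000_101, 1001)

rows = conn.execute("""
    SELECT concept_id_1, concept_id_2, relationship_id
    FROM concept_relationship
    ORDER BY relationship_id
""").fetchall()
print(rows)
```

Keeping the pair in one helper makes it harder for the two directions to drift apart during an ETL run.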

This thread was prompted by my questions about whether we could add a “source vocabulary” to ATHENA’s “Vocabulary Catalog”. What is ATHENA’s Vocabulary Catalog?

It exists in 2 places:

  1. All records in the VOCABULARY table
  2. All records in CONCEPT where domain_id = 'Metadata' and vocabulary_id = 'Vocabulary'.

We created our own “source” vocabularies and added them to ATHENA’s “Vocabulary Catalog”.

We also created our own “source vocabularies” concept class so that we could differentiate between our “source vocabularies” and ATHENA’s “standard” vocabularies within the “Vocabulary Catalog”.
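One way to picture that separation, as a sketch only: the concept class name 'Source Vocabulary', the local IDs and names, and the reduced table are assumptions for illustration, not the exact values we loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced CONCEPT table holding only "Vocabulary Catalog" entries.
conn.executescript("""
CREATE TABLE concept (
    concept_id       INTEGER PRIMARY KEY,
    concept_name     TEXT NOT NULL,
    domain_id        TEXT NOT NULL,
    vocabulary_id    TEXT NOT NULL,
    concept_class_id TEXT NOT NULL
);
-- One ATHENA-style entry and two hypothetical local entries.
INSERT INTO concept VALUES (44819098, 'ICD10CM (illustrative name)', 'Metadata', 'Vocabulary', 'Vocabulary');
INSERT INTO concept VALUES (2000000001, 'Social History', 'Metadata', 'Vocabulary', 'Source Vocabulary');
INSERT INTO concept VALUES (2000000002, 'Local Lab Codes', 'Metadata', 'Vocabulary', 'Source Vocabulary');
""")

# The custom concept class separates local catalog entries from ATHENA's.
local_vocabularies = conn.execute("""
    SELECT concept_id, concept_name
    FROM concept
    WHERE domain_id = 'Metadata'
      AND vocabulary_id = 'Vocabulary'
      AND concept_class_id = 'Source Vocabulary'
    ORDER BY concept_id
""").fetchall()
print(local_vocabularies)
```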


Tangential to this topic, is there any standard guidance for using the Concept table for local codes? I created a separate topic.
Move local mappings from Source_to_concept_map to Concepts - CDM Builders - OHDSI Forums

In my view, there is little added value of adding a concept for your custom vocabulary into the concept table. Just adding a record in the ‘vocabulary’ table with vocabulary_concept_id set to 0 works fine.
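A small sketch of why 0 works, assuming the CONCEPT table already contains concept_id 0 ('No matching concept') as in a standard vocabulary load. The tables are reduced, only the foreign key from VOCABULARY to CONCEPT is modelled, and the 'local_lab' row is a hypothetical counter-example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE concept (
    concept_id   INTEGER PRIMARY KEY,
    concept_name TEXT NOT NULL
);
CREATE TABLE vocabulary (
    vocabulary_id         TEXT PRIMARY KEY,
    vocabulary_name       TEXT NOT NULL,
    vocabulary_concept_id INTEGER NOT NULL REFERENCES concept (concept_id)
);
INSERT INTO concept VALUES (0, 'No matching concept');
-- Custom vocabulary pointing at concept 0: NOT NULL and the FK are both satisfied.
INSERT INTO vocabulary VALUES ('social_hx', 'Social History', 0);
""")

# Pointing at a concept_id that was never loaded is rejected by the FK.
try:
    conn.execute(
        "INSERT INTO vocabulary VALUES ('local_lab', 'Local Lab Codes', 2000000002)"
    )
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

stored = conn.execute(
    "SELECT vocabulary_id, vocabulary_concept_id FROM vocabulary"
).fetchall()
print(stored, dangling_rejected)
```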

The process explanation (attached link) is in line with what you posted in your tangential topic post:

Hi Maxim,

Can you provide a brief explanation on how setting vocabulary_concept_id to 0 allows Achilles to characterize these custom concepts?


@Sanjay_Udoshi As far as I know, Achilles does not characterize the vocabulary_concept_id. But maybe I misunderstood the question.

To add a custom vocabulary to the OMOP Vocabulary table, you can follow these general steps:

Prepare your custom vocabulary: Define the concepts, codes, and relationships for your custom vocabulary. Make sure you have the necessary information, such as concept IDs, concept names, domain IDs, relationships, and any additional metadata.

Populate the concept table: The concept table is the main table in the OMOP Vocabulary database. It contains information about all the concepts, including your custom ones. You’ll need to insert your custom concepts into this table. Ensure that you fill in the required columns, such as concept_id, concept_name, domain_id, vocabulary_id, etc.

Define relationships: If your custom vocabulary has relationships with existing concepts in the OMOP Vocabulary, you’ll need to establish those relationships. Determine which existing concepts your custom concepts are related to (e.g., is-a relationship, maps-to relationship), and insert the appropriate rows into the relationship tables (e.g., concept_relationship table).

Load the data: Once you have prepared the concept and relationship information, you need to load the data into the OMOP Vocabulary table. This can be done using SQL statements or by importing data from a file, depending on the method you prefer.

Validate and test: After loading the data, it’s essential to validate the entries and ensure they align with the OMOP Vocabulary specifications. Perform checks for consistency, accuracy, and adherence to the OMOP conventions.

Update the metadata: Update the metadata of your custom vocabulary in the metadata tables, such as the vocabulary table, to provide additional details about your custom vocabulary.
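For the validation step, here is a hedged sketch of two checks that follow the conventions discussed earlier in this thread: custom concepts (concept_id above 2 billion) are not marked standard, and custom concepts without a 'Maps to' row are listed for review. The function names, sample concepts, and reduced tables are hypothetical, and an unmapped custom concept is not necessarily an error.

```python
import sqlite3

CUSTOM_ID_THRESHOLD = 2_000_000_000  # local concept_ids sit above this value

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    concept_id       INTEGER PRIMARY KEY,
    concept_name     TEXT NOT NULL,
    vocabulary_id    TEXT NOT NULL,
    standard_concept TEXT
);
CREATE TABLE concept_relationship (
    concept_id_1    INTEGER NOT NULL,
    concept_id_2    INTEGER NOT NULL,
    relationship_id TEXT NOT NULL
);
-- Hypothetical sample data: one clean concept, one marked standard, one unmapped.
INSERT INTO concept VALUES (2000000101, 'Lives alone', 'social_hx', NULL);
INSERT INTO concept VALUES (2000000102, 'Owns a pet', 'social_hx', 'S');
INSERT INTO concept VALUES (2000000103, 'Night shift worker', 'social_hx', NULL);
INSERT INTO concept_relationship VALUES (2000000101, 1001, 'Maps to');
""")


def custom_marked_standard(conn):
    """Custom concepts that carry standard_concept = 'S'."""
    rows = conn.execute(
        "SELECT concept_id FROM concept "
        "WHERE concept_id > ? AND standard_concept = 'S' "
        "ORDER BY concept_id",
        (CUSTOM_ID_THRESHOLD,),
    )
    return [row[0] for row in rows]


def custom_without_mapping(conn):
    """Custom concepts with no outgoing 'Maps to' relationship."""
    rows = conn.execute(
        """
        SELECT c.concept_id
        FROM concept AS c
        LEFT JOIN concept_relationship AS cr
          ON cr.concept_id_1 = c.concept_id
         AND cr.relationship_id = 'Maps to'
        WHERE c.concept_id > ?
          AND cr.concept_id_1 IS NULL
        ORDER BY c.concept_id
        """,
        (CUSTOM_ID_THRESHOLD,),
    )
    return [row[0] for row in rows]


print(custom_marked_standard(conn))
print(custom_without_mapping(conn))
```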

Hope this helps.

This works perfectly for concepts that sufficiently define the events. But people who build highly customized CDMs, mostly out of 2 billion+ concepts, would argue with you. They prefer to look in the standard concept_id fields for the sake of convenience.

What about the survey data problem? People prefer to keep the questions and answers as separate concepts rather than pre-coordinate the whole vocabulary, which would create tons of new instances. Unless a value_source_concept_id field is introduced, this sort of analytics on survey data will not work, unfortunately.

I agree there’s not much value unless people submit their local vocabularies (at least the metadata) to the official OMOP vocabularies to get the concept_id assigned, and then leverage this across the data networks.

But how do Achilles and DQD react when there is no matching concept? Also, when something like this is allowed, we usually state it explicitly in the documentation. Do we here?

@Agnes_Wojciechowski Please chime in :wink:


Since DQD is paramount as the standard tool for checking OMOP, it would be nice to add guidance saying that when creating a custom vocabulary we put 0 as its vocabulary_concept_id by default. We can conjure up some specific, unique scenarios where it would be different, but in my experience that is not really necessary in most cases. The reason for explicit guidance is so that people adhere to the NOT NULL requirement (OMOP CDM v5.4) and so that the DQD check does not flag an error, which in this case is, in my opinion, really superficial.

This post got me thinking about the utility and necessity of the vocabulary_concept_id.

In my view, the record in the Concept table for a vocabulary does not provide any necessary, additional information about the vocabulary. The record in the Vocabulary table for the vocabulary contains all necessary attributes. I am unaware of any use case for the vocabulary_concept_id. The most common use of the Vocabulary table is identifying the date of the Vocabulary batch being used in the CDM, and that query uses vocabulary_id = 'None'. Another common use of the Vocabulary table is to identify which vocabularies are being used in the CDM.
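Both uses can be sketched without touching vocabulary_concept_id at all. The example assumes the release row is the one with vocabulary_id = 'None' (as in ATHENA downloads, to the best of my knowledge); the version string is a placeholder and the table is reduced to three columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced VOCABULARY table; vocabulary_concept_id is deliberately left out.
conn.executescript("""
CREATE TABLE vocabulary (
    vocabulary_id      TEXT PRIMARY KEY,
    vocabulary_name    TEXT NOT NULL,
    vocabulary_version TEXT
);
-- Placeholder version string, not a real release.
INSERT INTO vocabulary VALUES ('None', 'OMOP Standardized Vocabularies', 'v5.0 01-JAN-00 (placeholder)');
INSERT INTO vocabulary VALUES ('social_hx', 'Social History', NULL);
""")

# Use 1: which vocabulary release is loaded in this CDM?
release = conn.execute(
    "SELECT vocabulary_version FROM vocabulary WHERE vocabulary_id = 'None'"
).fetchone()[0]

# Use 2: which vocabularies are present in this CDM?
vocabularies_in_use = [
    row[0]
    for row in conn.execute(
        "SELECT vocabulary_id FROM vocabulary ORDER BY vocabulary_id"
    )
]
print(release, vocabularies_in_use)
```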

How does the OHDSI Vocab team use the vocabulary_concept_id field? Is it used by Atlas in some way I don’t know? Do we even need the vocabulary_concept_id field in the Vocabulary table?

@Alexdavv @aostropolets @Christian_Reich @Chris_Knoll @clairblacketer @MaximMoinat

Any time we show you a concept in Atlas, we tell you which vocabulary it came from…which might be useful if you find a concept code that conflicts with a code in another vocabulary.

If you are building a drug list, you should usually find it in RxNorm…if you pick something from another vocabulary you might want to check your work.

Likewise, if you are building a condition list, usually that comes from SNOMED, and if you choose from a different vocabulary you may want to check your work.

I think @MPhilofsky means something else, @Chris_Knoll. She is challenging why we need a concept for each vocabulary, while we already have a record in the VOCABULARY table with vocabulary_id as foreign key to it. Only one should suffice, and we never seem to make use of the concepts for anything.

The answer is this, @MPhilofsky: There is a population in our community who thinks of the clinical facts and vocabularies as one continuous information space. Instead of fixed pre-defined queries, you could navigate and detect connections and knowledge. You use technologies like RDF, OWL and Triple Stores to do that efficiently: here, here, here, here and probably in more places.

The problem is that none of these have produced a successful application to a use case, yet, as far as I can tell. We put all those extra concepts in, with a certain burden to maintain them (we also maintain all tables and fields as concepts and relationships, for all versions), but I am not seeing the stream of tools or results.

If you want to propose to get rid of all this, go ahead, but make sure you talk to the sponsors first.


I don’t want to get rid of it. But we should give guidance for folks who map non-OHDSI supported source values/codes to standard concept_ids, aka custom mapping. It sounds like there isn’t an OHDSI network or clinical/medical research use case for the vocabulary_concept_id. So, concept_id = 0 would suffice for custom vocabularies. Correct? Of course the ETL can assign a concept_id > 2 billion for the vocabulary and create a record for it in the concept table if desired.

@MaximMoinat is presenting a proposal on “guidance for ETLers when mapping non-OHDSI supported source values/codes to standard concept_ids, aka custom mapping” to the Themis WG, Thursday October 5th at 9:30 Eastern Time. We encourage those who do custom mapping work to join us. The more eyes on the documentation proposal, the better the final documentation will be.

EDIT: Fixed the time for the meeting.
