Permanence of domains with no concepts (yet)?

The Book of OHDSI section 4.2.4 General Conventions of Domains includes Table 4.1 enumerating the thirty (30) domains in use in ATHENA (presumably at that time).

However, if you query the “domain catalog” in ATHENA today (December 2020), you would find 52 distinct domains.

A few of these are the “double domains” like Condition/Measurement for concepts used in columns like observation.value_as_concept_id (at least, I hope my understanding of this convention is correct). And I see a few new ones that appear to pertain to CDM v6.x:

  • Condition Status
  • Geography
  • Place of Service
  • Regimen

The remaining domains on the list currently have no constituent concepts.

My question: Are these “empty” domains in danger of going away? Or am I free to use them for my custom source concepts (above the 2 billion concept_id threshold)?

Which ones are empty?

and Place of Service are both in v5.3.1

These domains do not have standard concept_ids. These concept_ids only represent source codes. Don’t give any custom concept_ids a “double domain”.

Your understanding of the double domains is incorrect. The double domains are a holdover from a time long ago and now represent only a few source codes.
This page provides a little more info on the observation.value_as_concept_id field.

Excluding the “double” domains and the deprecated “Type” domains, the following domains contain no concepts whatsoever. Of the domain concepts in this list, only Modifier has a row in the OMOP domain table.

  • Modifier (concept_id = 12)
  • Care site (concept_id = 57)
  • Person (concept_id = 56)
  • Provider (concept_id = 55)
  • Note (concept_id = 5085)
  • Domain (concept_id = 1)

My original question was this:

If I wanted to create my own custom concepts in, let’s say, the Person domain for use in the column observation.value_as_concept_id, are these domain concepts in the OMOP concept table in any danger of disappearing?

I understand that to do this, I would need to insert my own Person domain record into the OMOP domain table, with domain_concept_id = 56, because one doesn’t already exist.
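A minimal sketch of what that insert might look like, run against an in-memory SQLite database rather than a real CDM. The table definitions are reduced to a handful of columns, and the custom concept (its id, name, and the 'Local Custom' vocabulary_id) is purely hypothetical; only the Person domain concept (concept_id = 56, domain_id = 'Metadata', vocabulary_id = 'Domain') comes from this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Reduced stand-ins for the OMOP domain and concept tables (assumption:
    -- the real tables have more columns than shown here).
    CREATE TABLE domain (
        domain_id TEXT PRIMARY KEY,
        domain_name TEXT,
        domain_concept_id INTEGER
    );
    CREATE TABLE concept (
        concept_id INTEGER PRIMARY KEY,
        concept_name TEXT,
        domain_id TEXT,
        vocabulary_id TEXT,
        standard_concept TEXT
    );
    -- The existing Person domain concept, which has no domain row yet.
    INSERT INTO concept VALUES (56, 'Person', 'Metadata', 'Domain', NULL);
    """
)

# Add the missing domain row only if it is absent, so reruns are harmless.
add_person_domain = (
    "INSERT INTO domain (domain_id, domain_name, domain_concept_id) "
    "SELECT 'Person', 'Person', 56 "
    "WHERE NOT EXISTS (SELECT 1 FROM domain WHERE domain_id = 'Person')"
)
conn.execute(add_person_domain)
conn.execute(add_person_domain)  # second run inserts nothing

# Hypothetical custom source concept above the 2 billion threshold.
conn.execute(
    "INSERT INTO concept VALUES "
    "(2000000001, 'Hypothetical restricted-access flag', 'Person', 'Local Custom', NULL)"
)

domain_row_count = conn.execute(
    "SELECT count(*) FROM domain WHERE domain_id = 'Person'"
).fetchone()[0]
custom_rows = conn.execute(
    "SELECT c.concept_id, d.domain_concept_id "
    "FROM concept c JOIN domain d ON d.domain_id = c.domain_id "
    "WHERE c.concept_id > 2000000000"
).fetchall()
```

The NOT EXISTS guard is there because a later vocabulary refresh could add or change the domain row, in which case a blind insert would collide with it.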

For reference, here is my SQL query:

SELECT
d.domain_id
,d.domain_name
,COALESCE(d.domain_concept_id, dc.concept_id) AS domain_concept_id
,dc.concept_name AS domain_concept_name
,sum(CASE WHEN c.standard_concept = 'S' THEN 1 ELSE 0 END) AS std_concept_cnt
-- standard_concept is NULL for non-standard concepts, so test for NULL explicitly
,sum(CASE WHEN c.concept_id IS NOT NULL AND (c.standard_concept IS NULL OR c.standard_concept != 'S') THEN 1 ELSE 0 END) AS non_std_concept_cnt
,sum(CASE WHEN c.concept_id IS NOT NULL THEN 1 ELSE 0 END) AS concept_cnt
-- dc = the domain concepts themselves
FROM omop_cdm.concept dc
LEFT OUTER JOIN omop_cdm."domain" d
ON dc.concept_id = d.domain_concept_id
-- c = the concepts that belong to each domain
LEFT OUTER JOIN omop_cdm.concept c
ON d.domain_id = c.domain_id
WHERE dc.domain_id = 'Metadata' AND dc.vocabulary_id = 'Domain'
-- ignore custom concepts at or above the 2 billion threshold
AND (c.concept_id < 2000000000 OR c.concept_id IS NULL)
GROUP BY
d.domain_id
,d.domain_name
,COALESCE(d.domain_concept_id, dc.concept_id)
,dc.concept_name
ORDER BY
d.domain_id;

concept_ids never disappear; they are only deprecated. Correct, @Dymshyts or @Christian_Reich? And I would say these are in danger of deprecation, except Domain and maybe Note, since Care Site, Person, and Provider now have more specific domains related to the field in each of their respective tables. And Modifiers are a concept “class” in the CPT4 and HCPCS vocabularies.

I have done a lot of custom mapping for Colorado. From this experience, I would suggest you NOT create custom domains or use those in the list above. Stick with the domains specified for each table.field in the CDM conventions. And keep your ETL clean.

What are you trying to accomplish? What’s the use case? I’m sure the community can come up with a few different approaches to solve the issue.

Our use cases fall into two categories.

Use Case Category 1. Patient-level attributes that aren’t really "observations"

We want to create a few custom vocabularies that would be used both in the OMOP observation table and as non-standard extension columns in the OMOP person table, for easier filtering in user-access views.

Example 1. Here in New York, we must comply with NYS statute Article 27-F, which imposes stricter protections than HIPAA for patients with or getting tested for HIV. We want to create a custom vocabulary with concepts for these types of patients.

Example 2. Here at Mount Sinai Health System, we have clinics that are subject to the federal 42 CFR Part 2 regulations. We want to create a custom vocabulary with concepts for patients being treated in these clinics.

In both of these examples, our custom concepts are NOT diagnoses of HIV or substance use disorder such as one might find in the OMOP condition_occurrence table. Rather, our custom concepts are assigned based on an algorithm and are used to exclude these patients from our user-access views.

It would be nice to include them in the Person domain. But we could use the Observation domain, if necessary.

Use Case Category 2. Extension concept columns

Example 3. In addition to provider specialty, our researchers also want the provider credentials (e.g., MD, DO, PA, NP, APRN, etc.). Our EHR (Epic) has a category list for these credentials that we would like to load as a custom vocabulary for an extension column like provider.credential_source_concept_id.

Are such data elements really required for observational research? No. But we’re using the OMOP CDM as the basis for a data warehouse whose coverage is intentionally broader than OHDSI’s intentions for OMOP.

In Colorado, we added extension columns for various ‘restrictions’ on our internal, full-PHI OMOP dataset. Person-level, Visit-level, and Department-level restrictions are all extension columns on every CDM table, excluding the Vocabulary tables downloaded from Athena. We can then easily filter on Person, Visit, or Department/Clinic. This is a very ‘in your face’ approach, done with intention: there is no need to remember custom concept_ids to remove records. From this ‘everything’ OMOP dataset, we create views. The standard OMOP view contains only the CDM tables and fields found in the conventions, and it comes in 3 flavors: full PHI, LDS, and de-id. And so on.
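As a rough illustration of this extension-column approach, here is a sketch against an in-memory SQLite database. The column name x_person_restriction, its values, and the view name person_standard are hypothetical and are not the actual Colorado schema; the person table is cut down to two CDM columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Reduced person table plus one hypothetical extension column.
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        year_of_birth INTEGER,
        x_person_restriction TEXT  -- NULL means unrestricted
    );
    INSERT INTO person VALUES
        (1, 1970, NULL),
        (2, 1985, 'ARTICLE_27F'),
        (3, 1990, '42_CFR_PART_2');

    -- User-access view: CDM columns only, restricted patients filtered out.
    CREATE VIEW person_standard AS
    SELECT person_id, year_of_birth
    FROM person
    WHERE x_person_restriction IS NULL;
    """
)

cursor = conn.execute("SELECT * FROM person_standard ORDER BY person_id")
view_columns = [column[0] for column in cursor.description]
visible_ids = [row[0] for row in cursor.fetchall()]
```

Because the filter lives in the view definition, users of the view never see the extension column and do not need to know any custom concept_ids.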

Would the above work for Example 1 and 2?

@Christian_Reich expanded the Provider Specialty concepts. He should comment on this.

If you are adding an extension column, I do not see the value in creating custom concept_ids for the extension column since data in the extension column won’t ever be used for a network query. You might want to keep it simple and add source values as is.

Also, I just realized an error in my posting above. I will edit it.

The Provider domain contains 2152 concepts. 770 of those concept_ids are standard.

SELECT count(*)
FROM concept
WHERE domain_id = 'Provider'
-- AND standard_concept = 'S'

I never use the Domain table. I didn’t realize there was a domain_id = ‘Modifier’. Learn something new every day :)

There are two different Provider domain records in the OMOP concept table. The record with concept_id = 33 has a corresponding record in the OMOP domain table and, interestingly, has domain_name = 'Provider Specialty'. This is the domain that has constituent concepts.

The other Provider domain record has concept_id = 55 and has been updated with invalid_reason = 'D' and valid_end_date = 2019-10-02. This second Provider domain concept record does not have a corresponding record in the OMOP domain table.

I speculate that this second Provider domain, and the other “empty” domains on my list, were created for use in columns domain_concept_id_1 and domain_concept_id_2 in the OMOP fact_relationship table. (See this post: Question about Fact Relationship)

In other words, Provider with concept_id = 55 refers to the OMOP provider table, whereas Provider with concept_id = 33 is Provider Specialty. It seems plausible that someone thought there was a duplicate and logically deleted the one with no constituent concepts.
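One way to list domain concepts that have no row in the domain table, along with their invalid_reason, is an anti-join like the sketch below. It runs against an in-memory SQLite database seeded only with the two Provider records described above; the table definitions are reduced to the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept (
        concept_id INTEGER PRIMARY KEY,
        concept_name TEXT,
        vocabulary_id TEXT,
        invalid_reason TEXT
    );
    CREATE TABLE domain (
        domain_id TEXT PRIMARY KEY,
        domain_name TEXT,
        domain_concept_id INTEGER
    );
    -- The two Provider domain concepts described in this thread.
    INSERT INTO concept VALUES
        (33, 'Provider', 'Domain', NULL),
        (55, 'Provider', 'Domain', 'D');
    INSERT INTO domain VALUES ('Provider', 'Provider Specialty', 33);
    """
)

# Anti-join: keep domain concepts for which no domain row matches.
orphan_domain_concepts = conn.execute(
    """
    SELECT dc.concept_id, dc.concept_name, dc.invalid_reason
    FROM concept dc
    LEFT JOIN domain d ON d.domain_concept_id = dc.concept_id
    WHERE dc.vocabulary_id = 'Domain'
      AND d.domain_id IS NULL
    ORDER BY dc.concept_id
    """
).fetchall()
```

On this toy data only the deprecated record (concept_id = 55) comes back; against a full vocabulary download the same query would list every such domain concept.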

To my knowledge, nothing of consequence hinges upon how I populate the concept.domain_id column for my custom source vocabularies. Therefore, I’ll just pick one of the existing domain_ids and be done with it.
