The preferred approach for all OHDSI analyses is to use standard concepts,
so that analyses are portable across the OHDSI community. Depending on the
analysis, you may need to create a set of standard concepts (a conceptset)
to represent a single entity. For example, you may want to define a disease
by two or more concepts, or by one concept and all of its descendants. The
conceptset function in ATLAS allows you to create a
conceptset expression that serves this purpose. The ‘Descendants’ flag
specifies that, when the conceptset is instantiated, any concept marked
‘Descendants’=TRUE also brings in all of its descendant concept_ids from
the CONCEPT_ANCESTOR table. Any concept in your conceptset expression that
is marked ‘Exclude’ is removed from the included conceptset upon
execution. If a concept is marked ‘Exclude’=TRUE and ‘Descendants’=TRUE,
then the final conceptset will exclude this concept plus all of its
descendants. These conceptsets can then be used in any analysis performed
against any CDM by looking for these concepts in the standard _CONCEPT_ID
fields throughout the CDM. For example, in our Sisyphus Challenge
specification of the target cohort
(http://www.ohdsi.org/web/atlas/#/cohortdefinition/99321), you can see how
conceptsets were defined for diseases like ‘osteoporosis’ and ‘hip
fracture’ using standard concepts from SNOMED, and for drugs like
‘alendronate’ using standard concepts from RxNorm. The cohort definition
then uses these conceptsets to look for records in the CONDITION_OCCURRENCE
and DRUG_EXPOSURE tables, respectively.
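The way the ‘Descendants’ and ‘Exclude’ flags combine can be sketched in a
few lines of Python. This is only an illustration under stated assumptions,
not the ATLAS implementation: the Item class, the resolve function, and the
toy CONCEPT_ANCESTOR rows (with made-up concept ids) are all hypothetical
names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One row of a conceptset expression (hypothetical structure)."""
    concept_id: int
    include_descendants: bool = False
    is_excluded: bool = False


# Toy stand-in for CONCEPT_ANCESTOR rows:
# (ancestor_concept_id, descendant_concept_id). The ids are made up.
CONCEPT_ANCESTOR = [
    (1, 1), (1, 2), (1, 3), (1, 4),
    (2, 2), (2, 4),
    (3, 3),
    (4, 4),
]


def expand(item, ancestor_rows):
    """Return the concept itself, plus its descendants if the flag is set."""
    ids = {item.concept_id}
    if item.include_descendants:
        ids.update(d for a, d in ancestor_rows if a == item.concept_id)
    return ids


def resolve(items, ancestor_rows):
    """Collect included and excluded concepts, then subtract the exclusions."""
    included, excluded = set(), set()
    for item in items:
        target = excluded if item.is_excluded else included
        target.update(expand(item, ancestor_rows))
    return included - excluded


expression = [
    Item(1, include_descendants=True),
    Item(2, include_descendants=True, is_excluded=True),
]
print(sorted(resolve(expression, CONCEPT_ANCESTOR)))
```

In this toy hierarchy, concept 1 with ‘Descendants’ brings in concepts 1
through 4, and excluding concept 2 with ‘Descendants’ removes both 2 and its
descendant 4.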
Within the CDM, most domain tables also provide a _SOURCE_CONCEPT_ID field,
an optional field for non-standard concepts that holds the unique OHDSI
identifier of each source vocabulary entity. For example, ICD-9-CM is a
non-standard vocabulary, and some organizations with source data containing
ICD-9-CM diagnoses opt to store the ICD-9-CM source value in the
CONDITION_SOURCE_VALUE field, the OHDSI vocabulary identifier for that same
ICD-9-CM code in the CONDITION_SOURCE_CONCEPT_ID field, and the standard
concept that the ICD-9-CM code maps to in the CONDITION_CONCEPT_ID field.
While it is not the preferred approach for OHDSI research, we extended the
conceptset expression in ATLAS to allow construction of a conceptset that
can be used to query the _SOURCE_CONCEPT_ID fields as well. The basic idea
is that a user may want a conceptset that contains non-standard source
concepts. There are two ways to achieve that: 1) select non-standard source
concepts from the vocabulary, thereby explicitly including them, or 2)
select a standard concept and use the CONCEPT_RELATIONSHIP table to pull in
all non-standard source concepts that map to that standard concept. An
example of #1 would be to simply pick an ICD-9-CM concept. An example of #2
would be to pick a SNOMED concept for a disease and mark ‘Mapped’=TRUE,
which returns all source codes, including ICD-9-CM codes, that map to that
SNOMED concept. The value of approach #2 over #1 is that it is source
vocabulary agnostic, and it allows for a more succinct conceptset
expression (you have to select fewer concepts to express the same idea).
The value of approach #1 is that you can explicitly define the set of
concepts you want without having to know anything about the mappings
within and between vocabularies.
The ‘Descendants’ flag and the ‘Mapped’ flag have no impact on a
non-standard source concept, because source concepts are not contained in
the CONCEPT_ANCESTOR table and no non-standard source concept maps to
another source concept. Instead, source concepts map to standard concepts,
and the ‘Mapped’ flag only has an impact when you apply it to a standard
concept in order to include all source concepts that map to it. So, if you
want all components of a piece of the ICD-9-CM pseudo-hierarchy (e.g. all
4-digit and 5-digit codes beneath a 3-digit code), you need to either find
the standard concepts that these source codes map to, or select each source
code individually. Once you have created a conceptset that contains
non-standard concepts, you can perform your analysis by linking it to the
_SOURCE_CONCEPT_ID fields in each domain, but remember that this approach
won’t work universally across all CDMs, since you have now made it source
vocabulary dependent.
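Approach #2 can be sketched with Python's built-in sqlite3 module. The
sketch assumes the ‘Mapped’ flag is resolved through ‘Maps to’ rows in
CONCEPT_RELATIONSHIP, where concept_id_1 is the source concept and
concept_id_2 is the standard concept it maps to. The function names, the
reduced table layouts, and every concept id and person id below are made up
for illustration; this is not how ATLAS itself is implemented.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with toy, made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE concept_relationship (
            concept_id_1 INTEGER,
            concept_id_2 INTEGER,
            relationship_id TEXT
        );
        CREATE TABLE condition_occurrence (
            person_id INTEGER,
            condition_concept_id INTEGER,
            condition_source_concept_id INTEGER
        );
        INSERT INTO concept_relationship VALUES
            (9001, 100, 'Maps to'),
            (9002, 100, 'Maps to'),
            (9003, 200, 'Maps to');
        INSERT INTO condition_occurrence VALUES
            (1, 100, 9001),
            (2, 100, 9002),
            (3, 200, 9003),
            (4, 100, 0);
    """)
    return conn


def mapped_source_concepts(conn, standard_ids):
    """Source concepts that map to any of the given standard concepts."""
    ids = list(standard_ids)
    if not ids:
        return set()
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        "SELECT concept_id_1 FROM concept_relationship "
        "WHERE relationship_id = 'Maps to' "
        f"AND concept_id_2 IN ({placeholders})",
        ids,
    ).fetchall()
    return {row[0] for row in rows}


def persons_with_source_concepts(conn, source_ids):
    """Persons with a condition record matching on the source concept field."""
    ids = list(source_ids)
    if not ids:
        return set()
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        "SELECT DISTINCT person_id FROM condition_occurrence "
        f"WHERE condition_source_concept_id IN ({placeholders})",
        ids,
    ).fetchall()
    return {row[0] for row in rows}


conn = build_demo_db()
source_ids = mapped_source_concepts(conn, [100])
print(sorted(source_ids))
print(sorted(persons_with_source_concepts(conn, source_ids)))
```

In this toy data, person 4 has the standard concept 100 but no populated
source concept, so a query on CONDITION_SOURCE_CONCEPT_ID misses that
record while a query on CONDITION_CONCEPT_ID would find it. That is the
portability caveat described above.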