
Build up concept sets for procedure or measurement

Hi!

I have a few questions regarding the identification of procedure/measurement using Atlas:

  1. What’s the common (good) practice for identifying concept IDs for a procedure or measurement? In a published paper (https://jamanetwork.com/journals/jamainternalmedicine/fullarticle/2760777), the researchers included SNOMED, LOINC, and CPT4 (non-standard) concepts for HbA1c measurement. I’m not sure what the reasoning is for using multiple vocabularies for one measurement.
  2. Is it okay to have non-standard concepts in the concept set? I’m assuming a non-standard concept will only match data that was originally coded that way, before it was mapped to a standard concept.
  3. I noticed there are ‘exclude’, ‘descendants’, and ‘map’ options for each concept ID. In which circumstances would we use the ‘map’ option?

Thanks in advance for your answer. Any response will be greatly appreciated!

Shanshan

Hi @Shanshan4Q33! Welcome to the community! I co-lead the OHDSI Education Workgroup and am delighted to see you asking questions about how to build concept sets. (When you have a chance, introduce yourself to the community on our longstanding welcome thread: Welcome to OHDSI! - Please introduce yourself)

Now, your topic at hand.

Let’s start from the basics. If you’ve never found your way to EHDEN Academy, that’s a great place to start, with on-demand modules to train you up on the common data model, vocabularies and the various tools and methodologies we have as a community. It’s free to use. I’d highly recommend these courses as they’ll provide much more detail than I’m about to give.

It all starts with searching for the clinical attribute you’re capturing in the data. For example, if I’m running a study on total joint arthroplasty and want to build a concept set of those procedures, I’d likely be starting with either: a) a list of strings (“knee arthroplasty”, “hip arthroplasty”, “shoulder arthroplasty”) or b) a list of clinical codes (e.g. CPT4 codes). One way to do this is to use ATHENA to look up how these terms exist in the vocabulary tables. ATHENA is a vocabulary viewer – and its sister is the Search tab in ATLAS, which can give you more information about the number of records that use that term. You can learn more about ATHENA in this 10-Minute Tutorial by @mik : https://youtu.be/2WdwBASZYLk

One fundamental principle here is that our vocabulary tables aren’t “choosing” anything. They’re a compiled standard leveraging ontological decision rules from a variety of sources. As end users, we’re looking up the term in the tables, and the vocabulary CONCEPT table will tell you whether that concept is standard or non-standard and which domain it would be stored in within the common data model structure.

Simplest answer: there are multiple ontologies that are capable of representing measurement attributes. The LOINC vocabulary is one and SNOMED is another. The vocabularies are a best-of-breed approach at compiling all the ways that information can be stored – inclusive of attributes like key-pairs. Measurements are particularly interesting because they have multiple components of standardization: 1) the measurement as it is collected, 2) the value set of the measurement (e.g. a qualitative or quantitative value) and 3) the unit of measurement (as applicable).

It’ll depend on what your use case is. If you intend on running a network study (aka an analysis across more than one OMOP CDM from different source systems), you need to lean to the standards. But yes, you’re right in assuming that for a non-standard to exist in a data set it has to have been coded that way in the source system and then be retained in the non-standard concept column for that domain.

Exclude = does what it says – will create a negation for that concept ID
Descendants = pulls all the children of that parent concept from that level of detail and below
Map = a mysterious feature that few understand or dare to use. @Chris_Knoll or @anthonysena might have some sage wisdom. Generally, people don’t use this feature because it veers into the world of “off label” OMOP vocabulary use.
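To make the Exclude and Descendants flags above more concrete, here is a minimal Python sketch of how a concept set expression could be resolved into a flat list of concept IDs. It assumes that included items (plus descendants where flagged) are collected first and excluded items are then subtracted. The concept IDs, the tiny ancestor table, and the helper names `descendants_of` and `resolve` are all made up for illustration; this is not the ATLAS implementation.

```python
# Toy illustration of how the Exclude and Descendants flags combine.
# All concept IDs and the ancestor pairs below are hypothetical.

# (ancestor_concept_id, descendant_concept_id) pairs, shaped like CONCEPT_ANCESTOR.
# Each concept is also listed as its own descendant, mirroring the real table.
CONCEPT_ANCESTOR = {
    (100, 100), (100, 101), (100, 102), (100, 103),
    (101, 101),
    (102, 102), (102, 103),
    (103, 103),
}


def descendants_of(concept_id):
    """Return the concept itself plus everything below it in the hierarchy."""
    return {d for a, d in CONCEPT_ANCESTOR if a == concept_id}


def resolve(expression):
    """Resolve a list of flagged items into a set of concept IDs."""
    included, excluded = set(), set()
    for item in expression:
        if item.get("descendants"):
            ids = descendants_of(item["concept_id"])
        else:
            ids = {item["concept_id"]}
        if item.get("exclude"):
            excluded |= ids
        else:
            included |= ids
    # Exclusions are applied after all inclusions have been gathered.
    return included - excluded


expression = [
    {"concept_id": 100, "descendants": True},                   # parent and all children
    {"concept_id": 102, "descendants": True, "exclude": True},  # carve out one branch
]
resolved = resolve(expression)
print(sorted(resolved))  # [100, 101]
```

In this toy hierarchy, excluding 102 with descendants also drops its child 103, which is why only 100 and 101 remain.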

Hope that helps you get started in your journey!

Have you joined the OHDSI MS Teams? There are tips here on how to make the most of your time in the community: Join the Journey – OHDSI

Best,
Kristin

It’s a fair point: the Mapped option is not used much because we encourage people to think in terms of standard concepts. The ‘mapped’ option lets you specify a STANDARD concept but yield the NON-standard concepts that map to the concept provided in the concept set expression.

Specifically, it works like this:

-- Resolve the concept set, then return every concept with a valid 'Maps to' link into it
SELECT DISTINCT cr.concept_id_1 AS concept_id
FROM
(
  @conceptsetQuery
) C
JOIN @vocabulary_database_schema.concept_relationship cr
  ON C.concept_id = cr.concept_id_2
  AND cr.relationship_id = 'Maps to'
  AND cr.invalid_reason IS NULL

The @conceptsetQuery is the query that grabs all the concepts the user specified; the outer query then returns the concepts that have a ‘Maps to’ relationship to an included concept. Since non-standard concepts are never mapped to, you’d only use this on standard concepts in your concept set expression, but it will return non-standard concepts.
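For anyone who wants to see that lookup run end to end, here is a small sketch using Python's built-in `sqlite3` module. It assumes an in-memory table shaped like CONCEPT_RELATIONSHIP with a handful of hypothetical rows, and it passes the resolved concept set in as a plain list of IDs instead of the @conceptsetQuery subquery. The concept IDs and the helper name `mapped_concepts` are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept_relationship ("
    "concept_id_1 INTEGER, concept_id_2 INTEGER, "
    "relationship_id TEXT, invalid_reason TEXT)"
)
# Hypothetical rows: (source concept, standard concept, relationship, invalid_reason)
conn.executemany(
    "INSERT INTO concept_relationship VALUES (?, ?, ?, ?)",
    [
        (9001, 1001, "Maps to", None),
        (9002, 1001, "Maps to", None),
        (9003, 1001, "Maps to", "D"),   # invalidated mapping, should be ignored
        (9004, 1002, "Maps to", None),  # maps to a concept outside our set
    ],
)


def mapped_concepts(conn, concept_set_ids):
    """Return concepts with a valid 'Maps to' link into the given concept set."""
    placeholders = ",".join("?" for _ in concept_set_ids)
    sql = f"""
        SELECT DISTINCT cr.concept_id_1 AS concept_id
        FROM concept_relationship cr
        WHERE cr.concept_id_2 IN ({placeholders})
          AND cr.relationship_id = 'Maps to'
          AND cr.invalid_reason IS NULL
        ORDER BY cr.concept_id_1
    """
    return [row[0] for row in conn.execute(sql, list(concept_set_ids))]


mapped = mapped_concepts(conn, [1001])
print(mapped)  # [9001, 9002]
```

The standard concept 1001 goes in, and the two source concepts with a valid mapping to it come out.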

I never use this feature because I want to only use standard concepts when describing my concept sets. I haven’t thought about this feature in many years (5+) but when I first started trying to answer your question, I imagined the way it worked was you put in a NON-standard concept and it would include the standard concepts into your concept set (effectively allowing you to think in terms of non-standard concepts but yield standard concepts in your concept set expression result). This isn’t how it works (see above), but I think this mapped feature would be more useful if it let you state a source concept like ICD10 for a T2DM code, and it would return the standard concepts that map to it.
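The reverse direction described above (start from a source concept, get back the standard concepts it maps to) is not what the Mapped option does, but it is easy to sketch by hand. The Python snippet below is only an illustration under the same assumptions as before: an in-memory `sqlite3` table shaped like CONCEPT_RELATIONSHIP, hypothetical concept IDs, and a made-up helper name `standard_for_source`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept_relationship ("
    "concept_id_1 INTEGER, concept_id_2 INTEGER, "
    "relationship_id TEXT, invalid_reason TEXT)"
)
# Hypothetical rows: (source concept, standard concept, relationship, invalid_reason)
conn.executemany(
    "INSERT INTO concept_relationship VALUES (?, ?, ?, ?)",
    [
        (9001, 1001, "Maps to", None),
        (9002, 1001, "Maps to", None),
        (9004, 1002, "Maps to", None),
    ],
)


def standard_for_source(conn, source_ids):
    """Follow 'Maps to' forward: source concept IDs in, standard concept IDs out."""
    placeholders = ",".join("?" for _ in source_ids)
    sql = f"""
        SELECT DISTINCT cr.concept_id_2 AS concept_id
        FROM concept_relationship cr
        WHERE cr.concept_id_1 IN ({placeholders})
          AND cr.relationship_id = 'Maps to'
          AND cr.invalid_reason IS NULL
        ORDER BY cr.concept_id_2
    """
    return [row[0] for row in conn.execute(sql, list(source_ids))]


standard_ids = standard_for_source(conn, [9001, 9004])
print(standard_ids)  # [1001, 1002]
```

The only change from the Mapped query is which side of the relationship is filtered and which side is returned.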

I think the reason I decided on the approach described above (return non-standard from standard concepts) is that the flags (descendants and mapped) work on a standard-concept basis: only standard concepts have descendants, and only standard concepts get mapped to. By always thinking in standard concepts, if you want to use a source-concept column, you’d use the ‘mapped’ option to find the source concepts and then use that type of concept set against the source_concept column.


Hi Kristin,

Thanks for your quick response and warm welcome! The answers you provided have been extremely helpful.

I like the idea of standardizing the concept sets and the protocol so the analysis can use data from multiple sites. One follow-up question: for different analyses, people might create concept sets differently for the same target, e.g. HbA1c measurement, and we are not sure which one is ‘the best’. I looked up the concept sets created for HbA1c in Atlas and in the published literature, and they are different. I think these discrepancies arise from the manual review after typing keywords and clinical codes for HbA1c into Athena to determine which concept IDs should be included.

While I feel confident in identifying concept IDs for health conditions and medications, I find it challenging to do so for measurements. But based on your response, I might want to keep only SNOMED and LOINC for measurement when doing network analysis and would like to also include non-standard vocabulary like CPT4 to capture the measurement as much as possible if I am using a dataset from a single system or if the measurements are coded using the same source code for HbA1c across systems.

Thank you once again! I am delighted to have connected with you and the OHDSI community.

Thank you, Chris, for your thoughtful response.

I would like to get your clarification on one idea: If I click the ‘mapped’ option (for the use of analyzing data for a single system and my goal is to capture the information as much as possible), would this be a better approach than including only standard concept IDs in the analysis?

I’m curious about the inclusion of non-standard concept IDs in some of the concept sets created from Atlas and mentioned in the published literature, and I’m unsure about the role these non-standard concept IDs play. If I were to include a specific set of non-standard concepts, I’m wondering if it involves (1) using the non-standard concepts themselves to determine the measurement or (2) using the non-standard concepts to identify the corresponding standard concepts and then determining the measurement solely based on the standard concepts. I’m interested to know which of these approaches is used.

I think this question may not be relevant if our goal is to analyze data from multiple systems, where we would like to build the same concept sets to ensure the comparability of the results.

Appreciate your time and help!

Very fair point, @Shanshan4Q33! This is where the beauty of @aostropolets’ PHOEBE may be of use to your analysis. To learn more about PHOEBE, there are two videos, 10 Minute Tutorial on PHOEBE and Introducing PHOEBE 2.0, that may help you think about how to navigate this space!


Awesome! Thanks for sharing @krfeeney

shanshan

Thanks so much @krfeeney!

@Shanshan4Q33 funnily enough, we just created a concept set for HbA1c for a demo. Can be found here. This question may pop up for other concept sets; Kristin’s links are a good source of guidance.

Hi Anna - Thanks for sharing! I noticed that you included a ‘LOINC group’ vocabulary, very interesting.

If, for a single system, you mean to use source-specific codes because there may be more information that hasn’t been mapped to standard concepts, I would say maybe, but I don’t think so. If the mappings aren’t specific enough, then using the Mapped option isn’t going to save you anything: it still depends on the standard-to-non-standard maps. Instead, just build your concept set with verbatim source concepts if you want to work with source concepts. Perhaps this is another argument for reversing the standard-to-source behavior of the Mapped option so that it returns standard concepts from source concepts, since that would help you pick a standard concept from source/non-standard concepts, and the resolved concept set would contain standard concepts (rather than the non-standard/mapped concepts that we currently get).

Simple answer is that you’d put non-standard concepts (or mapped standards that return non-standard) into a concept set when you want to use the {domain}_source_concept_id columns in your queries. Only standard concepts can go in {domain}_concept_id, while non-standard concepts are allowed in {domain}_source_concept_id.
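As a rough illustration of the two columns, the Python sketch below builds a toy MEASUREMENT table in `sqlite3` and filters it once by a standard concept set and once by a source concept set. The rows, the concept IDs, and the helper name `measurement_ids` are hypothetical; only the column names `measurement_concept_id` and `measurement_source_concept_id` follow the {domain}_concept_id pattern described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurement ("
    "measurement_id INTEGER, person_id INTEGER, "
    "measurement_concept_id INTEGER, measurement_source_concept_id INTEGER)"
)
# Hypothetical rows: three records share standard concept 1001 but came from
# different source codes (9001, 9002) or from a source code with no concept (0).
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?, ?)",
    [
        (1, 10, 1001, 9001),
        (2, 11, 1001, 9002),
        (3, 12, 1001, 0),
        (4, 13, 1002, 9004),
    ],
)

ALLOWED_COLUMNS = {"measurement_concept_id", "measurement_source_concept_id"}


def measurement_ids(conn, column, concept_ids):
    """Return measurement_ids whose chosen concept column is in the concept set."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unexpected column: {column}")
    placeholders = ",".join("?" for _ in concept_ids)
    sql = (
        f"SELECT measurement_id FROM measurement "
        f"WHERE {column} IN ({placeholders}) ORDER BY measurement_id"
    )
    return [row[0] for row in conn.execute(sql, list(concept_ids))]


# A standard concept set is matched against measurement_concept_id ...
by_standard = measurement_ids(conn, "measurement_concept_id", [1001])
# ... while a source concept set is matched against measurement_source_concept_id.
by_source = measurement_ids(conn, "measurement_source_concept_id", [9001])
print(by_standard, by_source)  # [1, 2, 3] [1]
```

In this toy data the standard concept set finds all three records, while the single source concept finds only the record that was coded that way in the source system.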


LOINC groups are roughly speaking classification terms that have a bunch of related codes below them. So I used that broad term + all descendants. Also you may notice that SNOMED 4184637 has a LOINC 3034639 kid - that’s because we’ve done some work with connecting SNOMED and LOINC into one joint hierarchy (though we’re desperately looking for collaborators to expand this work).


This makes perfect sense to me.

Gotcha. This is how we identified target of interest using EMR data before.

I explored the dataset a little bit yesterday and now I understand what you mean by this. Thank you so much Chris.

Hi Anna - thanks for bringing this up! : )

A follow-up question: are you going to identify the HbA1c measurement with your current concept set, including both SNOMED and LOINC vocabularies, in the measurement table only? Or using both the measurement and the procedure_occurrence tables (because you include LOINC vocabularies)? One problem I came across is that there are no records for plasma fasting/random glucose testing (defined by only SNOMED concept IDs in my concept set) when I use the measurement table.

The concept set mainly includes measurements, but there is one HCPCS procedure and a couple of CPT4 observations. Maybe those are in your data. Generally, records should be found in the tables that correspond to the concepts’ domains. In this case, the majority of records should be in the measurement table, except for those above. That is, of course, if the ETL followed the conventions.
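One way to act on this is to group a concept set by domain before deciding which tables to query. The Python sketch below assumes you have already looked up each concept's domain_id in the CONCEPT table; the concept IDs and the helper name `tables_to_query` are hypothetical, and the domain-to-table mapping covers only the three domains mentioned above.

```python
from collections import defaultdict

# Domain names as they appear in CONCEPT.domain_id, mapped to the CDM table
# where records of that domain are expected if the ETL followed conventions.
DOMAIN_TO_TABLE = {
    "Measurement": "measurement",
    "Procedure": "procedure_occurrence",
    "Observation": "observation",
}

# Hypothetical concept set: concept_id -> domain_id.
concept_domains = {
    2001: "Measurement",
    2002: "Measurement",
    2003: "Procedure",
    2004: "Observation",
}


def tables_to_query(concept_domains):
    """Group concept IDs by the CDM table their domain points to."""
    by_table = defaultdict(list)
    for concept_id, domain_id in sorted(concept_domains.items()):
        by_table[DOMAIN_TO_TABLE[domain_id]].append(concept_id)
    return dict(by_table)


plan = tables_to_query(concept_domains)
for table, ids in plan.items():
    print(table, ids)
```

A concept set that mixes domains therefore needs one criterion per table, not a single lookup in the measurement table.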


Thank you Anna!

Another follow-up question: when I try to find the LOINC group concept by typing in ‘HbA1c’, it doesn’t show up. I have to enter ‘Hemoglobin A1c’ to get the LOINC group concept. Is it possible to improve the Athena search engine to recognize both ‘HbA1c’ and ‘hemoglobin A1c’ as the same concept? Or am I missing or misunderstanding something in this case?

Athena does show you Hemoglobin A1C measurement; it’s just not the first result. Why? Because it cannot know that you want that concept first: there are concepts that match HbA1c more closely, and those are displayed first (like HbA1c target).

If you use Phoebe on atlas-demo.ohdsi.org (green shopping cart in the Search tab) you will get Hemoglobin A1C measurement when you input hba1c, since it prioritizes concepts with high counts in the network (and Athena can’t do that).


Thank you Anna!
