Using the 'Mapped' function of concept set expressions

Chris_Knoll · August 26, 2025, 3:38pm

Hello, community!

As we’re moving into Atlas 3.0, we’re looking at potential changes that might cause disruption in existing functionality, and this thread is going to focus on one aspect of concept set expressions: mapped concepts.

Note: this is something that we may consider as a hotfix to the 2.x line of Atlas/WebAPI depending on community feedback.

Background

When defining a concept set, you specify a set of ‘items’ that contain the following elements:
concept_id: the CDM concept ID of the concept to use
isExclude: if this concept is serving as an exclusion to the final concept set expression
isDescendant: if this concept should pull in descendants from concept_ancestor
isMapped: if this concept should pull in mapped concepts found by using the concept_relationship’s ‘Maps To’ relationship.

The final list of concepts (which we call ‘resolved concepts’) is built like this: we take all the concepts that are not excluded, plus the descendants and mapped concepts for those items that say to include them, and then finally remove any concepts that are marked ‘exclude’ (including their mapped/descendant concepts if the exclusion item says to do so).
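The resolution steps above can be sketched in Python. This is a minimal in-memory illustration, not the WebAPI implementation, which resolves concept sets in SQL against the vocabulary tables. The concept IDs are made up. `DESCENDANTS` and `MAPS_TO` are hypothetical stand-ins for CONCEPT_ANCESTOR and the ‘Maps to’ rows of CONCEPT_RELATIONSHIP, and the fields `is_excluded`, `include_descendants` and `include_mapped` correspond to the isExclude, isDescendant and isMapped flags listed above. The sketch also assumes, as the Essential Hypertension example later in the thread suggests, that ‘mapped’ is applied to the item’s descendants as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptSetItem:
    concept_id: int
    is_excluded: bool = False
    include_descendants: bool = False
    include_mapped: bool = False


# Hypothetical toy vocabulary; the concept IDs are made up for illustration.
# DESCENDANTS[a]: concepts below a in the hierarchy (a itself is not listed).
DESCENDANTS = {100: {101, 102}}
# MAPS_TO[source]: the standard concept that the source concept 'Maps to'.
MAPS_TO = {900: 100, 901: 101, 902: 102, 903: 200}


def expand(item):
    """Concepts one item contributes under the current behavior."""
    concepts = {item.concept_id}
    if item.include_descendants:
        concepts |= DESCENDANTS.get(item.concept_id, set())
    if item.include_mapped:
        # Add every source concept that 'Maps to' something already in the set.
        # The selected (standard) concept itself stays in the result.
        concepts |= {src for src, std in MAPS_TO.items() if std in concepts}
    return concepts


def resolve(items):
    """Union of all non-excluded items minus the union of all excluded items."""
    included, excluded = set(), set()
    for item in items:
        target = excluded if item.is_excluded else included
        target |= expand(item)  # in-place update of included or excluded
    return included - excluded
```

For example, resolving `ConceptSetItem(100, include_mapped=True)` gives `{100, 900}`: the mapped step only adds concepts, so the selected standard concept remains in the result, which is exactly the behavior this thread proposes to change.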

Now that you know everything there is to know about how concept sets work (haha, joking), I’d like to focus on the ‘Mapped’ option of a concept set item:

The use case for the ‘mapped’ option is to pull ‘source concepts’ into your concept set expression. This supports use cases where the researcher wants to focus on one particular coding system (such as ICD9 or ICD10) for their code lists.

When you have a concept set item that says ‘use Mapped’, the resolved concepts will include both the standard concept (that you want the mapped concepts for) and the mapped concepts themselves. We’ve received feedback that it is not intuitive that when you ask for mapped concepts you also get the standard concept, when in this case the user only wanted to see the mapped concepts in their resolved concept set.

Proposed Change

The proposed change is to make the ‘mapped’ option include only the mapped source concepts in the resolved concepts. With this change, it’s still possible to have a concept set expression that includes standard concepts + descendants and mapped concepts: you would just specify one item that says ‘include descendants’ and another that says ‘include mapped’, and since you asked for both, the resolved concepts will have both. To me this is clearer, as it’s more explicit about what the user is requesting, and it leads to less confusion about why you are seeing the standard concepts in the result when you only expected mapped concepts.
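To make the difference concrete, here is a minimal sketch assuming a toy in-memory vocabulary. The concept IDs and the helpers `expand` and `resolve` are hypothetical, not Atlas or WebAPI code, and descendants and exclusions are left out for brevity. It resolves a single item with ‘mapped’ checked under both behaviors, then shows that under the proposal a two-item expression reproduces today’s single-item result.

```python
# Hypothetical toy vocabulary: source concepts 900 and 901 'Maps to' standard 100.
MAPS_TO = {900: 100, 901: 100, 902: 200}


def expand(concept_id, include_mapped, proposed):
    """Resolve one non-excluded item (descendants omitted for brevity)."""
    if not include_mapped:
        return {concept_id}
    mapped = {src for src, std in MAPS_TO.items() if std == concept_id}
    # Current behavior keeps the selected concept; the proposal drops it.
    return mapped if proposed else {concept_id} | mapped


def resolve(items, proposed):
    """items: (concept_id, include_mapped) pairs; union of every item's concepts."""
    return set().union(*(expand(c, m, proposed) for c, m in items))


current = resolve([(100, True)], proposed=False)                     # {100, 900, 901}
proposed_single = resolve([(100, True)], proposed=True)              # {900, 901}
proposed_two = resolve([(100, False), (100, True)], proposed=True)   # {100, 900, 901}
```

The design choice being discussed is visible in the last line: under the proposal, getting the standard concept requires an explicit item for it, rather than having ‘mapped’ imply it.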

The question to the community is: would this change impact your work? If you only cared about the source concepts in the resolved concepts, then this change won’t impact you. If you were using a single concept set with ‘mapped’ options (so that it resolves to a mix of standard and non-standard concepts) to query both the standard concept columns and the source concept columns (such as CONDITION_CONCEPT_ID and CONDITION_SOURCE_CONCEPT_ID), then this might impact you.

Please provide your thoughts.

Thank you!

Gowtham_Rao · August 30, 2025, 11:49am

Thanks for raising this, @Chris_Knoll. I’ve seen cohort definitions that use concept set expressions which resolve to both standard and non-standard concept IDs, either by explicitly selecting non-standard concepts or by including ‘mapped’ concepts. The same expression is then used to query both condition_concept_id and, separately, condition_source_concept_id.

In my opinion, this is a common practice that would be impacted by the proposed change.

Christian_Reich · August 30, 2025, 8:02pm

What I never understood (but could have asked), and the UI doesn’t reveal it, is this: are the “mapped” source concepts used in executing a cohort definition (by going to the _source_concept_id), or are they just there to help tune the standard concept set? I don’t think they are. But if nothing else, the UI could definitely make that clearer.

Whether or not we should allow source concepts as concept sets in the cohort definition (which would be executed against the source_concept_id) is a good question. If we do, we can support all use cases where folks want to import some spaghetti code list from the literature. But we would be introducing all sorts of contradictions (the ETLer may have used different mappings than the Standardized Vocabularies), and we would be eroding the idea of a Common Data Model that works internationally. Americans may be used to the idiosyncrasies of ICD-10-CM (with all the hilarious codes in it) and happily work around them, but for the rest of the world it would be pretty useless.

I understand we suffer from chronic subtle semantic discrepancies when using mapped codes, but I haven’t seen many use cases where this really matters. But I can be convinced otherwise. Let’s bring them on.

Gowtham_Rao · August 31, 2025, 12:06am

The choice of querying the _source_concept_id field in an OMOP clinical event table is a property of the cohort definition, not the concept set definition. That is, selecting mapped in the concept set expression does not modify the underlying SQL of the cohort definition.
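A small sketch of that separation, assuming hypothetical condition_occurrence rows with made-up IDs in which source concepts 900 and 901 map to standard concept 100. The function `persons_matching` is a hypothetical stand-in for a cohort criterion: the criterion picks the column, and the concept set only supplies the IDs. It also shows why the practice described earlier in the thread is sensitive to the proposal.

```python
# Hypothetical condition_occurrence rows; IDs are made up for illustration.
ROWS = [
    {"person_id": 1, "condition_concept_id": 100, "condition_source_concept_id": 900},
    {"person_id": 2, "condition_concept_id": 100, "condition_source_concept_id": 901},
    {"person_id": 3, "condition_concept_id": 200, "condition_source_concept_id": 902},
]


def persons_matching(rows, concept_ids, column):
    """Toy criterion: the criterion, not the concept set, chooses the column."""
    return {r["person_id"] for r in rows if r[column] in concept_ids}


# Resolved sets for "concept 100 with mapped" in this toy data.
current_resolved = {100, 900, 901}  # current behavior keeps the standard concept
proposed_resolved = {900, 901}      # proposed behavior: mapped source concepts only

current_std = persons_matching(ROWS, current_resolved, "condition_concept_id")
current_src = persons_matching(ROWS, current_resolved, "condition_source_concept_id")
proposed_std = persons_matching(ROWS, proposed_resolved, "condition_concept_id")
proposed_src = persons_matching(ROWS, proposed_resolved, "condition_source_concept_id")
```

With these rows, the current resolved set matches persons 1 and 2 on either column, while the proposed set still matches them on the source column but matches nobody on condition_concept_id.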

Gowtham_Rao · August 31, 2025, 12:11am

We allow it now.

What if I select one standard concept, no descendants, include mapped? Current behavior is that I will get the one standard concept id in the resolved set. Is the proposal to not get that single standard concept id because exclude was not selected?

Christian_Reich · August 31, 2025, 1:03pm

That’s what I thought. But it is not clear from the UI. It should clearly state “this is for inspection, not for defining a cohort”.

Gowtham_Rao · August 31, 2025, 1:08pm

Actually, you can use it for cohort definitions, e.g. with concept set expressions like [screenshot not preserved] or like [screenshot not preserved].

Chris_Knoll · September 2, 2025, 12:45am

I’ll do my best to reply all above:

@Gowtham_Rao, ok, I’ll put you down as ‘will be an impact’. Fortunately, a major version of the software allows for this kind of change, but I’d like to hear how wide the impact will be.

@Christian_Reich, the UI reveals it under ‘included concepts’: the included concepts are all the concepts you will get from the include/exclude/mapped choices you made in your concept set expression. What it doesn’t show is exactly which included concepts came from which concept set expression item (that would be hard to represent), but anything you see in your included concepts can be removed by making it an exclusion.

The proposed change here is to make the ‘mapped’ option include only the source codes that are mapped to the selected concept, and no longer include the selected concept itself in the included concepts. This is because we’re hearing reports from people who just want the mapped concepts in their included concepts, and when they say ‘give me the concepts mapped to standard concept A’, they are getting ‘standard concept A’ itself included in the concept set, which is not what they were expecting. If they wanted both the standard concept and the mapped concepts, they could specify one concept set item to include the concept and another concept set item to include the mapped concepts.

The work-around for today is to include the concept with mapped (which brings in source concepts and the selected concept) and add an item to exclude the standard concept directly.

@Christian_Reich, that is a good question, but I’d appreciate it if we could focus on the open question at hand, which is about the ‘intuitiveness’ of the tool: if you say ‘give me mapped concepts of X’, should you get X in that answer (assuming X doesn’t map to itself)? Current behavior can be confusing, and as we move into Atlas 3.0 we can change these things in potentially backwards-incompatible ways.

@Gowtham_Rao, yes, that’s the current behavior: you get more than just the mapped concepts of the selected concept; you also get the selected concept itself in the resulting concept set. This proposal is to make it so that when you ask for the mapped concepts, you get just the mapped concepts. So, to answer the question: yes, under the proposal you would not get that single standard concept ID, but not because exclude wasn’t selected; it is because you’re asking for mapped concepts. You use exclude to remove concepts from the result.

Here’s a use case of exclude and mapped (with the proposed behavior): you have concept X for which you want the mapped concepts. It brings in 3 source concepts S1, S2, S3. You don’t want S2 or S3 in your result, so you add S2 and S3 as exclusions. Now you have a concept set expression that yields just S1.
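The two exclusion patterns in this reply (today’s workaround of excluding the selected concept directly, and excluding S2 and S3) can be compared with a minimal sketch. It assumes a toy vocabulary in which S1, S2 and S3 all ‘Maps to’ X; the helpers `expand` and `resolve` and the tuple layout of an item are hypothetical, and descendants are omitted.

```python
# Hypothetical toy vocabulary: source concepts S1, S2, S3 all 'Maps to' X.
MAPS_TO = {"S1": "X", "S2": "X", "S3": "X"}


def expand(concept_id, include_mapped, proposed):
    """Concepts one item contributes (descendants omitted for brevity)."""
    if not include_mapped:
        return {concept_id}
    mapped = {src for src, std in MAPS_TO.items() if std == concept_id}
    # Current behavior keeps the selected concept; the proposal drops it.
    return mapped if proposed else {concept_id} | mapped


def resolve(items, proposed):
    """items: (concept_id, include_mapped, is_excluded) tuples."""
    included, excluded = set(), set()
    for concept_id, include_mapped, is_excluded in items:
        target = excluded if is_excluded else included
        target |= expand(concept_id, include_mapped, proposed)
    return included - excluded


drop_s2_s3 = [("S2", False, True), ("S3", False, True)]
# Proposed behavior: X with mapped, minus S2 and S3.
proposed_result = resolve([("X", True, False)] + drop_s2_s3, proposed=True)
# Current behavior with the same items: X itself is still in the result.
current_result = resolve([("X", True, False)] + drop_s2_s3, proposed=False)
# Current-behavior workaround: also exclude X directly.
workaround_result = resolve(
    [("X", True, False), ("X", False, True)] + drop_s2_s3, proposed=False
)
```

Here `proposed_result` and `workaround_result` are both `{"S1"}`, while `current_result` is `{"X", "S1"}`; the disagreement in the thread is essentially about which of these two spellings should be the default.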

The more concrete reason for this is that sometimes you have standard concepts that map to those pesky .8 and .9 codes of ICD9 that mean ‘something else’, and while people might interpret that to mean ‘some kind of this condition’, what I’ve found is that it actually means ‘something that is not one of the other .1 through .5 forms of the disease’, leading people to want to get rid of the .8s and .9s because they are not what they want.

Yes, so in the above example, the first image is saying ‘bring me the mapped concepts of Essential Hypertension (with descendants) plus the Essential [primary] hypertension source code’, while the latter just says ‘give me the mapped concepts of Essential Hypertension (with descendants)’. The issue is that it’s going to bring in more than just the mapped concepts; it’s also going to bring in standard concepts, which could be confusing (because the user has to know that we infer the standard concepts alongside the mapped concepts).

Hopefully these answers clarify the specific issue we’re trying to address with the proposed change: potential confusion coming from using the ‘mapped’ option in concept set expressions.

Gowtham_Rao · September 2, 2025, 1:03am

That seems like an edge use case requesting a breaking change? Especially when there is a solution.

This doesn’t sound like a workaround. I would say it’s an intuitive and expected behavior.

Maybe I am missing something. I look forward to understanding the use case better.

Gowtham_Rao · September 2, 2025, 1:04am

Yes. I think standard should be included by default.

I don’t think you are correct here, but maybe you are. If the user sees an explicit selection of a code, why would its inclusion not be intuitive?

Gowtham_Rao · September 2, 2025, 1:08am

What if I want both the selected and the mapped concepts? With this change, my workaround would be to add a duplicate item and not select mapped.

Chris_Knoll · September 2, 2025, 1:16am

If you want both the standard concept and the mapped, you’d add the concept set item that says ‘include the concept’ and another concept set item that says ‘include the mapped’.

As you can see, this proposal makes it more explicit what you’re asking for and what you are getting in your result. The challenge today is that people indicate they want a concept set containing source concepts, but they get a concept set with source concepts (which they expect) plus standard concepts (which they do not expect). The challenge I was posed with was: ‘if X isn’t mapped to itself, then why am I seeing X in my result when I’m only asking for things mapped to X?’.

The answer was: we could make it so that the mapped option only brings in mapped concepts, but we would want to check with the community first.

Gowtham_Rao · September 2, 2025, 1:32am

I think the answer is, you get everything you see in the concept set expression unless you explicitly exclude it.

Vojtech_Huser · September 2, 2025, 6:25pm

It depends on how the definition is stored and transitioned between versions. If we have done due diligence in the JSON file (or whatever other structure houses my definition, whether of a concept set, a cohort, or X), then I can live with anything.

If the change is made on day D (e.g., Oct 1, 2025), all definitions prior to D are handled properly (the user did what the UI was doing: mapped+descendant), and definitions after D carry the proper information (the user clicked mapped, or the user clicked mapped+descendant), then the change was done in a clean way.
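One way to handle definitions saved before D is to rewrite each legacy item that has mapped checked into two items, so that it resolves to the same concepts under the proposed behavior. The sketch below assumes Atlas-style exported item keys (`concept`, `isExcluded`, `includeDescendants`, `includeMapped`) and that the proposed ‘mapped’ still applies to descendants; `migrate_items` is a hypothetical helper, not part of Atlas or WebAPI, and deciding which definitions predate D is left to the caller.

```python
import copy


def migrate_items(legacy_items):
    """Rewrite items saved under the current 'mapped' semantics so they resolve
    to the same concepts under the proposed semantics.

    A legacy item with includeMapped=True meant "the concept (plus descendants,
    if requested) AND its mapped source concepts". Under the proposal it would
    mean only the mapped source concepts, so an explicit item for the concept
    itself is added in front of it. Exclusion items are split the same way.
    """
    migrated = []
    for item in legacy_items:
        if item.get("includeMapped"):
            standard_part = copy.deepcopy(item)
            standard_part["includeMapped"] = False
            migrated.append(standard_part)
        migrated.append(copy.deepcopy(item))  # never mutate the input
    return migrated


legacy = [
    {
        "concept": {"CONCEPT_ID": 100},  # hypothetical concept ID
        "isExcluded": False,
        "includeDescendants": True,
        "includeMapped": True,
    }
]
migrated = migrate_items(legacy)
```

Splitting rather than flipping a flag keeps the migrated definition readable in the UI: the user sees exactly the two requests (‘the concept with descendants’ and ‘the mapped concepts’) that the old single item implied.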

What was written in the other thread (a limitation of the current UI is that the user can use only descendants and one relationship, ‘Maps to’) is a separate limitation, but while we are making changes, it is worth considering.

Or at least allow it in concept set definitions and phenotype definitions done by coding (not via the graphical user interface, because we may have to keep the GUI relatively simple for a typical user).