On studying Atlas - it seems like Concept-sets are stored in the application and not pushed back into the OMOP Common Data Model. This makes use of Concept-set in processes outside the Atlas process difficult.
An idea would be to add two new tables to the OMOP common data model
Concept_sets: table that stores the concept sets in a database for persistence, instead of just at the application level.
Concept_sets_definition: table that describes the concept_sets.
Use case: reusing concept-sets for SQL joins while building cohorts outside Atlas; persistence in CDM makes reuse easy and independent of Atlas.
@Frank, we talked about this at a recent architecture WG meeting, didn’t we? If @Gowtham_Rao can make it to the call today (1pm), should we bring it up again?
@Gowtham_Rao, are you thinking about sharing concept sets beyond a single institution, or just between CDMs at one institution?
I think it should be shared - we are formulating organizational policy right now.
@Frank@Sigfried_Gold Sorry had to drop off the architecture call, but what I think I heard: Instead of persisting full concept_sets, an alternative would be persist concept_set_expression’s with flags for exclude, descendants and mapped; and use that to dynamically create concept_sets. I like that idea, but dont think that solves the primary problem of data persistence.
We persist cohort, cohort_attribute, cohort_definition, attribute_definition in the OMOP common data model; but we dont persist concept_definition in OMOP common data model. Could we explore persisting the concept_definitions in the OMOP common data model too like cohort and cohort_attribute?
@Gowtham_Rao persisted assets managed by the application are stored in a poorly documented data model that is less rigid than what we know as the common data model. This data model consists of tables concept_set and concept_set_item (see the WebAPI migration script). If these do not satisfy your current use cases, I might propose that changes be made to those tables. I recall recent discussions around providing richer metadata especially related to provenance (e.g. timestamps, reference URLs, versioning). You may wish to review recent OHDSI Architecture meeting minutes here.
I think you may be highlighting a need for better documentation of the data model behind WebAPI and more principled collaboration on it. What do others think?
Actually, we don’t, at least from the WebAPI perspective. While there is a cohort definition and a cohort table in the OMOP CDM, the WebAPI stack writes results to a special schema (known as the results schema) that has a chort table that closely resembles the OMOP table, but it’s separate. This is due to that some of the sites we work with lock down their CDM schemas to be read-only, preventing the possibility of storing results there. So, we went with the decision to create a special ‘results schema’ that would have write permissions and we instantiate the cohort tables (among other tables used for other analyses).
I attempted to dig up the very old forum posts that talks about the creation of the cohort table for CIRCE, but this is probably something that should be documented (the difference between the CDM tables and result schema tables). It’s out there if you search for it if you are interested in the background. But it is scattered around different topics.
This isn’t to say that the CDM’s cohort table isn’t used for analytics and other important scientific questions. I am saying that concept sets (and their persistance) is a construct that you find in the context of WebAPI and WebAPI generates results into a ‘results schema’ that is separate (but associated with) a CDM schema.
One thing I can think of that you may like is that you might want to generate a concept set into a results schema of a cdm. So we have a ‘cohort’ table in the results schema (and I’m very much inclined to rename this table to ‘cohort_results’ to eliminate any further confusion between the CDM schema and the results schema), we could define another results table called ‘conceptset_results’ that can maintain the generated result of a concept set into a cdm (keyed by some concept set ID). Then in other analyses you do, you can refer to the included concept sets in a concept set expression by an ID, instead of having to materialize it over and over.
Do you have template for the SQL query that will conver the concept_set_expression in the table below to list of concepts? This is probably in WebApi or Atlas.
CREATE TABLE [${ohdsiSchema}].[concept_set_item] (
[concept_set_item_id] INT IDENTITY (1, 1) NOT NULL,
[concept_set_id] INT NOT NULL,
[concept_id] INT NOT NULL,
[is_excluded] INT NOT NULL,
[include_descendants] INT NOT NULL,
[include_mapped] INT NOT NULL
);
you need to replace @cdm_dataase_schema and @webAPI schema with your cdm/webapi schema.
you need to replace @conceptset_id with the id of your concept set.
This asumes that your webapi schema is locally joinable to your cdm
In the subqueries below: I is the included concpts, E is the excluded concepts, and the final LEFT JOIN looking for E.concept_id is null means that we remove all concept_ids found in E from the concepts found in I.
SELECT c.concept_id
FROM (
select distinct I.concept_id FROM
(
select concept_id
from @cdm_database_schema.CONCEPT
where concept_id in (
select concept_id
from @webapi_schema.concept_set_item
where concept_set_id = @conceptset_id and is_excluded = 0
) and invalid_reason is null
UNION
select c.concept_id
from @cdm_database_schema.CONCEPT c
join @cdm_database_schema.CONCEPT_ANCESTOR ca on c.concept_id = ca.descendant_concept_id
and ca.ancestor_concept_id in (
select concept_id
from @webapi_schema.concept_set_item
where concept_set_id = @conceptset_id and is_excluded = 0 and include_descendants = 1
) and invalid_reason is null
) I
LEFT JOIN (
select concept_id
from @cdm_database_schema.CONCEPT
where concept_id in (
select concept_id
from @webapi_schema.concept_set_item
where concept_set_id = @conceptset_id and is_excluded = 1
) and invalid_reason is null
UNION
select c.concept_id
from @cdm_database_schema.CONCEPT c
join @cdm_database_schema.CONCEPT_ANCESTOR ca on c.concept_id = ca.descendant_concept_id
and ca.ancestor_concept_id in (
select concept_id
from @webapi_schema.concept_set_item
where concept_set_id = @conceptset_id and is_excluded = 1 and include_descendants = 1
) and c.invalid_reason is null
) E ON I.concept_id = E.concept_id
WHERE E.concept_id is null
) C;
In the ohdsi tools, however, we don’t usually have the webapi schema ‘joinable’ to the CDM schema, so when we have to substitute in a template the list of concept_ids that are included/excluded and using descendants (and put it inside a CTE for a query:
with cteConceptSet (concept_id) as
(
select distinct I.concept_id
FROM (
select concept_id from @cdm_database_schema.CONCEPT where concept_id in ({concept_ids that are not excluded}) and invalid_reason is null
UNION select c.concept_id
from @cdm_database_schema.CONCEPT c
join @cdm_database_schema.CONCEPT_ANCESTOR ca on c.concept_id = ca.descendant_concept_id
and ca.ancestor_concept_id in ({concept_ids that are not excluded and include_descendants})
and c.invalid_reason is null
) I
LEFT JOIN
(
select concept_id
from @cdm_database_schema.CONCEPT
where concept_id in ({concept_ids that are excluded}) and invalid_reason is null
UNION
select c.concept_id
from @cdm_database_schema.CONCEPT c
join @cdm_database_schema.CONCEPT_ANCESTOR ca on c.concept_id = ca.descendant_concept_id
and ca.ancestor_concept_id in ({concept_ids that are excluded and include descendant})
and c.invalid_reason is null
) E ON I.concept_id = E.concept_id
WHERE E.concept_id is null
)
select ....
from {domain_table} t
join cteConceptSet cs on t.domain_concept_id = cs.concept_id
One last note: the above doesn’t show you how to use the ‘include mapped’ flag from the concept sets. This just makes the query much longer and pulling in the mapped concepts is a bit of a one-off use case, so it’s not shown here.
Also, you can create a concept set in a cohort definition, save it, and then view the generated SQL to see the above query. I pulled the above as a template from one of the cohort definitons on ohdsi.org: http://www.ohdsi.org/web/atlas/#/cohortdefinition/25724