Some differences between MS SQL cohort generation query in Atlas 2.7.0 and Atlas 2.8.0

hui_xing_tan · March 15, 2021, 2:15am

Hi all,
Just to share some differences between the MS SQL queries in Atlas 2.7.0 and Atlas 2.8.0 that you may encounter:

“invalid_reason” IS NULL in SELECT statement

Present in Atlas 2.8.0 only for the descendant concepts, e.g.

select concept_id from @vocabulary_database_schema.CONCEPT where concept_id in (314666)
UNION select c.concept_id
from @vocabulary_database_schema.CONCEPT c
join @vocabulary_database_schema.CONCEPT_ANCESTOR ca on c.concept_id = ca.descendant_concept_id
and ca.ancestor_concept_id in (314666)
and c.invalid_reason is null

Present in Atlas 2.7.0 for both main and descendent concepts, e.g.

select concept_id from @vocabulary_database_schema.CONCEPT where concept_id in (314666)and invalid_reason is null
UNION select c.concept_id
from @vocabulary_database_schema.CONCEPT c
join @vocabulary_database_schema.CONCEPT_ANCESTOR ca on c.concept_id = ca.descendant_concept_id
and ca.ancestor_concept_id in (314666)
and c.invalid_reason is null

Inserting into Codesets table

In Atlas 2.8.0, UNION ALL is used to join selection before inserting
In Atlas 2.7.0 INSERT INTO is used multiple times

Begin Drug Exposure criteria/ Condition occurrence criteria

Additional columns are selected in Atlas 2.8.0

C.drug_exposure_start_date as sort_date
C.condition_start_date as sort_date

No additional columns are selected in Atlas 2.8.0

The differences are quite minor and would probably produce the same cohort counts in either case. However, if your invalid_reason is blank instead of ‘null’, like in our experience, you may encounter the issue that the 2.8.0 version produces nonzero cohort counts while the 2.7.0 version produces 0 count. In that case, do make sure the invalid_reason is null and not blank before running the cohort generation, and it should be fine.

Tagging @Chris_Knoll as well - as always thanks for your help with Atlas set up

Thanks!

Chris_Knoll · March 15, 2021, 4:25pm

Great summary, I can try to provide rationale here:

There was some discussion using invalid_reason for concepts, and the use of the invalid_reason is a bit confusing for concepts (compared to concept relationships). It was decided that if you put a concept into the concept set expression you should get it back (that’s the first UNION). To limit changes, we did not remove the IS_NULL check from the second UNION because the second union relates to descendants, and there should never be invalid concepts int he concept_ancestor table, so this IS_NULL is redundant. To your point, you need to import your concepts correctly (ie: load the NULL values as NULL)

There was an issue related to google big query where you were charged ‘by the query’, and therefore as a cost saving, those individual inserts were changed to UNION ALLs.

I think you meant to say that there were columns selected in 2.7 but not selected in 2.8…there was a minor refactoring done in CIRCE to support ‘distribution queries’ for characterization, where we moved those selections to another part of the query, but the end results were validated to returnt he same results in our unit tests, so there should be no issue here.

Thanks again for your summary, it’s very good!