Interpreting pathway analysis tables under results schema

SELVA_MUTHU_KUMARAN · March 1, 2020, 2:12pm

Hello Everyone,

I recently tried out the “Cohort pathways” feature in Atlas, and it’s cool. Thank you, Atlas team. However, since I would like to investigate the subjects under each pathway, I have a few questions about the results stored in the pathway_analysis_* tables. I would just like to confirm my understanding of a few things. Can you help me with this?

Pathway_analysis_stats

This table just contains our cohort id, its count, and the pathway count. If we keep modifying/editing the same target cohort again and again, we just have to pick the most recent version. Am I right that there is nothing more to this table? Also, is pathway_analysis_generation_id a sequence number? How is it generated, and why do I see gaps in the generation_ids, as shown below?

Pathway_analysis_codes

I understand that here we have to filter the table on our pathway_analysis_generation_id, but may I know:

a) What does the code column mean here? I understand it is used to indicate the drugs, but how is it obtained, given that the values aren’t concept_ids?

b) is_combo: I believe it just indicates whether a medication appears in combination with any other medication, with 1 for a combination and 0 otherwise.

Pathway_analysis_events

a) Here the combo_id column is the same as code, and the ordinal column can go up to the “maximum path length” that we set during the design phase. That is, a path length of 3 means the ordinal column can have a maximum value of 3 in the generated data.

b) subject_id and ordinal together tell us the number of events a subject was part of. If I would like to filter patients based on events, do I have to use code/combo_id? Is my understanding correct?

pathway_analysis_paths

a) Again, here the steps correspond to the path length, and there is nothing new. Since I have set a path length of 3, data would only be present up to step 3 and the remaining steps would be empty. Am I right? So I can confidently skip looking at the remaining steps?

Is there any R package to run this? If so, how does it differ from Atlas? The main part of cohort pathways is its visualization, which is best shown in Atlas. If an R package exists, is it used to present results in tabular form? Just trying to understand.

Chris_Knoll · March 2, 2020, 4:11pm

Yes, the pathway_analysis_stats table is simply a summary statistics table. The pathway_analysis_generation_id is an identifier that is generated by our ‘batch job’ subsystem, which manages all of our background tasks (cohort generation, characterization, incidence rates, pathway analysis). The reason you see gaps in this table is that the same generation sequencer is shared across the different generation types. So the missing values you see were used by some other generation (like cohort generation).
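A toy model may make the gaps easier to picture. The sketch below assumes a single counter shared by every job type; the class name `BatchJobSequencer` and the job-type strings are hypothetical and are not WebAPI's actual implementation. It shows that one results table only sees the ids its own job type received.

```python
from itertools import count


class BatchJobSequencer:
    """Toy model (hypothetical, not WebAPI code): one sequencer shared by
    every kind of background job."""

    def __init__(self) -> None:
        self._next_id = count(start=1)
        self.log: list[tuple[int, str]] = []

    def submit(self, job_type: str) -> int:
        """Assign the next generation id, whatever the job type is."""
        generation_id = next(self._next_id)
        self.log.append((generation_id, job_type))
        return generation_id

    def ids_for(self, job_type: str) -> list[int]:
        """Ids that a single results table (e.g. pathway stats) would see."""
        return [gid for gid, jt in self.log if jt == job_type]


seq = BatchJobSequencer()
for job in ["pathway", "cohort", "cohort", "pathway", "characterization", "pathway"]:
    seq.submit(job)

pathway_ids = seq.ids_for("pathway")
print(pathway_ids)       # [1, 4, 6]: ids 2, 3 and 5 went to other job types
print(max(pathway_ids))  # 6: ids only increase in this model, so max is the latest run
```

In this model, picking the most recent pathway generation is just taking the highest id, which relies on the assumption that the sequencer only ever counts upward.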

The pathway_analysis_codes table is a lookup table that gives you the name of the event cohort or the combination of event cohorts. You join its code column with the combo_id column of pathway_analysis_events. The is_combo flag tells you that a code represents a combination of multiple event cohort codes, so you can easily separate the stand-alone event cohorts from the combination cohorts.
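The join can be sketched with an in-memory SQLite database. The tables below are simplified, hypothetical versions of the two results tables (the `name` column, the generation id 42 and all rows are made up for illustration); the point is only the `code = combo_id` join within one generation.

```python
import sqlite3

# Simplified, hypothetical schema and made-up rows; the real tables have more columns.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE pathway_analysis_codes (
    pathway_analysis_generation_id INTEGER,
    code INTEGER,
    name TEXT,
    is_combo INTEGER
);
CREATE TABLE pathway_analysis_events (
    pathway_analysis_generation_id INTEGER,
    subject_id INTEGER,
    ordinal INTEGER,
    combo_id INTEGER
);
INSERT INTO pathway_analysis_codes VALUES
    (42, 1, 'Drug A', 0),
    (42, 2, 'Drug B', 0),
    (42, 3, 'Drug A + Drug B', 1);
INSERT INTO pathway_analysis_events VALUES
    (42, 1001, 1, 1),
    (42, 1001, 2, 3),
    (42, 1002, 1, 2);
""")

# Label each event by joining code -> combo_id inside the same generation.
rows = con.execute("""
SELECT e.subject_id, e.ordinal, c.name, c.is_combo
FROM pathway_analysis_events AS e
JOIN pathway_analysis_codes AS c
  ON c.pathway_analysis_generation_id = e.pathway_analysis_generation_id
 AND c.code = e.combo_id
WHERE e.pathway_analysis_generation_id = ?
ORDER BY e.subject_id, e.ordinal
""", (42,)).fetchall()

for row in rows:
    print(row)
```

Filtering on `is_combo = 0` in the same query would keep only the stand-alone event cohorts.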

The codes are defined as follows:
The event cohorts are ordered and indexed starting at 0.
The stand-alone event cohort codes are calculated as POWER(2, index), i.e. 1, 2, 4, 8, 16, 32…
The cohort events are constructed and split up to determine the overlapping periods. See this post for details on that.
To ‘combine’ the event cohorts into a new combo_id, we SUM(combo_id) grouped by person_id, start_date, end_date. This results in a binary addition of the different power-of-two combo IDs, such that if you have the following combo IDs from 2 event cohorts:

Combo Example
comboId	Event Cohort 1	Event Cohort 2	Combo Name
1	Yes	No	Event Cohort 1
2	No	Yes	Event Cohort 2
3	Yes	Yes	Event Cohort 1 + Event Cohort 2

If you work this out in binary, Event Cohort 1 is 01 and Event Cohort 2 is 10; combining them (by adding them together) gives 1 + 2 = 3, which is 11 in binary. We leverage this property of binary addition to create the combos.
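The encoding can be sketched in a few lines of Python. This is an illustration of the scheme described above, not the WebAPI code: the helper names are hypothetical, and `combine` assumes the events have already been split on overlap boundaries, so each cohort appears at most once per (person, start, end) period and the sum behaves like a bitwise OR.

```python
from collections import defaultdict


def standalone_code(index: int) -> int:
    """Code for the event cohort at 0-based position `index`: POWER(2, index)."""
    return 2 ** index


def combine(split_events):
    """SUM the codes of rows sharing (person_id, start_date, end_date).

    Assumes rows are (person_id, start_date, end_date, code) and already
    split on overlaps, so distinct powers of two never collide.
    """
    combos = defaultdict(int)
    for person_id, start, end, code in split_events:
        combos[(person_id, start, end)] += code
    return dict(combos)


def decode(combo_id: int, cohort_names):
    """Return the event cohorts whose bit is set in combo_id."""
    return [name for index, name in enumerate(cohort_names)
            if combo_id & standalone_code(index)]


names = ["Event Cohort 1", "Event Cohort 2", "Event Cohort 3"]
print([standalone_code(i) for i in range(len(names))])  # [1, 2, 4]

events = [
    (1001, "2020-01-01", "2020-01-31", 1),  # Event Cohort 1 alone
    (1001, "2020-02-01", "2020-02-29", 1),  # overlapping period: cohort 1 ...
    (1001, "2020-02-01", "2020-02-29", 2),  # ... together with cohort 2
]
combos = combine(events)
print(combos[(1001, "2020-02-01", "2020-02-29")])  # 3
print(decode(3, names))  # ['Event Cohort 1', 'Event Cohort 2']
print(decode(5, names))  # ['Event Cohort 1', 'Event Cohort 3']
```

Because every stand-alone code is a distinct power of two, any combo_id can be decoded back into its member cohorts by checking its bits.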

The pathway_analysis_events table is the raw pathway event table: it tells you, for a given person, which event cohort appeared in which order and what combinations were present at the time. You can use it to filter on specific people and combos.
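Because combo codes are bit sets, filtering people on a given event cohort can be done with a bitwise AND, catching both the stand-alone rows and any combos that include it. The sketch below works on made-up (subject_id, ordinal, combo_id) rows from a single generation; the function names are hypothetical.

```python
# Made-up rows from one generation: (subject_id, ordinal, combo_id).
events = [
    (1001, 1, 1),
    (1001, 2, 3),
    (1002, 1, 2),
    (1003, 1, 4),
    (1003, 2, 6),
]


def subjects_with_cohort(events, code, include_combos=True):
    """Subjects who had the stand-alone cohort `code` (a power of two) at any step.

    With include_combos=True, any combo_id containing that bit also matches;
    otherwise only exact stand-alone rows count.
    """
    matches = set()
    for subject_id, _ordinal, combo_id in events:
        hit = (combo_id & code) != 0 if include_combos else combo_id == code
        if hit:
            matches.add(subject_id)
    return sorted(matches)


def subjects_with_combo(events, combo_id):
    """Subjects who had exactly this combination at some step."""
    return sorted({subject for subject, _ordinal, c in events if c == combo_id})


print(subjects_with_cohort(events, 2))                        # [1001, 1002, 1003]
print(subjects_with_cohort(events, 2, include_combos=False))  # [1002]
print(subjects_with_combo(events, 3))                         # [1001]
```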

Pathway_analysis_paths simply takes the data from pathway_analysis_events and makes a ‘wide’ table (up to a path length of 10). This is just to simplify retrieving the data for the analysis: on large cohorts, pivoting the events is actually quite expensive, so we build this ‘report-ready’ table from the raw events. You are correct that if your maximum path length is 3, only up to step_3 will be populated.
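The pivot from long events to wide paths can be sketched as follows, using made-up rows. Counting subjects per distinct path is an assumption made here for illustration; the helper name `to_wide_paths` is hypothetical. With a maximum path length of 3, steps 4 through 10 stay empty (`None`).

```python
from collections import defaultdict

MAX_STEPS = 10  # the wide table described above goes up to step 10


def to_wide_paths(events, max_path_length):
    """Pivot (subject_id, ordinal, combo_id) rows into 10-step paths and
    count how many subjects follow each distinct path (an assumption)."""
    by_subject = defaultdict(dict)
    for subject_id, ordinal, combo_id in events:
        if ordinal <= max_path_length:
            by_subject[subject_id][ordinal] = combo_id
    path_counts = defaultdict(int)
    for steps in by_subject.values():
        # Steps a subject never reached are left as None (empty columns).
        path = tuple(steps.get(step) for step in range(1, MAX_STEPS + 1))
        path_counts[path] += 1
    return dict(path_counts)


events = [
    (1001, 1, 1), (1001, 2, 3),
    (1002, 1, 1), (1002, 2, 3),
    (1003, 1, 2),
]
paths = to_wide_paths(events, max_path_length=3)
for path, n in paths.items():
    print(path[:3], "->", n)
```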

There is no R package to execute this, since all this functionality is bundled together with the WebAPI Java package. However, it wouldn’t be unreasonable to split the pathway analysis functionality off into a stand-alone Java library, which could be invoked from R or Java (i.e., as a dependency of WebAPI).

Akshay · August 11, 2020, 10:10am

@Chris_Knoll - At our site, we tried running cohort pathway analysis. I understand your point that the stand-alone event cohort codes are 1, 2, 4, 8, 16, etc. But may I ask when there can be a gap in the event cohort codes? Why don’t we see 2, 4, 8, 16?

As the screenshot below shows, those codes are missing.

Chris_Knoll · August 11, 2020, 11:05pm

It’s a bug, which will be addressed in 2.8; it was closed in this issue.