OHDSI Home | Forums | Wiki | Github

How to debug cohort generation for Truven Marketscan Medicare data?

Expected behavior

  • more than 1 person in a large cohort

Actual behavior

  • 1 person in a cohort

Steps to reproduce behavior

Use Atlas to generate this cohort:
delivery-hospitalizations.txt
On the Truven Marketscan Medicare data source:

Note that there are 100k+ people in the other data sources, but only a single person in Medicare.

How would you debug this?

Any help would be appreciated. @toekneesunshine and I looked at the exported SQL but it is messy, and all the concept sets are compressed into a single line, so something like a stack trace or binary search to comment out parts of the query will be difficult. Has anyone debugged errors like this that are data source dependent?

Alternatively, is there a point person at Truven to ask about potential preprocessing in the data that is different than the Multi-state Medicaid data, leading to only a single person in this cohort?

Thanks!
Jaan

cc @karthik, @t_abdul_basser, @noemie

Hi @jaan , it looks like your cohort is looking for pregnant women. Note the ibm marketscan medicare database only contains retirees who opt for supplemental medicare coverage, so this is almost all >65 years old, so we wouldnt expect to see pregnant women in there.

Thanks @Patrick_Ryan – that’s good to know!

That is very confusing because there is another cohort generated on the same data with 26k pregnancy-related outcomes (severe maternal morbidity).

Cohort for severe maternal morbidity with delivery hospitalization:
severe-maternal-morbidity-json.txt

There are 26k people who match this criteria on the Truven Medicare data:

image

Any advice on how to compute the diff between these two cohorts to find what the bug is?

Thanks for sharing your JSON, it makes it easy to diagnosis what may be going on. A quick look, I think it has to do with your first entry event: ‘SMM 17-21 procedure indicators’, which appears to have some codes that wouldn’t be exclusive to a pregnancy episode, and these procedures aren’t being restricted to either an inpatient visit or requiring a condition indicating pregnancy.

Thanks @Patrick_Ryan – why would these codes be different across Truven Medicare, Multi-State Medicaid, CCAE and CUMC data?

MDCD, MDCR, and CCAE will likely have similar diagnosis codes, but the impact on pregnancy-related codes will be quite different (since MDCR generally shouldn’t have pregnancies).

CUIMC will have different codes, given that its EHR and not restricted only to ICD-based billing codes.

Thank you @Patrick_Ryan that is very helpful.

Would a histogram of codes in the cohort definition across MDCD, MDCR, CCAE, and CUIMC help understand whether the pregnancy-related codes or the lack of ICD-based billing codes in some data sources is the issue?

I am also wondering whether looking at the ETL pipeline for these data sources, to see how the data is loaded into Atlas could help understand why MDCR only has a single patient compared to the others.

t