How do you represent mismatch repair status / microsatellite instability in OMOP?

Dear OHDSI community,

As part of the HowOften study, I’ve tried creating some cohorts defined by having colorectal cancer and by mismatch repair (MMR) or microsatellite (MS) status. However, this produced considerably lower counts than we were expecting. After a bit of investigation we didn’t find any smoking gun in the form of concepts that we forgot to include. We are currently using:

  • 35919449 MSI unstable low (MSI-L)
  • 35917471 Microsatellite Instability (MSI)
  • 35917835 Microsatellite Instability (MSI)
  • 35918368 Microsatellite Instability (MSI)
  • 21493972 Microsatellite instability [Interpretation] in Cancer specimen Qualitative
  • 35977041 Microsatellite Instable-Low (MSI-L) measurement
  • 42537577 Microsatellite instability-high colorectal cancer
  • 3173676 Intact mismatch protein repair function identified in malignant tumor
  • 21493968 DNA mismatch repair protein Mlh1 [Presence] in Cancer specimen by Immune stain
  • 21493971 Mismatch repair endonuclease PMS2 [Presence] in Cancer specimen by Immune stain
  • 21493969 DNA mismatch repair protein Msh2 [Presence] in Cancer specimen by Immune stain
  • 21493970 DNA mismatch repair protein Msh6 [Presence] in Cancer specimen by Immune stain
  • 35977040 Microsatellite Instable-High (MSI-H) measurement

Does anyone have experience with how it might be coded differently? In Denmark we do the test routinely on patients diagnosed with colorectal cancer. Based on guidelines from NICE it is also recommended in the UK, and by the College of American Pathologists in the US. This probably means the difference in counts is not due to differences in care but, hopefully, due to how we’re trying to define the concept.

Kind regards,


These concepts are actually derived from the College of American Pathologists Electronic Checklists. What is it you are not finding? What are you counting?

At our institution we do not have those biomarkers in structured form; we only have them in pathology report text. I just checked our brand new CDM and it contains zero of those concepts in MEASUREMENT. We expect we will use methods from the NLP WG as well as internal efforts to recover those data elements from text, but we’re not there yet.
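For anyone who wants to run the same kind of check on their own instance, here is a minimal sketch of how it could look. It is not the query I actually ran. It assumes the standard OMOP CDM table and column names (`measurement.measurement_concept_id`, `observation.observation_concept_id`, `condition_occurrence.condition_concept_id`), and it assumes those three tables are the only places the concepts could land, which may not hold for your ETL. The helper names and the toy in-memory SQLite database are made up for illustration; point the function at your real connection instead.

```python
import sqlite3

# Concept IDs copied from the list in the original post.
MMR_MSI_CONCEPT_IDS = [
    35919449, 35917471, 35917835, 35918368, 21493972, 35977041, 42537577,
    3173676, 21493968, 21493971, 21493969, 21493970, 35977040,
]

# Assumption: these are the tables worth scanning, with their concept columns.
DOMAIN_TABLES = {
    "measurement": "measurement_concept_id",
    "observation": "observation_concept_id",
    "condition_occurrence": "condition_concept_id",
}


def count_persons_per_concept(conn, concept_ids):
    """Return {(table, concept_id): distinct person count} for concepts that occur."""
    placeholders = ",".join("?" for _ in concept_ids)
    counts = {}
    for table, column in DOMAIN_TABLES.items():
        sql = (
            f"SELECT {column}, COUNT(DISTINCT person_id) "
            f"FROM {table} WHERE {column} IN ({placeholders}) "
            f"GROUP BY {column}"
        )
        for concept_id, n_persons in conn.execute(sql, concept_ids):
            counts[(table, concept_id)] = n_persons
    return counts


def build_toy_cdm():
    """Hypothetical stand-in for a real CDM: three tiny tables with made-up rows."""
    conn = sqlite3.connect(":memory:")
    for table, column in DOMAIN_TABLES.items():
        conn.execute(f"CREATE TABLE {table} (person_id INTEGER, {column} INTEGER)")
    conn.executemany(
        "INSERT INTO measurement VALUES (?, ?)",
        [(1, 21493968), (1, 21493968), (2, 21493968), (3, 35977040)],
    )
    conn.execute("INSERT INTO condition_occurrence VALUES (?, ?)", (2, 42537577))
    return conn


if __name__ == "__main__":
    conn = build_toy_cdm()
    counts = count_persons_per_concept(conn, MMR_MSI_CONCEPT_IDS)
    for (table, concept_id), n_persons in sorted(counts.items()):
        print(table, concept_id, n_persons)
```

Concepts that never appear are simply absent from the result, so an empty dictionary corresponds to the "zero of those concepts" situation described above. Counting distinct `person_id` values rather than rows keeps repeated tests on the same patient from inflating the numbers.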


I realize we do have this data for internally sequenced tumors, but we haven’t loaded tumor genomics into our CDM yet. We are waiting for the KOIOS tool to be ready.

Dear @Christian_Reich,

Thank you. We are counting the number of patients included in some cohort definitions (e.g. based on the phenotype library), where we got almost no counts when including definitions of MMR status.

What we’re hoping to find out is whether anyone has experience identifying these patients in an alternative way that we could use in our cohort definition, as we would a priori expect most patients with colorectal cancer to have these tests performed.

Dear @jmethot,

Thank you. We’ve also considered whether the low counts are caused by the data being unavailable from the source, where some pathology reports might not end up in the OMOP instances.



Wait. Isn’t the problem that only very few databases have this type of genetic information?

This could indeed be the situation. At least we have not been able to find any concepts to identify the patients. We imagined this data would be routinely available, since the test is common from a clinical perspective, which made us hope that it was our definition that was the problem.

Hope dies last, @awrosen. Come to the Onco WG. We are actively recruiting collaborators with that type of data. Tell them you love them.
