Friends:
We’ve recently come across an issue that has the potential to evolve into a big discussion, so we wanted to bring it up and ask for your advice.
I know well enough (and keep telling others) that we don’t store negative information in OMOP (e.g. the absence of a disorder) and instead define our cohorts by the absence of events. While that works for regular, non-ideal datasets, it looks inappropriate for the well-structured and meticulous dataset we’re working with.
So, the issue:
The source has records with yes/no/unknown answers, e.g. for metastasis to lung. We’d typically map the "yes" answers to metastasis to lung and throw the other two away. That makes it impossible to distinguish the cohort of people who are more or less believed to be metastasis-free from the people we know nothing about. Moreover, we have the date when the organization started to collect this information.
So, imagine that collection of lung metastasis data started in 2010. Someone who knows nothing about the dataset runs a network study and includes all patients with no mention of metastasis in the control cohort. The study will then assume that nobody had metastasis before 2010 at all!
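To make the problem concrete, here is a minimal Python sketch with made-up data. Everything in it (the `answers` and `observation` structures, the `classify` helper, the person IDs and dates) is hypothetical and only illustrates the distinction; it is not how the CDM or Atlas represents anything.

```python
from datetime import date

# Assumption: the source started recording lung metastasis on this date.
COLLECTION_START = date(2010, 1, 1)

# Hypothetical source answers: (person_id, assessment_date, answer).
answers = [
    (1, date(2012, 3, 1), "yes"),
    (2, date(2012, 5, 9), "no"),
    (3, date(2013, 1, 15), "unknown"),
]

# Hypothetical observation periods: person_id -> (start, end).
# Person 4 left the data before the question was ever asked.
observation = {
    1: (date(2011, 1, 1), date(2014, 1, 1)),
    2: (date(2011, 1, 1), date(2014, 1, 1)),
    3: (date(2011, 1, 1), date(2014, 1, 1)),
    4: (date(2007, 1, 1), date(2009, 1, 1)),
}


def classify(person_id):
    """Return the metastasis status we can actually defend for one person."""
    given = {a for pid, _, a in answers if pid == person_id}
    if "yes" in given:
        return "metastasis"
    if "no" in given:
        return "believed metastasis-free"
    _, obs_end = observation[person_id]
    if obs_end < COLLECTION_START:
        return "never assessed"
    return "unknown"


status = {pid: classify(pid) for pid in observation}

# Naive control cohort: everyone without a positive record.
naive_controls = sorted(p for p, s in status.items() if s != "metastasis")

# Informed control cohort: only people explicitly reported as negative.
informed_controls = sorted(
    p for p, s in status.items() if s == "believed metastasis-free"
)
```

With this toy data the naive control cohort contains persons 2, 3 and 4, while only person 2 was actually reported as metastasis-free.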
The possible solutions I could think of:
- Add a status, modifier, etc. to indicate the absence of a condition/procedure/measurement, and put it in the respective tables.
  - this will be hard to track, as different tables use different columns to store information like this;
- Add a custom flag column to each table in the CDM.
  - this will cause false-positive results in Atlas and won’t help unless you know about the feature;
- Create an Achilles Heel report that warns you if an event occurred for the first time only after a long period of time.
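
As a rough illustration of the last option, a check along these lines could flag concepts whose first record appears long after the data source itself starts. This is only a sketch in plain Python: `late_starting_concepts`, its `max_gap_days` threshold and the sample events are hypothetical, and it does not use or reproduce any actual Achilles code.

```python
from datetime import date


def late_starting_concepts(events, max_gap_days=365):
    """Map each late-starting concept to its gap in days.

    events: iterable of (concept_name, event_date) pairs.
    The gap is measured from the earliest event of any kind in the data.
    """
    events = list(events)
    if not events:
        return {}
    dataset_start = min(d for _, d in events)

    # Earliest date on which each concept was recorded.
    first_seen = {}
    for concept, d in events:
        if concept not in first_seen or d < first_seen[concept]:
            first_seen[concept] = d

    gaps = {c: (d - dataset_start).days for c, d in first_seen.items()}
    return {c: gap for c, gap in gaps.items() if gap > max_gap_days}


# Hypothetical events: metastasis only starts being recorded in 2010.
sample_events = [
    ("hypertension", date(2005, 1, 1)),
    ("hypertension", date(2009, 6, 1)),
    ("metastasis to lung", date(2010, 1, 1)),
    ("metastasis to lung", date(2011, 4, 2)),
]

warnings = late_starting_concepts(sample_events)
for concept, gap in warnings.items():
    print(f"WARNING: '{concept}' first appears {gap} days after the data start")
```

A warning like this would not tell you why the concept starts late, but it would at least prompt a study author to ask before building a control cohort on its absence.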
And a broader question: should we correct our study design based on the features of a dataset?
Would love to hear your comments.