OMOP was initially set up to support observational research on closed-world data, but many of us are trying to use it to make our local hospital data accessible for developing diagnostic and predictive AI models, where we need to set up positive and negative cohorts for training. At the same time, we want to support as-yet-undefined observational research questions.
There are a few areas where a mismatch between these use cases seems to lead to different ways of doing ETL. In particular, I'd be interested to hear more opinions on some points related to AI and imaging.
- The use of negation: I understand why, for observational studies, only the presence of a condition or result counts, but as has been pointed out in other posts, in areas such as pathology and radiology it is important to know that something was looked for but not found. Question/answer pairs are a natural way of dealing with this, but ATLAS and other analysis tools are not well set up to exploit this structure. Should we create duplicate records for positive findings, one using the QA structure and the other using a conventional OMOP approach (see the first sketch after this list)?
- A condition mentioned in a pathology or radiology report based on a single study may not carry the same degree of certainty as a condition entered into a registry, where information from many sources will have been used to arrive at a diagnosis. In AI studies, we may be interested in cases where a condition suggested in one modality is subsequently revised after additional information is collected. Should all diagnoses be treated as conditions, or is it possible to differentiate between confirmed and possible conditions (observations?), and how would we know the difference (see the second sketch after this list)?
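To make the first point concrete, here is a minimal sketch of the duplicate-record idea. The concept IDs are placeholders, not real OMOP concepts (the actual ones would come from Athena), and plain dicts stand in for CDM table rows:

```python
# Sketch: representing one radiology finding two ways.
# All concept IDs below are hypothetical placeholders.

FINDING_QUESTION = 1001   # e.g. "Pulmonary nodule present?" (question concept)
ANSWER_YES = 1002         # e.g. "Yes" (answer concept)
ANSWER_NO = 1003          # e.g. "No"
NODULE_CONDITION = 1004   # e.g. "Pulmonary nodule" (condition concept)

def qa_observation(person_id, date, question_id, answer_id):
    """Question/answer row in OBSERVATION: captures negatives as well."""
    return {
        "person_id": person_id,
        "observation_concept_id": question_id,
        "observation_date": date,
        "value_as_concept_id": answer_id,
    }

def condition_row(person_id, date, condition_id):
    """Conventional CONDITION_OCCURRENCE row: positives only."""
    return {
        "person_id": person_id,
        "condition_concept_id": condition_id,
        "condition_start_date": date,
    }

def etl_finding(person_id, date, present):
    # Always emit the QA pair, so "looked for but not found" stays queryable...
    rows = [qa_observation(person_id, date, FINDING_QUESTION,
                           ANSWER_YES if present else ANSWER_NO)]
    # ...and duplicate positives into the table ATLAS cohort logic expects.
    if present:
        rows.append(condition_row(person_id, date, NODULE_CONDITION))
    return rows
```

An AI pipeline would build its negative cohort from the QA rows, while a standard cohort definition only ever sees the CONDITION_OCCURRENCE rows.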
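For the second point, one option I can imagine is using the condition_status_concept_id field on CONDITION_OCCURRENCE to separate single-report suggestions from registry-grade diagnoses. Again a sketch only, with placeholder status IDs (standard status concepts would need to be looked up in Athena):

```python
# Sketch: tagging diagnostic certainty via condition_status_concept_id.
# The two status concept IDs are hypothetical placeholders.

STATUS_PRELIMINARY = 2001  # suggested by a single report/modality
STATUS_CONFIRMED = 2002    # multi-source, registry-grade diagnosis

def condition_with_status(person_id, date, condition_id, confirmed):
    return {
        "person_id": person_id,
        "condition_concept_id": condition_id,
        "condition_start_date": date,
        "condition_status_concept_id": (
            STATUS_CONFIRMED if confirmed else STATUS_PRELIMINARY
        ),
    }
```

An AI study could then pick out preliminary diagnoses that were later revised, while a conventional observational analysis simply ignores the status field.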
I realise that having specific use cases would be ideal, but in our case we can't tailor our ETL pipelines to specific projects, and I lean towards a more complete representation, even if it makes querying a little more involved. Is this going to render our OMOP implementation too non-standard for collaboration?