Thanks, Christian, for the detailed response.
I think I should start by specifying my use case. This project uses OMOP for cohort discovery, not for population health research. We are building a system that lets the user enter cohort queries and discover sets of images that meet the inclusion and exclusion criteria for their project. (These are then used to train models using federated learning or for federated analytics for study feasibility; the key advantage of OMOP-ifying the data is interoperability between federated sites.) The prototype version just had the person and radiology tables (C Park, 2022), with condition codes in radiology_note. We are now implementing a more complete OMOP with data from the EHR as well as image metadata, as per WY Park (2024). This implementation of OMOP will be related to the one we use for population health, but will be a specialised implementation only used within our system for imaging work.
I need to enable queries that:
- Link a condition / pathology / finding to an image
- Identify cases where a condition has been ruled out radiologically
- Locate a condition anatomically (e.g. atelectasis of the left upper lobe; cerebral infarction in the right parietal lobe)
After chewing on this a while, it seems my two big questions are:
- Which concept codes to use to represent radiological findings:
  - Radlex entities in a custom vocab only (or mainly) linked via image_feature
  - Standard SNOMED codes, which are (mostly?) in the condition domain
- Which table to put these codes in:
  - image_feature linking to measurement or observation
  - condition_occurrence
  - note and note_nlp referring to the radiological report text and specific entities extracted from it by NLP
The first question may depend on the use case. In the 2024 paper, the example includes specific measurements of the size of a lung nodule, and they used a custom Radlex concept set to represent this in image_feature and measurement. If the use case is just the finding and its anatomical location, but not a quantitative measurement, then I’m not sure what is best, but lean towards standard condition concepts. I haven’t explored the content of the Radlex lexicon vs the SNOMED standard conditions to get a sense of their scope, but it seems like Radlex may more explicitly represent a finding of “normal anatomy”.
Based on what I’ve figured out from the CDM definition, and bearing in mind I’m an imager and haven’t used OMOP for anything non-imaging so may have missed deeper lore, here’s my take on the pros and cons of each table choice, assuming I want to use standard condition concepts.
| Approach | Pro | Con |
|---|---|---|
| image_feature and measurement or observation * | Finding’s provenance is linked to a specific image. Granular anatomic labels are linked to each finding. Allows negation by assigning codes for yes or no to the value field. | The CDM says condition concepts must go in the condition_occurrence table, not measurement or observation. Findings from radiological reports often refer to the entire study, i.e. evidence from several different images acquired at the same time. |
| condition_occurrence | The expected place for condition concept codes, so people will know where to look for them. | The condition_occurrence table is linked to person and visit_occurrence, so there is no direct provenance to an image other than indirectly by date. Does not allow negation. ** I’m not sure anatomy can be linked to the condition, unless both are combined in a single concept code. |
| note_nlp | The expected place for findings derived from reports by NLP. Links to procedure_occurrence, which establishes the provenance to an imaging study. The procedure corresponds to the DICOM study level, which is the appropriate level for a radiology report (although the report may also refer to specific series / images). Allows negation in the term_modifiers field. | I’m not sure anatomy can be linked to the condition, unless both are combined in a single concept code. |
* I think the image_feature table can also link out to condition_occurrence via image_feature_event_field_concept_id and image_feature_event_id as per Kyulee’s post, but I’m not sure what the implications are if I do that. Is there some sense of directionality? It seems measurement and observation are obviously derived from the image, but maybe the opposite for condition occurrence, since it may span dates outside the image, so the image contributes to or exemplifies the condition, rather than giving rise to it.
** It’s possible I could represent negative findings in condition_occurrence using the Imaging result normal family of concepts. This could cause contradictory entries in condition_occurrence if, say, the left lung was normal but the right lung had an abnormality, since the findings would only be associated with anatomy in image_feature.
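To make the first option in the comparison concrete, here is a minimal sketch, in Python with an in-memory SQLite database, of how the three query types I listed might look if findings went into measurement and were linked to an image through image_feature. The two tables are cut down to the columns the query needs, the concept ids are made-up placeholders rather than real vocabulary ids, and `find_images` is a hypothetical helper, so please read this as an illustration of the idea rather than a recommended design.

```python
import sqlite3

# Hypothetical placeholder concept ids (NOT real OMOP vocabulary ids).
ATELECTASIS = 1001
LEFT_UPPER_LOBE = 2001
RIGHT_UPPER_LOBE = 2002
YES, NO = 3001, 3002
MEASUREMENT_FIELD = 4001  # stands in for the concept meaning "measurement.measurement_id"

conn = sqlite3.connect(":memory:")

# Simplified tables: only the columns this sketch needs, not the full CDM definitions.
conn.executescript("""
CREATE TABLE measurement (
    measurement_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    measurement_concept_id INTEGER,
    value_as_concept_id INTEGER
);
CREATE TABLE image_feature (
    image_feature_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    image_occurrence_id INTEGER,
    image_feature_event_field_concept_id INTEGER,
    image_feature_event_id INTEGER,
    anatomic_site_concept_id INTEGER
);
""")

# Toy data: person 1 has atelectasis in the left upper lobe and a clear right
# upper lobe on image 10; person 2 has it ruled out in the left upper lobe on image 20.
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?, ?)",
    [(1, 1, ATELECTASIS, YES), (2, 1, ATELECTASIS, NO), (3, 2, ATELECTASIS, NO)],
)
conn.executemany(
    "INSERT INTO image_feature VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 10, MEASUREMENT_FIELD, 1, LEFT_UPPER_LOBE),
        (2, 1, 10, MEASUREMENT_FIELD, 2, RIGHT_UPPER_LOBE),
        (3, 2, 20, MEASUREMENT_FIELD, 3, LEFT_UPPER_LOBE),
    ],
)


def find_images(conn, finding_concept_id, site_concept_id, present):
    """Return (person_id, image_occurrence_id) pairs where the finding is
    recorded as present (or ruled out) at the given anatomic site."""
    rows = conn.execute(
        """
        SELECT f.person_id, f.image_occurrence_id
        FROM image_feature AS f
        JOIN measurement AS m
          ON m.measurement_id = f.image_feature_event_id
        WHERE f.image_feature_event_field_concept_id = ?
          AND m.measurement_concept_id = ?
          AND f.anatomic_site_concept_id = ?
          AND m.value_as_concept_id = ?
        ORDER BY f.person_id, f.image_occurrence_id
        """,
        (MEASUREMENT_FIELD, finding_concept_id, site_concept_id, YES if present else NO),
    ).fetchall()
    return rows


positive_lul = find_images(conn, ATELECTASIS, LEFT_UPPER_LOBE, present=True)
ruled_out_lul = find_images(conn, ATELECTASIS, LEFT_UPPER_LOBE, present=False)
```

Each call returns (person_id, image_occurrence_id) pairs, so one join covers a positive finding, a ruled-out finding, and the anatomical location. It also makes the main drawback visible: the finding concept sits in measurement rather than condition_occurrence, which is exactly the domain question I’m unsure about.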
Of course, the whole reason I’m here on the thread is that I need to know where I’ve got things wrong, so I’m totally open to correction / new info. I also understand that the imaging parts of OMOP are in active development, so there may be several views on this. I’ll be discussing this at the WG meeting too, but thought it’s also useful to have the discussion searchable online.
More generally, I note the condition_occurrence table describes a period of time where the patient has a diagnosed condition, with a start and end date rather than a point in time. This diagnosis might be supported by multiple pieces of evidence, but I think one record in the table is intended to represent a confirmed diagnosis, rather than the specific investigations it was based on. This is more of a theoretical question than one applicable to my use case, but I’d like to know if I’m understanding the design intent.