Understanding image feature, observation, and measurement tables

I am working on populating the Medical Imaging CDM tables from DICOM metadata and radiology reports. We have simplified the reports into some basic categories, e.g.:

  • No finding
  • Pleural effusion and oedema
  • Pleural effusion but no oedema

I think I can represent these as observations and have the value as “yes/no” or “present/absent”, but I’m confused about where everything should go.

Taking the third example, I think I need two features, each with the anatomic site concept for lung, one with image_feature_concept_id “pleural effusion” (254061), the other “pulmonary edema” (4078925). These would then link to observations with the same observation_concept_id and value_as_concept_id “present” (4181412) and “absent” (4132135) respectively.
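To make the proposal concrete, here is a minimal sketch of the rows it would produce for "Pleural effusion but no oedema". It only models a subset of the observation and image_feature columns, uses plain dataclasses instead of a database, and passes 0 as a placeholder for the lung anatomic site concept, whose ID is not given in the post. It illustrates the question as asked, not a recommended design.

```python
from dataclasses import dataclass

# Concept IDs quoted in the question above.
PLEURAL_EFFUSION = 254061
PULMONARY_EDEMA = 4078925
PRESENT = 4181412
ABSENT = 4132135


@dataclass
class Observation:
    # Subset of OMOP observation columns used in this sketch.
    observation_id: int
    person_id: int
    observation_concept_id: int
    value_as_concept_id: int


@dataclass
class ImageFeature:
    # Subset of MI-CDM image_feature columns; the others are omitted here.
    image_feature_id: int
    image_occurrence_id: int
    image_feature_concept_id: int
    anatomic_site_concept_id: int
    image_feature_event_id: int  # observation_id of the linked observation


def encode_report(person_id, image_occurrence_id, site_concept_id, findings):
    """findings maps a finding concept ID to True (present) or False (absent)."""
    observations, features = [], []
    for n, (finding, present) in enumerate(findings.items(), start=1):
        obs = Observation(n, person_id, finding, PRESENT if present else ABSENT)
        observations.append(obs)
        features.append(
            ImageFeature(n, image_occurrence_id, finding, site_concept_id, obs.observation_id)
        )
    return observations, features


# "Pleural effusion but no oedema"; 0 is a placeholder for the lung site concept ID.
obs, feats = encode_report(1, 10, 0, {PLEURAL_EFFUSION: True, PULMONARY_EDEMA: False})
```

Each finding becomes one observation (concept plus present/absent value) and one image_feature row pointing at it.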

Please advise if this is correct.

I have uncertainty about whether to use observation or measurement (or even condition) table, whether SNOMED or LOINC, and if I need a particular domain (like “answer” rather than “condition”).

There are also concepts like “No edema present” (4059917), which I could maybe just use as image_feature_concept_id and not need a linked observation or measurement.

All advice welcome. As a relative n00b, no answer is too simple or obvious.

Many thanks,
Paul

Hi @pwrightkcl:

Interpretations of imaging procedures are typically not Observations but Conditions. For example, the patient has a pleural effusion, and concept 254061 is in the Condition domain. Don’t bury these in the wrong table as a value_as_concept_id; nobody will ever even look for them there.

Also, don’t record the negative results (e.g. 4059917). They are important for routine care and for diagnosing individual patients, but for observational research we have a Closed World system and expect only facts in the records.
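The Closed World assumption can be illustrated with a small sketch: a person counts as not having a condition simply because no matching record exists. The condition_occurrence rows here are plain dicts with two columns, an assumption made to keep the example self-contained.

```python
PLEURAL_EFFUSION = 254061  # Condition-domain concept cited above


def persons_without(condition_occurrence, person_ids, concept_id):
    """Closed World reading: no record of concept_id means the person does not have it."""
    with_condition = {
        row["person_id"]
        for row in condition_occurrence
        if row["condition_concept_id"] == concept_id
    }
    return set(person_ids) - with_condition


conditions = [{"person_id": 1, "condition_concept_id": PLEURAL_EFFUSION}]
print(persons_without(conditions, [1, 2, 3], PLEURAL_EFFUSION))  # {2, 3}
```

Note that persons 2 and 3 are treated identically whether their reports explicitly ruled out an effusion or never mentioned it; that is exactly the trade-off the Closed World assumption makes.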


Hi Paul,

Thank you for your question. First, just to clarify: the Medical Imaging CDM (MI-CDM) tables are designed primarily for DICOM files as the source data. So when working with both DICOM metadata and radiology reports, it’s important to treat them as distinct data sources.

Let me break it down into two components:


1. The Image Itself (from DICOM files)

  • Each series of images should be stored in the image_occurrence table.
  • Features directly extracted from the DICOM file (e.g., metadata elements or quantitative imaging measurements) should go into the measurement (or other clinical data) table.
  • These features can then be linked via the image_feature table, which connects a measurement (or other clinical data) to a corresponding image_occurrence.

Important: You cannot use image_feature_concept_id alone in the image_feature table without a linked record in another clinical data table (like measurement). This is because the purpose of image_feature is to link clinical data to imaging data, not to stand alone.
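One way to enforce the rule that image_feature cannot stand alone is a referential check run after loading. In this sketch the `event_table` key is a hypothetical simplification: in MI-CDM the target table is identified through image_feature_event_field_concept_id, whose concept IDs are not reproduced here.

```python
def orphan_image_features(image_features, event_ids_by_table):
    """Return IDs of image_feature rows whose linked clinical record is missing.

    image_features: dicts with image_feature_id, event_table (hypothetical stand-in
    for image_feature_event_field_concept_id) and image_feature_event_id.
    event_ids_by_table: table name -> set of primary keys present in that table.
    """
    orphans = []
    for feature in image_features:
        existing = event_ids_by_table.get(feature["event_table"], set())
        if feature["image_feature_event_id"] not in existing:
            orphans.append(feature["image_feature_id"])
    return orphans


features = [
    {"image_feature_id": 1, "event_table": "measurement", "image_feature_event_id": 100},
    {"image_feature_id": 2, "event_table": "measurement", "image_feature_event_id": 999},
]
print(orphan_image_features(features, {"measurement": {100}}))  # [2]
```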


2. Radiology Reports

  • The original report text should be stored in the note table.

  • If the report is simplified using NLP, extracted findings should be stored in the note_nlp table.

  • For extracted features like pleural effusion:

    • The measurement table could be used to store the feature as a structured finding using an appropriate measurement_concept_id. (If the feature can be linked to a specific image series, you can also use the image_feature table to associate the measurement with the corresponding image_occurrence.)

    • If the feature reflects a confirmed diagnosis (e.g., confirmed pleural effusion), it can also be stored in the condition_occurrence table using a condition_concept_id.

By populating the measurement table, you can also capture negative findings such as “no pleural effusion”, which would not be possible with condition_occurrence alone.
However, since measurement and condition use different concept sets, care must be taken to map them correctly and avoid duplicate or inconsistent entries across the tables.
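A minimal sketch of the kind of consistency check this implies: flag cases where a measurement records a finding as absent while condition_occurrence asserts the equivalent condition for the same person on the same date. The `meas_to_cond` mapping between measurement and condition concepts is hypothetical and would have to be curated; the "absent" concept ID is the one quoted earlier in the thread.

```python
from datetime import date

ABSENT = 4132135  # "absent", quoted earlier in the thread


def conflicting_entries(measurements, conditions, meas_to_cond):
    """Flag (person_id, date, condition_concept_id) triples where a measurement records
    the finding as absent but condition_occurrence asserts it on the same date.

    meas_to_cond: hypothetical map from measurement_concept_id to condition_concept_id.
    """
    asserted = {
        (c["person_id"], c["condition_start_date"], c["condition_concept_id"])
        for c in conditions
    }
    conflicts = []
    for m in measurements:
        cond = meas_to_cond.get(m["measurement_concept_id"])
        if cond is None or m["value_as_concept_id"] != ABSENT:
            continue
        key = (m["person_id"], m["measurement_date"], cond)
        if key in asserted:
            conflicts.append(key)
    return conflicts


day = date(2024, 1, 5)
meas = [{"person_id": 1, "measurement_date": day,
         "measurement_concept_id": 1, "value_as_concept_id": ABSENT}]
cond = [{"person_id": 1, "condition_start_date": day, "condition_concept_id": 254061}]
print(conflicting_entries(meas, cond, {1: 254061}))
```

Matching on the exact date is a deliberately strict choice; a real pipeline might prefer a window around the imaging date.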

As suggested by Christian, storing confirmed diagnoses in condition_occurrence is likely the most suitable approach for downstream research.


Observation vs Measurement Table

You mentioned uncertainty about whether to use observation, measurement, or condition. According to OMOP CDM conventions:

If the generation of clinical facts requires a standardized testing such as lab testing or imaging and leads to a standardized result, the data item is recorded in the MEASUREMENT table.

So in this case, the observation table is not suitable. That table is typically used for things like social history, lifestyle, or patient-reported factors.


Standardized Vocabulary

For vocabulary, it’s best to use OMOP standard concepts (with standard_concept = ‘S’) available in ATHENA, and match the vocabulary to the domain of the table.
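As a rough illustration of matching a concept's domain to a table, the sketch below routes a concept row (as it would appear in the vocabulary's concept table) to the CDM event table for its domain, rejecting non-standard concepts. Only common clinical domains are listed; the standard flag for 254061 is assumed here, while its Condition domain is stated earlier in the thread.

```python
# Event table for each common standard-concept domain (not an exhaustive list).
DOMAIN_TO_TABLE = {
    "Condition": "condition_occurrence",
    "Measurement": "measurement",
    "Observation": "observation",
    "Procedure": "procedure_occurrence",
    "Drug": "drug_exposure",
    "Device": "device_exposure",
}


def target_table(concept):
    """Pick the CDM table for a concept row taken from the concept table."""
    if concept["standard_concept"] != "S":
        raise ValueError(f"concept {concept['concept_id']} is not standard; map it first")
    table = DOMAIN_TO_TABLE.get(concept["domain_id"])
    if table is None:
        raise ValueError(f"no event table for domain {concept['domain_id']!r}")
    return table


print(target_table({"concept_id": 254061, "domain_id": "Condition", "standard_concept": "S"}))
```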


I hope this clarifies the structure and intent of MI-CDM. Happy to help further if you have more questions!

Best regards,
Kyulee


Thank you both for your responses. I still have some remaining questions.

  1. Can the condition_occurrence table be linked to an image_feature in the same way as measurement and observation?
  2. If I take @kyulee.jeon 's approach and use both condition_occurrence and measurement/observation so the latter can represent negative findings (which are important for my use case), I’m unsure how to choose the right vocabulary.
    a) “Pleural effusion” is a measurement in the NAACCR vocab with positive and negative values, but does this apply to general radiology or only histopathology for cancer?
    b) Likewise, “Pulmonary edema” is in LOINC as a measurement value, but only as an answer to “Event description adverse event MERSTH”, which I think is a specific test battery, so can that apply?
  3. How should I represent a normal report? I found the following, or maybe there’s another I didn’t find.
    a) “Plain X-ray of chest normal” (condition)
    b) “Normal lung” (observation)
    c) “No abnormality” (observation which would be linked to the anatomy field in the image_feature table)
  4. What is the correct way to associate multiple anatomic labels to an image? I am trying to represent the regions imaged according to DICOM StudyDescription and BodyPartExamined, not findings from a report. If a “naked” image_feature record isn’t allowed, I could link each to a measurement record with the DICOM source, but that could create duplicates, and would be unlikely to be used.
  5. Following the above, I re-read the paper and noticed the anatomy field in image_occurrence should have the “lowest level of granularity”, which I initially read as most detailed, but could be read as most general, which would make more sense. Which is intended?

Just to clarify on the negative findings, we will often be using our OMOP database for scoping a potential cohort. These may require certain conditions to have been ruled out radiologically (e.g. MR head shows acute infarct but no chronic infarct was found). Radiological reports often specify that certain findings were looked for and not observed, and I want the database to distinguish between that and a finding simply not being mentioned.
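The distinction Paul describes is three-valued rather than yes/no. A minimal sketch, assuming findings have already been extracted per study with a negation flag (for example by NLP); the concept IDs in the example are hypothetical placeholders.

```python
from enum import Enum


class FindingStatus(Enum):
    PRESENT = "present"
    RULED_OUT = "ruled out"
    NOT_MENTIONED = "not mentioned"


def finding_status(mentions, study_uid, concept_id):
    """Three-valued status of one finding in one study's report.

    mentions: dicts with study_uid, concept_id and negated (bool).
    """
    negations = [
        m["negated"]
        for m in mentions
        if m["study_uid"] == study_uid and m["concept_id"] == concept_id
    ]
    if not negations:
        return FindingStatus.NOT_MENTIONED
    # Any affirmed mention wins over negated ones.
    return FindingStatus.RULED_OUT if all(negations) else FindingStatus.PRESENT


# Hypothetical IDs: 1 = acute infarct, 2 = chronic infarct, 3 = haemorrhage.
report = [
    {"study_uid": "S1", "concept_id": 1, "negated": False},
    {"study_uid": "S1", "concept_id": 2, "negated": True},
]
for cid in (1, 2, 3):
    print(cid, finding_status(report, "S1", cid).value)
```

Letting any affirmed mention override negations is one possible policy; a report that says both "no effusion on the left" and "right effusion" shows why anatomy also matters.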

Thank you for your guidance.

Paul

Hi Paul,

Thank you for your follow-up questions. I’ve shared a few of my thoughts below, though for some points I think it would be valuable to also hear perspectives from others in the imaging WG.

1. Linking condition_occurrence to image_feature

It is technically possible to link a condition_occurrence record to an image_feature in the same way as with measurement or observation, if the condition can be clearly tied to a specific image series (in DICOM’s hierarchy: Person → Study → Series → Instance, where the MI-CDM image_occurrence table’s basic unit is the Series).

Findings derived solely from image interpretation (including negative findings) are better stored in measurement or observation, while confirmed diagnoses in condition_occurrence could be linked to image_occurrence at a higher level (e.g., via the visit or episode table).
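A minimal sketch of linking at the visit level, assuming both condition_occurrence and image_occurrence rows carry a visit_occurrence_id. Because a visit is coarser than a series, one condition may map to several images, or to none.

```python
def images_for_condition(condition_occurrence, image_occurrence, concept_id):
    """Map each matching condition record to image_occurrence IDs from the same visit."""
    images_by_visit = {}
    for img in image_occurrence:
        images_by_visit.setdefault(img["visit_occurrence_id"], []).append(
            img["image_occurrence_id"]
        )
    return {
        c["condition_occurrence_id"]: images_by_visit.get(c["visit_occurrence_id"], [])
        for c in condition_occurrence
        if c["condition_concept_id"] == concept_id
    }


images = [
    {"image_occurrence_id": 1, "visit_occurrence_id": 7},
    {"image_occurrence_id": 2, "visit_occurrence_id": 7},
    {"image_occurrence_id": 3, "visit_occurrence_id": 8},
]
conditions = [
    {"condition_occurrence_id": 50, "visit_occurrence_id": 7, "condition_concept_id": 254061},
    {"condition_occurrence_id": 51, "visit_occurrence_id": 9, "condition_concept_id": 254061},
]
print(images_for_condition(conditions, images, 254061))
```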

2. Vocabulary Selection

The NAACCR and LOINC examples are oncology- or test-specific, so they may not always apply to general radiology use cases.

In the MI-CDM paper (Park et al.), the proposed approach was:

  • RadLex and LOINC for radiological findings and measurements
  • SNOMED CT for anatomical location

RadLex offers rich hierarchies for imaging findings, for example:
Clinical finding → Pathophysiologic finding → Mechanical disorder → Fluid disorder → Effusion → Pleural effusion (RID34539).
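A hierarchy like this lets a query for a general term (e.g. "Effusion") also match more specific findings. The sketch encodes only the path quoted above, keyed by term name because only the Pleural effusion RID is given. For standard OMOP vocabularies, the concept_ancestor table plays this role.

```python
# Child -> parent links for the RadLex path quoted above (names only).
PARENT = {
    "Pleural effusion": "Effusion",
    "Effusion": "Fluid disorder",
    "Fluid disorder": "Mechanical disorder",
    "Mechanical disorder": "Pathophysiologic finding",
    "Pathophysiologic finding": "Clinical finding",
}


def ancestors(term):
    """All ancestors of term, nearest first."""
    path = []
    while term in PARENT:
        term = PARENT[term]
        path.append(term)
    return path


def subsumes(ancestor, term):
    """True if term is ancestor itself or lies beneath it in the hierarchy."""
    return term == ancestor or ancestor in ancestors(term)


print(ancestors("Pleural effusion"))
print(subsumes("Effusion", "Pleural effusion"))  # True
```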

Another potential resource is RadLex Common Data Elements (CDEs), which define standardized “key–value” pairs for structuring radiology reports. They are not yet OMOP standard concepts, but could be relevant for your use case.

The OHDSI Medical Imaging Workgroup (WG) is actively discussing how to incorporate such imaging vocabularies into OMOP, and your scenario could contribute meaningfully to that conversation. I’d encourage you to join a meeting so we can explore it together.

3. Anatomical Location Representation

For anatomical regions imaged, the MI-CDM image_occurrence.anatomic_site_concept_id should follow the “lowest level of granularity” from DICOM’s Defined Terms or Anatomical Region codes (often drawn from SNOMED CT or LOINC).

The DICOM2OMOP paper has mapped these DICOM anatomical terms into OMOP vocabularies, with the corresponding lists available on GitHub. In practice, I interpret “lowest level of granularity” as the most specific term available, but this might be worth confirming with the WG as interpretations could vary.


For the vocabulary and note-related topics, it would be great to continue the discussion in the WG. Meetings are biweekly on Wednesdays at 7 AM and 7 PM ET, with the next meeting scheduled for Wednesday, September 3 at 7 AM ET. I hope to see you there.


I wanted to add a small update here, since it might be useful to people not in the Medical Imaging WG searching online. I also thought it would be good to link in this previous post from user @Jgallo, even though it is from 2021.

I’m still looking into how I want to represent findings from the radiology report. I’ve noted the recommendation to use Radlex codes, but these are not yet in Athena, and I’m not yet ready to create custom vocabulary tables for our implementation (though that will have to come at some point if we want to represent DICOM tags).

It sounds like we shouldn’t use the condition_occurrence table for radiological findings, particularly negative findings (which do need to be represented, as they are a positive fact). I’m assuming that there can’t be an exception to the rule that the observation table cannot include concepts from the condition domain (which would be the easiest way to pair a condition with an assertion or negation). So, I thought we might use concepts from the Morph Abnormality concept class as the observation_concept_id, in conjunction with image_feature.anatomic_site_concept_id.

Examples:

Report phrase | Anatomic site | Observation
“pleural effusion” | Both lungs (4250192) | Effusion (4215818)
“congestive cardiac failure” | Heart structure (4217142) | Congestive hypertrophy (4298307)
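On the query side, this pairing means a finding is only recoverable by combining the observation concept with the anatomic site on the linked image_feature. A minimal sketch using the concept IDs from the examples above; the rows are plain dicts with only the columns needed, and image_feature_event_id is assumed to hold the observation_id.

```python
# Report phrase -> (anatomic site concept, Morph Abnormality concept), from the examples.
PHRASE_MAP = {
    "pleural effusion": (4250192, 4215818),
    "congestive cardiac failure": (4217142, 4298307),
}


def find_images(observations, image_features, site_id, morph_id):
    """Image IDs whose linked observation has morph_id at anatomic site site_id."""
    obs_ids = {
        o["observation_id"]
        for o in observations
        if o["observation_concept_id"] == morph_id
    }
    return sorted({
        f["image_occurrence_id"]
        for f in image_features
        if f["image_feature_event_id"] in obs_ids and f["anatomic_site_concept_id"] == site_id
    })


observations = [
    {"observation_id": 1, "observation_concept_id": 4215818},
    {"observation_id": 2, "observation_concept_id": 4298307},
]
image_features = [
    {"image_occurrence_id": 10, "image_feature_event_id": 1, "anatomic_site_concept_id": 4250192},
    {"image_occurrence_id": 11, "image_feature_event_id": 2, "anatomic_site_concept_id": 4217142},
]
site, morph = PHRASE_MAP["pleural effusion"]
print(find_images(observations, image_features, site, morph))  # [10]
```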

This approach feels a bit hacky, because we have the exact concepts usually appearing in the radiology report text as conditions, and not all combinations to represent them in terms of morphological abnormalities might be available.

There are also codes like 4236310 “Computed tomography of brain abnormal”, which are clearly radiological findings, but are in the condition domain.

A final thing that isn’t clear is the relationship between an image_occurrence and either the study or series in DICOM. In the paper, it says it can be either. Radiology reports often combine information from multiple series in a study (e.g. in MRI with multiple sequences, or CT pre- and post-contrast). In this case we would want to link the findings to the study level. Is it possible to do this? E.g. one could have one image_occurrence entry for the study, and then additional entries for each series, with image_series_UID blank for the study. Or has it been settled in the WG since the paper came out to only use series?

Thanks @kyulee.jeon and others for help, and I hope this might be useful to other forum users.

@pwrightkcl:

Thanks for moving this forward. However, we need to make sure we keep the explicit and implicit context of the OMOP tables intact:

Re “It sounds like we shouldn’t use the condition_occurrence table for radiological findings”: Why not? Most of them are declarations of conditions, signs or symptoms found in the image.

Re negative findings that “do need to be represented, as they are a positive fact”: Actually, no. You are right that in the diagnostic work-up for an individual patient they are necessary to exclude all possibilities. But RWE doesn’t treat individual patients; it works on the Closed World assumption, which means we only have records of positive facts. What patients don’t have is conspicuous by its absence.

Re using the observation table “to pair a condition with an assertion or negation”: Technically, you could do that, because the OBSERVATION table can be post-coordinated with the negation. But you won’t do yourself any favor, because no analytic is going to look for such a thing. Analytics look for the facts and interpret the absence of facts as the negative.

Re the examples: Why wouldn’t we put Pleural effusion and Congestive heart failure into the CONDITION table?

What is the analytical use case of this? Do you want to study the effectiveness of imaging?

Thanks, Christian, for the detailed response.

I think I should start by specifying my use case. This project uses OMOP for cohort discovery, not for population health research. We are building a system that lets the user enter cohort queries and discover sets of images that meet the inclusion and exclusion criteria for their project. (These are then used to train models using federated learning or for federated analytics for study feasibility; the key advantage of OMOP-ifying the data is interoperability between federated sites.) The prototype version just had the person and radiology tables (C Park, 2022) with condition codes in radiology_note. We are now implementing a more complete OMOP with data from EHR as well as image metadata per WY Park 2024. This implementation of OMOP will be related to the one we use for population health but will be a specialised implementation only used within our system for imaging work.

I need to enable queries that:

  • Link a condition / pathology / finding to an image
  • Identify cases where a condition has been ruled out radiologically
  • Locate a condition anatomically (e.g. atelectasis of the left upper lobe; cerebral infarction in the right parietal lobe)
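Taken together, the three requirements amount to a query like the one sketched below: images where a finding is affirmed at a given site and another finding is explicitly negated. The flattened `findings` rows (image, concept, site, negation flag) are an assumption about how the data might be exposed, not a CDM table, and the concept IDs are hypothetical.

```python
def select_images(findings, include, ruled_out):
    """Image IDs with an affirmed finding at a site plus an explicit rule-out.

    findings: dicts with image_occurrence_id, concept_id, site_id, negated (bool).
    include: (concept_id, site_id) pair that must be affirmed on the image.
    ruled_out: concept_id that must be explicitly negated on the same image.
    """
    affirmed = {
        f["image_occurrence_id"]
        for f in findings
        if (f["concept_id"], f["site_id"]) == include and not f["negated"]
    }
    negated = {
        f["image_occurrence_id"]
        for f in findings
        if f["concept_id"] == ruled_out and f["negated"]
    }
    return sorted(affirmed & negated)


# Hypothetical IDs: 1 = acute infarct, 2 = chronic infarct, 200 = right parietal lobe.
findings = [
    {"image_occurrence_id": 10, "concept_id": 1, "site_id": 200, "negated": False},
    {"image_occurrence_id": 10, "concept_id": 2, "site_id": 200, "negated": True},
    {"image_occurrence_id": 11, "concept_id": 1, "site_id": 200, "negated": False},
]
print(select_images(findings, include=(1, 200), ruled_out=2))  # [10]
```

Image 11 is excluded because chronic infarct was never mentioned, which is exactly the case a Closed World query cannot distinguish.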

After chewing on this a while, it seems my two big questions are:

  • Which concept codes to use to represent radiological findings:
    • Radlex entities in a custom vocab only (or mainly) linked via image_feature
    • Standard SNOMED codes, which are (mostly?) in the condition domain
  • Which table to put these codes in:
    • image_feature linking to measurement or observation
    • condition_occurrence
    • note and note_nlp referring to the radiological report text and specific entities extracted from it by NLP

The first question may depend on the use case. In the 2024 paper, the example includes specific measurements of the size of a lung nodule, and they used a custom Radlex concept set to represent this in image_feature and measurement. If the use case is just the finding and its anatomical location, but not a quantitative measurement, then I’m not sure what is best, but I lean towards standard condition concepts. I haven’t explored the content of the Radlex lexicon vs the SNOMED standard conditions to get a sense of their scope, but it seems like Radlex may more explicitly represent a finding of “normal anatomy”.

Based on what I’ve figured out from the CDM definition, and bearing in mind I’m an imager and haven’t used OMOP for anything non-imaging so may have missed deeper lore, here’s my take on the pros and cons of each table choice, assuming I want to use standard condition concepts.

Approach: image_feature and measurement or observation *
  • Pro: The finding’s provenance is linked to a specific image.
  • Pro: Granular anatomic labels are linked to each finding.
  • Pro: Allows negation by assigning codes for yes or no to the value field.
  • Con: The CDM says condition concepts must go in the condition_occurrence table, not measurement or observation.
  • Con: Findings from radiological reports often refer to the entire study, i.e. evidence from several different images acquired at the same time.

Approach: condition_occurrence
  • Pro: The expected place for condition concept codes, so people will know where to look for them.
  • Con: The condition_occurrence table is linked to person and visit_occurrence, so there is no direct provenance to an image other than indirectly by date.
  • Con: Does not allow negation. **
  • Con: I’m not sure anatomy can be linked to the condition, unless both are combined in a single concept code.

Approach: note_nlp
  • Pro: The expected place for findings derived from reports by NLP.
  • Pro: Links (via the note table’s note_event_id) to procedure_occurrence, which establishes the provenance to an imaging study.
  • Pro: The procedure corresponds to the DICOM study level, which is the appropriate level for a radiology report (although the report may also refer to specific series / images).
  • Pro: Allows negation in the term_modifiers field.
  • Con: I’m not sure anatomy can be linked to the finding, unless both are combined in a single concept code.

* I think the image_feature table can also link out to condition_occurrence via image_feature_event_field_concept_id and image_feature_event_id as per Kyulee’s post, but I’m not sure what the implications are if I do that. Is there some sense of directionality? It seems measurement and observation are obviously derived from the image, but maybe the opposite for condition occurrence, since it may span dates outside the image, so the image contributes to or exemplifies the condition, rather than giving rise to it.

** It’s possible I could represent negative findings in condition_occurrence using the Imaging result normal family of concepts. This could cause contradictory entries in condition_occurrence if, say, the left lung was normal but right lung had an abnormality, since the findings would only be associated with anatomy in image_feature.

Of course, the whole reason I’m here on the thread is that I need to know where I’ve got things wrong, so I’m totally open to correction / new info. I also understand that the imaging parts of OMOP are in active development, so there may be several views on this. I’ll be discussing this at the WG meeting too, but thought it’s also useful to have the discussion searchable online.

More generally, I note the condition_occurrence table describes a period of time where the patient has a diagnosed condition, with a start and end date rather than a point in time. This diagnosis might be supported by multiple pieces of evidence, but I think one record in the table is intended to represent a confirmed diagnosis, rather than the specific investigations it was based on. This is more of a theoretical question than one applicable to my use case, but I’d like to know if I’m understanding the design intent.

Short notes to self, maybe useful to others: looking in more detail at the 2024 paper’s example suggested two things to me.

  1. Although it says an image_occurrence could be a series or a study, the study can be represented in the procedure_occurrence table, so it makes sense just to use image_occurrence for series.
  2. On the question of what the paper means by “lowest level of granularity”, they use “entire thorax” in the image_occurrence table but “lung structure” in image_feature. This aligns with the Radlex Playbook convention of specifying the general area imaged first and the imaging focus second. So “lowest” means “coarsest” or “most general”.
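The first note above (study in procedure_occurrence, series in image_occurrence) can be sketched as a grouping step over DICOM instance headers. StudyInstanceUID and SeriesInstanceUID are standard DICOM attribute keywords; the image_study_uid and image_series_uid column names follow my reading of the MI-CDM paper, the `study_uid` key on the procedure row is only a bookkeeping key (not a CDM column), and the UIDs in the example are placeholders.

```python
from itertools import count


def build_rows(instances):
    """One procedure_occurrence per DICOM study, one image_occurrence per series."""
    proc_ids, procedures, images = {}, [], {}
    next_proc, next_img = count(1), count(1)
    for inst in instances:
        study, series = inst["StudyInstanceUID"], inst["SeriesInstanceUID"]
        if study not in proc_ids:
            proc_ids[study] = next(next_proc)
            # study_uid is a lookup key for this sketch, not a CDM column.
            procedures.append({"procedure_occurrence_id": proc_ids[study], "study_uid": study})
        if series not in images:
            images[series] = {
                "image_occurrence_id": next(next_img),
                "procedure_occurrence_id": proc_ids[study],
                "image_study_uid": study,
                "image_series_uid": series,
            }
    return procedures, list(images.values())


# Placeholder UIDs: two series in one study (e.g. pre- and post-contrast CT), one in another.
instances = [
    {"StudyInstanceUID": "1.1", "SeriesInstanceUID": "1.1.1"},
    {"StudyInstanceUID": "1.1", "SeriesInstanceUID": "1.1.1"},
    {"StudyInstanceUID": "1.1", "SeriesInstanceUID": "1.1.2"},
    {"StudyInstanceUID": "2.1", "SeriesInstanceUID": "2.1.1"},
]
procedures, images = build_rows(instances)
```

Report-level findings can then hang off the procedure (study), while series-specific features link to individual image_occurrence rows.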