Understanding image feature, observation, and measurement tables

pwrightkcl · August 5, 2025, 1:04pm

I am working on populating the Medical Imaging CDM tables from DICOM metadata and radiology reports. We have simplified the reports into some basic categories, e.g.:

No finding
Pleural effusion and oedema
Pleural effusion but no oedema

I think I can represent these as observations and have the value as “yes/no” or “present/absent”, but I’m confused about where everything should go.

Taking the third example, I think I need two features, each with an anatomic site concept for lung: one with image_feature_concept_id “pleural effusion” (254061), the other with “pulmonary edema” (4078925). These would then link to observations with the same observation_concept_id and value_as_concept_id “present” (4181412) and “absent” (4132135) respectively.
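As a minimal sketch of the layout described above (not a confirmed MI-CDM mapping), the two feature/observation pairs could be written out as plain Python records. The concept IDs are the ones quoted in this post. The lung site concept ID is a placeholder, and the ID values and the link field `image_feature_event_id` are assumptions for illustration.

```python
# Concept IDs quoted in the post above.
PLEURAL_EFFUSION = 254061
PULMONARY_EDEMA = 4078925
PRESENT = 4181412
ABSENT = 4132135

# Placeholder: the post does not give a concept ID for the lung site.
LUNG_SITE_CONCEPT_ID = 0


def build_feature_pair(pair_id, image_occurrence_id, finding_concept_id, value_concept_id):
    """Return one (observation, image_feature) pair for a single report finding."""
    observation = {
        "observation_id": pair_id,
        "observation_concept_id": finding_concept_id,
        "value_as_concept_id": value_concept_id,
    }
    image_feature = {
        "image_feature_id": pair_id,
        "image_occurrence_id": image_occurrence_id,
        "image_feature_concept_id": finding_concept_id,
        "anatomic_site_concept_id": LUNG_SITE_CONCEPT_ID,
        # Assumed link field pointing at the observation row.
        "image_feature_event_id": observation["observation_id"],
    }
    return observation, image_feature


# "Pleural effusion but no oedema" on a hypothetical image_occurrence 1.
pairs = [
    build_feature_pair(1, 1, PLEURAL_EFFUSION, PRESENT),
    build_feature_pair(2, 1, PULMONARY_EDEMA, ABSENT),
]
for observation, image_feature in pairs:
    print(observation["observation_concept_id"], observation["value_as_concept_id"])
```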

Please advise if this is correct.

I am unsure whether to use the observation or measurement (or even condition) table, whether to use SNOMED or LOINC, and whether I need a particular domain (like “answer” rather than “condition”).

There are also concepts like “No edema present” (4059917), which I could maybe just use as image_feature_concept_id and not need a linked observation or measurement.

All advice welcome. As a relative n00b, no answer is too simple or obvious.

Many thanks,
Paul

Christian_Reich · August 5, 2025, 7:49pm

Hi @pwrightkcl:

Interpretations of imaging procedures are typically not Observations but Conditions. For example, the patient has a pleural effusion, and the domain of concept 254061 is Condition. Don’t bury these in the wrong table in value_as_concept_id; nobody will ever look for them there.

Also, don’t record negative results (e.g. 4059917). They matter for routine care and for diagnosing individual patients, but for observational research we have a Closed World system and expect only facts in the records.

kyulee.jeon · August 6, 2025, 4:18am

Hi Paul,

Thank you for your question. First, just to clarify: the Medical Imaging CDM (MI-CDM) tables are designed primarily for DICOM files as the source data. So when working with both DICOM metadata and radiology reports, it’s important to treat them as distinct data sources.

Let me break it down into two components:

1. The Image Itself (from DICOM files)

Each series of images should be stored in the image_occurrence table.
Features directly extracted from the DICOM file (e.g., metadata elements or quantitative imaging measurements) should go into the measurement (or other clinical data) table.
These features can then be linked via the image_feature table, which connects a measurement (or other clinical data) to a corresponding image_occurrence.

Important: You cannot use image_feature_concept_id alone in the image_feature table without a linked record in another clinical data table (like measurement). This is because the purpose of image_feature is to link clinical data to imaging data, not to stand alone.
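The rule above can be sketched as a simple consistency check. This is only an illustration with in-memory rows: the link field name `image_feature_event_id` is an assumption, and the check looks at the measurement table only, whereas a real implementation would cover every clinical table a feature may point to.

```python
def find_unlinked_features(image_features, measurements):
    """Return IDs of image_feature rows that have no matching measurement row."""
    measurement_ids = {m["measurement_id"] for m in measurements}
    return [
        f["image_feature_id"]
        for f in image_features
        if f.get("image_feature_event_id") not in measurement_ids
    ]


# Hypothetical rows: feature 1 is linked, feature 2 stands alone.
measurements = [{"measurement_id": 10}]
image_features = [
    {"image_feature_id": 1, "image_occurrence_id": 5, "image_feature_event_id": 10},
    {"image_feature_id": 2, "image_occurrence_id": 5, "image_feature_event_id": None},
]

unlinked = find_unlinked_features(image_features, measurements)
print(unlinked)
```

Any ID reported here would be a "naked" image_feature row of the kind described as not allowed.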

2. Radiology Reports

The original report text should be stored in the note table.
If the report is simplified using NLP, extracted findings should be stored in the note_nlp table.
For extracted features like pleural effusion:
- The measurement table could be used to store the feature as a structured finding using an appropriate measurement_concept_id. (If the feature can be linked to a specific image series, you can also use the image_feature table to associate the measurement with the corresponding image_occurrence.)
  
- If the feature reflects a confirmed diagnosis (e.g., confirmed pleural effusion), it can also be stored in the condition_occurrence table using a condition_concept_id.
  

By populating the measurement table, you can also capture negative findings such as “no pleural effusion”, which would not be possible with condition_occurrence alone.
However, since measurement and condition use different concept sets, care must be taken to map them correctly and avoid duplicate or inconsistent entries across the tables.
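One way to picture the split described above is a small routing helper. It is a sketch under the assumption that every extracted finding goes to measurement and that only confirmed, positive findings are also written to condition_occurrence; the function name and flags are hypothetical.

```python
def target_tables(is_present, is_confirmed_diagnosis):
    """Pick target tables for one extracted report finding.

    measurement can hold positive and negative findings;
    condition_occurrence only receives confirmed positive ones.
    """
    tables = ["measurement"]
    if is_present and is_confirmed_diagnosis:
        tables.append("condition_occurrence")
    return tables


print(target_tables(is_present=False, is_confirmed_diagnosis=False))
print(target_tables(is_present=True, is_confirmed_diagnosis=True))
```

Writing the same finding to two tables is exactly where the duplicate and inconsistency risk mentioned above comes from, so the concept mapping for each table still has to be chosen deliberately.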

As suggested by Christian, storing confirmed diagnoses in condition_occurrence is likely the most suitable approach for downstream research.

Observation vs Measurement Table

You mentioned uncertainty about whether to use observation, measurement, or condition. According to OMOP CDM conventions:

“If the generation of clinical facts requires a standardized testing such as lab testing or imaging and leads to a standardized result, the data item is recorded in the MEASUREMENT table.”

So in this case, the observation table is not suitable. That table is typically used for things like social history, lifestyle, or patient-reported factors.

Standardized Vocabulary

For vocabulary, it’s best to use OMOP standard concepts (with standard_concept = ‘S’) available in ATHENA, and match the vocabulary to the domain of the table.
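As a rough sketch of that selection, the snippet below filters a reduced, in-memory copy of the OMOP concept table by the `standard_concept` flag and by `domain_id`. The table here has only four columns, and the rows with negative IDs are made up for illustration; it is not an ATHENA extract.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced concept table: the real one has more columns.
conn.execute(
    "CREATE TABLE concept ("
    "concept_id INTEGER, concept_name TEXT, domain_id TEXT, standard_concept TEXT)"
)
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?, ?)",
    [
        (254061, "Pleural effusion", "Condition", "S"),
        (-1, "Hypothetical source-only term", "Condition", None),
        (-2, "Hypothetical measurement concept", "Measurement", "S"),
    ],
)


def standard_concepts_for_domain(connection, domain_id):
    """Return IDs of standard concepts whose domain matches the target table."""
    rows = connection.execute(
        "SELECT concept_id FROM concept "
        "WHERE standard_concept = 'S' AND domain_id = ? "
        "ORDER BY concept_id",
        (domain_id,),
    )
    return [row[0] for row in rows]


condition_ids = standard_concepts_for_domain(conn, "Condition")
print(condition_ids)
```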

I hope this clarifies the structure and intent of MI-CDM. Happy to help further if you have more questions!

Best regards,
Kyulee

pwrightkcl · August 6, 2025, 12:15pm

Thank you both for your responses. I still have some remaining questions.

1. Can the condition_occurrence table be linked to an image_feature in the same way as measurement and observation?
2. If I take @kyulee.jeon's approach and use both condition_occurrence and measurement/observation so the latter can represent negative findings (which are important for my use case), I’m unsure how to choose the right vocabulary.
   a) “Pleural effusion” is a measurement in the NAACCR vocab with positive and negative values, but does this apply to general radiology or only histopathology for cancer?
   b) Likewise, “Pulmonary edema” is in LOINC as a measurement value, but only as an answer to “Event description adverse event MERSTH”, which I think is a specific test battery, so can that apply?
3. How should I represent a normal report? I found the following, but there may be others I missed.
   a) “Plain X-ray of chest normal” (condition)
   b) “Normal lung” (observation)
   c) “No abnormality” (observation which would be linked to the anatomy field in the image_feature table)
4. What is the correct way to associate multiple anatomic labels to an image? I am trying to represent the regions imaged according to DICOM StudyDescription and BodyPartExamined, not findings from a report. If a “naked” image_feature record isn’t allowed, I could link each to a measurement record with the DICOM source, but that could create duplicates, and would be unlikely to be used.
5. Following the above, I re-read the paper and noticed the anatomy field in image_occurrence should have the “lowest level of granularity”, which I initially read as most detailed, but could be read as most general, which would make more sense. Which is intended?

Just to clarify on the negative findings, we will often be using our OMOP database for scoping a potential cohort. These may require certain conditions to have been ruled out radiologically (e.g. MR head shows acute infarct but no chronic infarct was found). Radiological reports often specify that certain findings were looked for and not observed, and I want the database to distinguish between that and a finding simply not being mentioned.
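A minimal sketch of that three-way distinction, assuming findings are stored as measurement-style rows with value_as_concept_id set to “present” or “absent”. The two infarct concept IDs are hypothetical placeholders, and the helper simply uses the first matching row it finds.

```python
PRESENT = 4181412  # "present", as quoted earlier in the thread
ABSENT = 4132135   # "absent", as quoted earlier in the thread

# Hypothetical placeholder IDs; real concept IDs would come from ATHENA.
ACUTE_INFARCT = 1001
CHRONIC_INFARCT = 1002


def finding_status(rows, person_id, finding_concept_id):
    """Return 'present', 'ruled out', or 'not mentioned' for one person and finding."""
    for row in rows:
        if (row["person_id"] == person_id
                and row["measurement_concept_id"] == finding_concept_id):
            if row["value_as_concept_id"] == PRESENT:
                return "present"
            if row["value_as_concept_id"] == ABSENT:
                return "ruled out"
    return "not mentioned"


rows = [
    {"person_id": 1, "measurement_concept_id": ACUTE_INFARCT, "value_as_concept_id": PRESENT},
    {"person_id": 1, "measurement_concept_id": CHRONIC_INFARCT, "value_as_concept_id": ABSENT},
    {"person_id": 2, "measurement_concept_id": ACUTE_INFARCT, "value_as_concept_id": PRESENT},
]

# Cohort: acute infarct seen AND chronic infarct explicitly ruled out.
cohort = [
    person_id
    for person_id in (1, 2)
    if finding_status(rows, person_id, ACUTE_INFARCT) == "present"
    and finding_status(rows, person_id, CHRONIC_INFARCT) == "ruled out"
]
print(cohort)
```

Person 2 is excluded because chronic infarct was never mentioned, which is the case a record of positive findings alone cannot separate from an explicit negative.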

Thank you for your guidance.

Paul

kyulee.jeon · August 11, 2025, 2:45am

Hi Paul,

Thank you for your follow-up questions. I’ve shared a few of my thoughts below, though for some points I think it would be valuable to also hear perspectives from others in the imaging WG.

1. Linking `condition_occurrence` to `image_feature`

It is technically possible to link a condition_occurrence record to an image_feature in the same way as with measurement or observation, if the condition can be clearly tied to a specific image series (in DICOM’s hierarchy: Patient → Study → Series → Instance, where the MI-CDM image_occurrence table’s basic unit is the Series).
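To illustrate the Series as the basic unit, the sketch below collapses per-instance DICOM metadata into one row per series. `StudyInstanceUID` and `SeriesInstanceUID` are standard DICOM attribute keywords; the output field names, the generated IDs, and `instance_count` (included only to show the grouping) are assumptions, and the UIDs are fake.

```python
from collections import defaultdict


def series_level_rows(instances):
    """Collapse per-instance metadata into one image_occurrence-style row per series."""
    grouped = defaultdict(int)
    for inst in instances:
        key = (inst["StudyInstanceUID"], inst["SeriesInstanceUID"])
        grouped[key] += 1
    return [
        {
            "image_occurrence_id": row_id,
            "image_study_uid": study_uid,
            "image_series_uid": series_uid,
            "instance_count": count,  # illustrative only, not an MI-CDM field
        }
        for row_id, ((study_uid, series_uid), count)
        in enumerate(sorted(grouped.items()), start=1)
    ]


# Three fake instances: two in one series, one in another, same study.
instances = [
    {"StudyInstanceUID": "1.2.3", "SeriesInstanceUID": "1.2.3.1"},
    {"StudyInstanceUID": "1.2.3", "SeriesInstanceUID": "1.2.3.1"},
    {"StudyInstanceUID": "1.2.3", "SeriesInstanceUID": "1.2.3.2"},
]
occurrences = series_level_rows(instances)
for row in occurrences:
    print(row["image_series_uid"], row["instance_count"])
```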

Findings derived solely from image interpretation (including negative findings) are better stored in measurement or observation, while confirmed diagnoses in condition_occurrence could be linked to image_occurrence at a higher level (e.g., via the visit or episode table).

2. Vocabulary Selection

The NAACCR and LOINC examples are oncology- or test-specific, so they may not always apply to general radiology use cases.

In the MI-CDM paper (Park et al.), the proposed approach was:

- RadLex and LOINC for radiological findings and measurements
- SNOMED CT for anatomical location

RadLex offers rich hierarchies for imaging findings, for example:
Clinical finding → Pathophysiologic finding → Mechanical disorder → Fluid disorder → Effusion → Pleural effusion (RID34539).

Another potential resource is RadLex Common Data Elements (CDEs), which define standardized “key–value” pairs for structuring radiology reports. They are not yet OMOP standard concepts, but could be relevant for your use case.

The OHDSI Medical Imaging Workgroup (WG) is actively discussing how to incorporate such imaging vocabularies into OMOP, and your scenario could contribute meaningfully to that conversation. I’d encourage you to join a meeting so we can explore it together.

3. Anatomical Location Representation

For anatomical regions imaged, the MI-CDM image_occurrence.anatomic_site_concept_id should follow the “lowest level of granularity” from DICOM’s Defined Terms or Anatomical Region codes (often drawn from SNOMED CT or LOINC).

The DICOM2OMOP paper has mapped these DICOM anatomical terms into OMOP vocabularies, with the corresponding lists available on GitHub. In practice, I interpret “lowest level of granularity” as the most specific term available, but this might be worth confirming with the WG as interpretations could vary.

For the vocabulary and note-related topics, it would be great to continue the discussion in the WG. Meetings are biweekly on Wednesdays at 7 AM and 7 PM ET, with the next meeting scheduled for Wednesday, September 3 at 7 AM ET. I hope to see you there.

pwrightkcl · November 5, 2025, 5:37pm

I wanted to add a small update here, since it might be useful to people not in the Medical Imaging WG searching online. I also thought it would be good to link in this previous post from user @Jgallo, even though it is from 2021.

I’m still looking into how I want to represent findings from the radiology report. I’ve noted the recommendation to use RadLex codes, but these are not yet in Athena, and I’m not yet ready to create custom vocabulary tables for our implementation (though that will have to come at some point if we want to represent DICOM tags).

It sounds like we shouldn’t use the condition_occurrence table for radiological findings, particularly negative findings (which do need to be represented, as they are a positive fact). I’m assuming that there can’t be an exception to the rule that the observation table cannot include concepts from the condition domain (which would be the easiest way to pair a condition with an assertion or negation). So, I thought we might use concepts from the Morph Abnormality concept class as observation_concept_id, in conjunction with image_feature.anatomic_site_concept_id.

Examples:

| Report phrase | Anatomic site | Observation |
|---|---|---|
| “pleural effusion” | Both lungs (4250192) | Effusion (4215818) |
| “congestive cardiac failure” | Heart structure (4217142) | Congestive hypertrophy (4298307) |
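The examples above could be held in a small lookup, sketched here only to show how unmapped phrases would surface. The concept IDs are the ones in the examples; the function name is hypothetical, and “pneumothorax” is just a sample phrase with no mapping defined.

```python
# Report phrase -> (anatomic site concept ID, observation concept ID),
# taken from the examples above.
PHRASE_MAP = {
    "pleural effusion": (4250192, 4215818),
    "congestive cardiac failure": (4217142, 4298307),
}


def map_phrases(phrases):
    """Split report phrases into mapped records and phrases with no mapping."""
    mapped, unmapped = [], []
    for phrase in phrases:
        key = phrase.strip().lower()
        if key in PHRASE_MAP:
            site_id, observation_id = PHRASE_MAP[key]
            mapped.append({
                "report_phrase": key,
                "anatomic_site_concept_id": site_id,
                "observation_concept_id": observation_id,
            })
        else:
            unmapped.append(key)
    return mapped, unmapped


mapped, unmapped = map_phrases(["Pleural effusion", "pneumothorax"])
print(mapped)
print(unmapped)
```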

This approach feels a bit hacky, because the exact concepts that usually appear in the radiology report text exist as conditions, and a morphological-abnormality equivalent might not be available for every one of them.

There are also codes like 4236310 “Computed tomography of brain abnormal”, which are clearly radiological findings, but are in the condition domain.

A final thing that isn’t clear is the relationship between an image_occurrence and either the study or series in DICOM. In the paper, it says it can be either. Radiology reports often combine information from multiple series in a study (e.g. in MRI with multiple sequences, or CT pre- and post-contrast). In this case we would want to link the findings to the study level. Is it possible to do this? E.g. one could have one image_occurrence entry for the study, and then additional entries for each series, with image_series_UID blank for the study. Or has it been settled in the WG since the paper came out to only use series?

Thanks @kyulee.jeon and others for help, and I hope this might be useful to other forum users.