
How to transform a complex entry into the CDM?

We have a complex entry with a hierarchy:
root: lesion
features: site (skin, brain, lung), location, orientation (left, right), size (x, y), detection method
descendants: lesion progression (the same features as the root lesion), lesion RECIST, recurrence (the same features as the root lesion).
How can we transform it into the CDM?
We want to create custom concepts with a hierarchy and then use them as the question, with the value as the answer, in e.g. the observation table. This way the question concept will be custom and the answer concept will be Standard.
E.g. Observation:
observation_concept_id = ‘lesion site’ (custom concept_id here)
value_as_concept_id = ‘skin’ (standard concept_id here)
To link back to the root lesion, we will create a relation in fact_relationship between the root lesion and this record.
If a user needs to get all the data and the hierarchy, they should follow these steps:

  1. get the concept hierarchy structure from the custom vocabulary (root lesion)
  2. find the root lesion in Observation
  3. get all related data from Observation/Measurement using the fact_relationship table

Is this a valid approach to data transformation?
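
To make the idea concrete, here is a minimal sketch of steps 2 and 3 using an in-memory SQLite database. The tables are heavily simplified (the real OBSERVATION table has more required columns), and every concept ID below is a hypothetical placeholder; the Observation domain concept value in particular is an assumption that should be checked against the vocabulary.

```python
import sqlite3

# Hypothetical concept IDs, for illustration only. Locally defined (custom)
# concepts conventionally use IDs above 2,000,000,000.
LESION = 2_000_000_001         # custom: root "lesion"
LESION_SITE = 2_000_000_002    # custom question: "lesion site"
HAS_ATTRIBUTE = 2_000_000_100  # custom relationship concept (assumption)
SKIN = 9_000_001               # placeholder for a standard "skin" concept
OBSERVATION_DOMAIN = 27        # assumed Observation domain concept; verify

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE observation (
    observation_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    observation_concept_id INTEGER,
    value_as_concept_id INTEGER
);
CREATE TABLE fact_relationship (
    domain_concept_id_1 INTEGER,
    fact_id_1 INTEGER,
    domain_concept_id_2 INTEGER,
    fact_id_2 INTEGER,
    relationship_concept_id INTEGER
);
""")

# Root lesion record, then one question/answer record for its site.
conn.execute("INSERT INTO observation VALUES (1, 100, ?, NULL)", (LESION,))
conn.execute("INSERT INTO observation VALUES (2, 100, ?, ?)", (LESION_SITE, SKIN))
# Link root lesion (fact 1) to its attribute record (fact 2).
conn.execute(
    "INSERT INTO fact_relationship VALUES (?, 1, ?, 2, ?)",
    (OBSERVATION_DOMAIN, OBSERVATION_DOMAIN, HAS_ATTRIBUTE),
)


def lesion_attributes(conn, root_observation_id):
    """Return (question concept, answer concept) pairs linked to a root lesion."""
    return conn.execute(
        """
        SELECT o.observation_concept_id, o.value_as_concept_id
        FROM fact_relationship fr
        JOIN observation o ON o.observation_id = fr.fact_id_2
        WHERE fr.fact_id_1 = ?
          AND fr.domain_concept_id_1 = ?
          AND fr.domain_concept_id_2 = ?
        ORDER BY o.observation_id
        """,
        (root_observation_id, OBSERVATION_DOMAIN, OBSERVATION_DOMAIN),
    ).fetchall()


print(lesion_attributes(conn, 1))
```

Attributes stored in Measurement would need a second join of the same shape with the Measurement domain concept on the second side.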


Are these cancer lesions? Or dermatological? Or infectious?

Reason I am asking is that we are solving this for cancer as we speak.

@Christian_Reich This is cancer. How are you solving this? Can you give any example, please?

In our data lesion has attributes:

  • lesion_site (brain)
  • lesion_location (cerebellar)
  • lesion_orientation (left)
  • lesion_detection_method (x-ray)
  • lesion_size
    • lesion_size_x
    • lesion_size_y
  • lesion_recist (bool)
  • lesion_recist_responce (stable disease)
  • lesion_recist_size
    • lesion_recist_size_x
    • lesion_recist_size_y
  • lesion_recist_detection_method

After we create records in the tables (observation, measurement, etc.), we link them together with fact_relationship.

What do you think about this approach?


There is an OMOP CDM Oncology Extension that is currently being worked on. We are in the process of completing some vocabulary and ETL documentation work to be able to test the extension. We are meeting weekly. Meeting info here:


Please join us.

Here is an overall explanation/discussion of the extension:


Here is the extension data model:

The main idea of the extension is to add an EPISODE table that can represent a base oncology diagnosis and to add modifier_of_event_id and modifier_of_field_concept_id fields to the MEASUREMENT table to allow for the refinement of an oncology diagnosis with modifying tumor characteristics (MEASUREMENT becomes a polymorphic child of EPISODE).

We want to create a place to track the progression of an oncology diagnosis over and above the problem list/claims observational welter of the CONDITION_OCCURRENCE table. This welter can be linked to via the EPISODE_EVENT table.

We have added the International Classification of Diseases for Oncology (ICDO) standard for the base representation of an oncology diagnosis: site/histology or topography/morphology. See here: https://codes.iarc.fr/.

An ICDO site/histology pairing (pre-coordinated if possible to SNOMED) should be placed in the EPISODE.episode_object_concept_id. The EPISODE.episode_concept_id field should contain the state of the ‘Disease Episode’. For example, ‘Disease First Occurrence’, ‘Disease Recurrence’, ‘Disease Progression’. Linking between ‘Disease Episode’ entries can be accomplished via parent/child relationships between Episodes via the EPISODE.episode_parent_id column.
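
To illustrate the parent/child linking, here is a small sketch in Python. The field names follow the EPISODE columns mentioned above; plain text labels stand in for real concept IDs, and the `Episode` class and `history` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Episode:
    episode_id: int
    person_id: int
    episode_concept_id: str         # state of the disease episode (label as stand-in)
    episode_object_concept_id: str  # site/histology pairing (label as stand-in)
    episode_parent_id: Optional[int] = None


def history(episodes: Dict[int, Episode], episode_id: int) -> List[str]:
    """Follow episode_parent_id links and return the states, oldest first."""
    states: List[str] = []
    seen = set()
    current: Optional[int] = episode_id
    while current is not None and current not in seen:  # guard against cycles
        seen.add(current)
        episode = episodes[current]
        states.append(episode.episode_concept_id)
        current = episode.episode_parent_id
    return list(reversed(states))


episodes = {
    1: Episode(1, 100, "Disease First Occurrence", "site/histology placeholder"),
    2: Episode(2, 100, "Disease Progression", "site/histology placeholder",
               episode_parent_id=1),
}

print(history(episodes, 2))
```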

We have added the NAACCR tumor registry data dictionary as the initial standard vocabulary of modifying tumor characteristics, e.g., staging, grade, lymphatic invasion, tumor size. See here: http://datadictionary.naaccr.org/default.aspx?c=10. We are looking at the possibility of also adding the CAP cancer checklists as another vocabulary of modifying tumor characteristics. See here: Cancer Protocol Templates | College of American Pathologists

Looking at your example: we have been concentrating on pathology-focused tracking of tumors. I think you are doing imaging-focused tracking of tumors. I don’t think NAACCR or CAP covers some of the attributes you list. So you could help us enrich our extension by better supporting imaging-based tumor findings.

Taking your example, with my annotation after each item:

  • lesion_site (brain): EPISODE.episode_object_concept_id. You are “missing” histology/morphology. Do you have that? For example anaplastic astrocytoma or glioblastoma? Maybe not if this is pre-surgical monitoring.
  • lesion_location (cerebellar): EPISODE.episode_object_concept_id
  • lesion_orientation (left): MEASUREMENT pointing to EPISODE
  • lesion_detection_method (x-ray): MEASUREMENT pointing to EPISODE
  • lesion_size: MEASUREMENT pointing to EPISODE
    • lesion_size_x: MEASUREMENT pointing to EPISODE
    • lesion_size_y: MEASUREMENT pointing to EPISODE
  • lesion_recist (bool): ?
  • lesion_recist_responce (stable disease): EPISODE.episode_concept_id with a concept of ‘Stable Disease’ and EPISODE.episode_parent_id pointing to the prior state of the "lesion"
  • lesion_recist_size: MEASUREMENT pointing to EPISODE
    • lesion_recist_size_x: MEASUREMENT pointing to EPISODE
    • lesion_recist_size_y: MEASUREMENT pointing to EPISODE
  • lesion_recist_detection_method: MEASUREMENT pointing to EPISODE
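
As a rough sketch of that routing, the snippet below splits one source lesion record into an EPISODE row plus MEASUREMENT rows that point back at it through modifier_of_event_id and modifier_of_field_concept_id. Text labels stand in for concept IDs, the size values are made up, and `lesion_to_rows`, `EPISODE_FIELD` and `MODIFIER_ATTRIBUTES` are hypothetical names, not part of the extension.

```python
# Stand-in for the concept that identifies which table/field the modifier points at.
EPISODE_FIELD = "episode.episode_id"

# Source attributes that become MEASUREMENT rows modifying the episode.
MODIFIER_ATTRIBUTES = (
    "lesion_orientation",
    "lesion_detection_method",
    "lesion_size_x",
    "lesion_size_y",
)


def lesion_to_rows(lesion, episode_id):
    """Split one source lesion record into an episode row and its modifier rows."""
    episode = {
        "episode_id": episode_id,
        "episode_concept_id": "Disease First Occurrence",
        # Site and location together select the episode object concept.
        "episode_object_concept_id": f"{lesion['lesion_site']} / {lesion['lesion_location']}",
        "episode_parent_id": None,
    }
    measurements = []
    for name in MODIFIER_ATTRIBUTES:
        if name in lesion:
            measurements.append({
                "measurement_id": len(measurements) + 1,
                "measurement_concept_id": name,  # label as stand-in
                "value": lesion[name],
                "modifier_of_event_id": episode_id,
                "modifier_of_field_concept_id": EPISODE_FIELD,
            })
    return episode, measurements


example = {
    "lesion_site": "brain",
    "lesion_location": "cerebellar",
    "lesion_orientation": "left",
    "lesion_detection_method": "x-ray",
    "lesion_size_x": 12,  # made-up value
    "lesion_size_y": 8,   # made-up value
}
episode, measurements = lesion_to_rows(example, episode_id=1)
print(episode["episode_object_concept_id"], len(measurements))
```

A RECIST response such as stable disease would then be a second EPISODE row whose episode_parent_id is 1, with its own size measurements pointing at that second row.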

Thank you so much! It will help us a lot!
So, if I understand the extension correctly:

  0. I create a CONDITION_OCCURRENCE record (e.g. Melanoma). It will be linked to the EPISODE record by person_id.
  1. I create a record for Lesion A in the EPISODE table (e.g. lesion of brain, a standard concept).
  2. Then I put all lesion features in MEASUREMENT (size, detection method, lab results) and link them to Lesion A in the EPISODE_EVENT table.
  3. Then I create a Lesion A progression record in the EPISODE table and set the Lesion A record as its episode_parent_id.
  4. Then I repeat the same process for the progression features.

I can also create a PROCEDURE_OCCURRENCE record (e.g. radiation) and link it to Lesion A in EPISODE_EVENT. All surgery details (energy, units, methods) I can put in MEASUREMENT and link to the radiation record in FACT_RELATIONSHIP.
Am I right?

Some additional questions:
Will there be a new EPISODE domain across all vocabularies?
Is it better to use the NAACCR vocabulary instead of, e.g., SNOMED for oncology?

Thank you!

Hello everyone,

I’m trying to map data concerning prostate cancer and I have the following question, which I hope someone can clear up:
I have data about prostate lesions. There can be more than one in the same anatomic site (both in the prostate, but one in the MRPZpl location and the other in MLTZa), and each of them has its own characteristics: location, diameter, volume, Gleason score, etc.
Should “lesion of prostate” be mapped as an episode, with one episode per lesion (and its respective measurements), so that I know which characteristic refers to which lesion episode? Or should lesion of prostate be a condition (with as many condition entries as there are lesions), each followed by its respective measurements?
Conceptually, neither option seems correct to me. In the first, the lesions appear as two different episodes in time, which is not correct, and if I have only one episode I have no way of distinguishing the characteristics of each lesion. In the second, we have two conditions, which is not correct either (it’s the same condition).

Thank you very much in advance for any help or clarification.


You should capture both lesions as separate Conditions (picking the best histology/topography combination from the descendants of Primary malignant neoplasm of prostate; I am not familiar with those abbreviations), and link the modifiers to them accordingly. If there are no concepts at the granularity level you need, then pick the same concept for both.

At the episode level both of them would be one disease episode.
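
A rough sketch of what that could look like, with made-up values and text labels in place of concept IDs. It assumes the modifiers are MEASUREMENT rows whose modifier_of_event_id holds the condition_occurrence_id, and that EPISODE_EVENT links both conditions to a single episode; `modifiers_by_condition` is a hypothetical helper.

```python
from collections import defaultdict

# Two lesions captured as separate conditions with the same placeholder concept.
conditions = [
    {"condition_occurrence_id": 1, "person_id": 100,
     "condition_concept_id": "prostate neoplasm placeholder",
     "condition_source_value": "MRPZpl"},
    {"condition_occurrence_id": 2, "person_id": 100,
     "condition_concept_id": "prostate neoplasm placeholder",
     "condition_source_value": "MLTZa"},
]

# Modifiers point at the condition they describe (values are made up).
measurements = [
    {"measurement_id": 1, "measurement_concept_id": "diameter_mm",
     "value_as_number": 12.0, "modifier_of_event_id": 1},
    {"measurement_id": 2, "measurement_concept_id": "gleason_score",
     "value_as_number": 7.0, "modifier_of_event_id": 1},
    {"measurement_id": 3, "measurement_concept_id": "diameter_mm",
     "value_as_number": 5.0, "modifier_of_event_id": 2},
]

# Both conditions belong to the same disease episode.
episode_event = [
    {"episode_id": 1, "event_id": 1},
    {"episode_id": 1, "event_id": 2},
]


def modifiers_by_condition(rows):
    """Group modifier values by the condition they are attached to."""
    grouped = defaultdict(dict)
    for row in rows:
        grouped[row["modifier_of_event_id"]][row["measurement_concept_id"]] = row["value_as_number"]
    return dict(grouped)


print(modifiers_by_condition(measurements))
```

Each lesion keeps its own characteristics through its condition record, while the episode stays a single disease episode for the person.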
