
How to store Pathological examination result in CDM

Hello All,
I am confused about which table a pathological examination record should be stored in: PROCEDURE_OCCURRENCE or MEASUREMENT?
And how should the examination result be stored?

PROCEDURE_OCCURRENCE refers to activities performed on the patient, whereas a pathological examination is mostly an analysis of tissue, cell, and body fluid samples. So it does not seem to belong in PROCEDURE_OCCURRENCE.
If the pathological examination result belongs in the MEASUREMENT table, there is no numerical or categorical result from the examination to store there. And since there is no value_as_string field, how should we store the text results?

Many thanks in advance for your help

@qiongwang:

I think you are spot on. It’s a record in MEASUREMENT, since there is nothing happening to the patient, and there is a result. You need to pick a Concept for a result. Do you need help in finding the concepts?

And there can be multiple rows in the MEASUREMENT table, for example:

  • Whether the cancer is invasive
  • Grade
  • Mitotic rate
  • Tumor margin

So you need to pick the correct concept for each of them, along with its categorical values.

If you have free text that you can’t split into categories, you can create a single MEASUREMENT table entry, ‘Pathological examination’, and fill the NOTE table with the description, connecting it to the MEASUREMENT row using note_event_id and note_event_field_concept_id.
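
To make that linkage concrete, here is a minimal sketch in Python. It assumes plain dataclasses stand in for table rows; the column names follow the CDM, but both concept IDs are placeholders (not real Athena values) and `attach_free_text` is a hypothetical helper.

```python
from dataclasses import dataclass

# Placeholder concept IDs: look up the real ones in Athena before use.
PATHOLOGY_EXAM_CONCEPT_ID = 0          # hypothetical: concept for "Pathological examination"
MEASUREMENT_ID_FIELD_CONCEPT_ID = 0    # hypothetical: concept identifying measurement.measurement_id


@dataclass
class Measurement:
    measurement_id: int
    person_id: int
    measurement_concept_id: int
    measurement_date: str


@dataclass
class Note:
    note_id: int
    person_id: int
    note_date: str
    note_text: str
    note_event_id: int                 # ID of the row the note describes
    note_event_field_concept_id: int   # says which table/field note_event_id refers to


def attach_free_text(measurement: Measurement, note_id: int, text: str) -> Note:
    """Create a NOTE row carrying the free text for a MEASUREMENT row."""
    return Note(
        note_id=note_id,
        person_id=measurement.person_id,
        note_date=measurement.measurement_date,
        note_text=text,
        note_event_id=measurement.measurement_id,
        note_event_field_concept_id=MEASUREMENT_ID_FIELD_CONCEPT_ID,
    )


exam = Measurement(101, 1, PATHOLOGY_EXAM_CONCEPT_ID, "2019-03-01")
report = attach_free_text(exam, note_id=501, text="Free-text pathology description ...")
```

The NOTE row inherits the person and date from the MEASUREMENT row, so the two records stay consistent without any extra bookkeeping.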

I believe that pathology procedures belong in the Procedure domain. And the OMOP vocabulary mostly agrees with me. For example, if you look in the OMOP vocabulary, SNOMED code 39228008 ‘Surgical pathology procedure’ is in the ‘Procedure’ domain.

http://athena.ohdsi.org/search-terms/terms/4213297

I have worked with pathology data within OMOP. This is how I believe that pathology data should be represented in OMOP:

  • The pathology procedure belongs in the ‘PROCEDURE_OCCURRENCE’ table.
  • The pathology report that records the pathology findings of the pathology procedure belongs in the ‘NOTE’ table.
  • The pathology report in the ‘NOTE’ table should be related to the pathology procedure in the ‘PROCEDURE_OCCURRENCE’ table via the note_event_id/note_event_field_concept_id (or via FACT_RELATIONSHIP in CDM 5.X).
  • The pathology findings (like anatomic site, histology, grade, staging and lymphatic invasion, etc.) belong in the ‘MEASUREMENT’ domain/table.
  • The pathology findings in the ‘MEASUREMENT’ domain/table should be related to the pathology procedure in the ‘PROCEDURE_OCCURRENCE’ table via ‘FACT_RELATIONSHIP’.
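
The layout above can be sketched with an in-memory SQLite database. This is only an illustration under stated assumptions: the tables are cut down to a few columns, the domain and relationship concept IDs are made-up placeholders, and the two measurement concept IDs (9001, 9002) and their values are hypothetical. Only 4213297 (‘Surgical pathology procedure’) comes from the thread.

```python
import sqlite3

# Hypothetical placeholder concept IDs; real values must come from the OMOP vocabulary.
PROCEDURE_DOMAIN = 1      # stands in for the concept meaning the Procedure domain
MEASUREMENT_DOMAIN = 2    # stands in for the concept meaning the Measurement domain
HAS_FINDING = 3           # stands in for a relationship concept chosen by the ETL

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE procedure_occurrence (
    procedure_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    procedure_concept_id INTEGER);
CREATE TABLE measurement (
    measurement_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    measurement_concept_id INTEGER,
    value_as_concept_id INTEGER);
CREATE TABLE fact_relationship (
    domain_concept_id_1 INTEGER,
    fact_id_1 INTEGER,
    domain_concept_id_2 INTEGER,
    fact_id_2 INTEGER,
    relationship_concept_id INTEGER);
""")

# One pathology procedure and two findings for the same person.
conn.execute("INSERT INTO procedure_occurrence VALUES (1, 42, 4213297)")
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?, ?)",
    [(10, 42, 9001, 9101), (11, 42, 9002, 9102)],  # hypothetical finding concepts
)
# Link each finding back to the procedure.
conn.executemany(
    "INSERT INTO fact_relationship VALUES (?, ?, ?, ?, ?)",
    [
        (PROCEDURE_DOMAIN, 1, MEASUREMENT_DOMAIN, 10, HAS_FINDING),
        (PROCEDURE_DOMAIN, 1, MEASUREMENT_DOMAIN, 11, HAS_FINDING),
    ],
)

# All findings recorded for procedure 1.
findings = conn.execute(
    """
    SELECT m.measurement_id
    FROM fact_relationship fr
    JOIN measurement m ON m.measurement_id = fr.fact_id_2
    WHERE fr.domain_concept_id_1 = ?
      AND fr.fact_id_1 = ?
      AND fr.domain_concept_id_2 = ?
    ORDER BY m.measurement_id
    """,
    (PROCEDURE_DOMAIN, 1, MEASUREMENT_DOMAIN),
).fetchall()
```

Here `findings` holds one tuple per linked MEASUREMENT row, which is the query an analyst would need to get from a pathology procedure to its findings.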

I have an open ticket related to this here:

The upshot of my ticket is that I believe pathology procedure concepts should be uniformly moved into the ‘Procedure’ domain and that the definition of a procedure should be modified as follows:

“The PROCEDURE_OCCURRENCE table contains records of activities or processes ordered by, or carried out by, a healthcare provider on the patient, OR A SPECIMEN EXTRACTED FROM THE PATIENT, to have a diagnostic or therapeutic purpose. Procedures are present in various data sources in different forms with varying levels of standardization.”

One current issue is that pathology procedure concepts are scattered across multiple domains:

| SNOMED Code | SNOMED URL | OHDSI URL | Name | Domain |
|---|---|---|---|---|
| 108259003 | http://browser.ihtsdotools.org/?perspective=full&conceptId1=108259003&edition=us-edition&release=v20180901&server=https://prod-browser-exten.ihtsdotools.org/api/snomed&langRefset=900000000000509007 | http://athena.ohdsi.org/search-terms/terms/4032248 | Autopsy pathology procedure AND/OR service | Procedure |
| 127801007 | http://browser.ihtsdotools.org/?perspective=full&conceptId1=127801007&edition=us-edition&release=v20180901&server=https://prod-browser-exten.ihtsdotools.org/api/snomed&langRefset=900000000000509007 | http://athena.ohdsi.org/search-terms/terms/4133843 | Body fluid analysis | Measurement |
| 26086007 | http://browser.ihtsdotools.org/?perspective=full&conceptId1=26086007&edition=us-edition&release=v20180901&server=https://prod-browser-exten.ihtsdotools.org/api/snomed&langRefset=900000000000509007 | http://athena.ohdsi.org/search-terms/terms/4094377 | Bone marrow laboratory procedure (procedure) | Measurement |
| 73735000 | http://browser.ihtsdotools.org/?perspective=full&conceptId1=73735000&edition=us-edition&release=v20180901&server=https://prod-browser-exten.ihtsdotools.org/api/snomed&langRefset=900000000000509007 | http://athena.ohdsi.org/search-terms/terms/4249882 | Cytogenetic procedure | Procedure |
| 77485009 | http://browser.ihtsdotools.org/?perspective=full&conceptId1=77485009&edition=us-edition&release=v20180901&server=https://prod-browser-exten.ihtsdotools.org/api/snomed&langRefset=900000000000509007 | http://athena.ohdsi.org/search-terms/terms/4300190 | Cytopathology procedure or service | Procedure |
| 15719007 | http://browser.ihtsdotools.org/?perspective=full&conceptId1=15719007&edition=us-edition&release=v20180901&server=https://prod-browser-exten.ihtsdotools.org/api/snomed&langRefset=900000000000509007 | http://athena.ohdsi.org/search-terms/terms/4048586 | Fine needle aspirate with routine interpretation and report | Measurement |
| 64444005 | http://browser.ihtsdotools.org/?perspective=full&conceptId1=64444005&edition=us-edition&release=v20180901&server=https://prod-browser-exten.ihtsdotools.org/api/snomed&langRefset=900000000000509007 | http://athena.ohdsi.org/search-terms/terms/4276031 | Flow cytometry | Procedure |
| 426329006 | http://browser.ihtsdotools.org/?perspective=full&conceptId1=426329006&edition=us-edition&release=v20180901&server=https://prod-browser-exten.ihtsdotools.org/api/snomed&langRefset=900000000000509007 | http://athena.ohdsi.org/search-terms/terms/4142271 | Fluorescence in situ hybridization | Procedure |
| 33468001 | http://browser.ihtsdotools.org/?perspective=full&conceptId1=33468001&edition=us-edition&release=v20180901&server=https://prod-browser-exten.ihtsdotools.org/api/snomed&langRefset=900000000000509007 | http://athena.ohdsi.org/search-terms/terms/4141733 | Hematology procedure | Measurement |
| 394916005 | http://browser.ihtsdotools.org/?perspective=full&conceptId1=394916005&edition=us-edition&release=v20180901&server=https://prod-browser-exten.ihtsdotools.org/api/snomed&langRefset=900000000000509007 | http://athena.ohdsi.org/search-terms/terms/4192879 | Hematopathology | Observation |
| 108262000 | http://browser.ihtsdotools.org/?perspective=full&conceptId1=108262000&edition=us-edition&release=v20180901&server=https://prod-browser-exten.ihtsdotools.org/api/snomed&langRefset=900000000000509007 | http://athena.ohdsi.org/search-terms/terms/4032250 | Molecular biology method | Procedure |
| 116148004 | http://browser.ihtsdotools.org/?perspective=full&conceptId1=116148004&edition=us-edition&release=v20180901&server=https://prod-browser-exten.ihtsdotools.org/api/snomed&langRefset=900000000000509007 | http://athena.ohdsi.org/search-terms/terms/4019097 | Molecular genetics procedure | Procedure |
| 59000001 | http://browser.ihtsdotools.org/?perspective=full&conceptId1=59000001&edition=us-edition&release=v20180901&server=https://prod-browser-exten.ihtsdotools.org/api/snomed&langRefset=900000000000509007 | http://athena.ohdsi.org/search-terms/terms/4244107 | Surgical pathology consultation and report on referred slides prepared elsewhere | Observation |
| 39228008 | http://browser.ihtsdotools.org/?perspective=full&conceptId1=39228008&edition=us-edition&release=v20180901&server=https://prod-browser-exten.ihtsdotools.org/api/snomed&langRefset=900000000000509007 | http://athena.ohdsi.org/search-terms/terms/4213297 | Surgical pathology procedure | Procedure |

@mgurley
I tend to agree with you.
There is no doubt that, say, ultrasonography is a procedure. Technically, both ultrasonography and pathology are diagnostic procedures. The main work in ultrasonography is analyzing the computer images you obtain (so part of the procedure is not done on the patient directly) :slight_smile:

So we need to modify the Procedure domain definition as you said.
And that fits the Measurement definition:
The MEASUREMENT table contains records of Measurement, i.e. structured values (numerical or categorical).
So there’s no place there for an ultrasound or pathology report.

Friends:

Let me put out some slightly divergent views:

The OMOP Vocabulary doesn’t have a mind of its own; the domains are whatever we tell it they are. :slight_smile: And right now these examples would be mistakes. Procedures have to be performed on the patient for a diagnostic or therapeutic purpose. But I could be convinced to extend that definition to “on a patient or a specimen derived from a patient”.

Sounds good.

Urgh. These cross-links are ugly and lack use cases. Do you have any?

@karthik @HuaXu @jon_duke @mgurley @Dymshyts and @Christian_Reich, we need to communicate how ETLs to the oncology extension should handle some of the more definitive sources of information about conditions (e.g. stage) in the MEASUREMENT table/domain. People doing these ETLs will use NLP and will need guidance on which concepts belong in note_nlp_concept_id in the NOTE_NLP table. Does the original rationale for having Condition concepts in NOTE_NLP extend to these concepts? If so, should they go into both MEASUREMENT and NOTE_NLP? And if so, are the Condition-domain and Measurement-domain concepts in NOTE_NLP used differently, or are they completely redundant?

@Christian_Reich
The use case for connecting NLP-derived pathology findings to the pathology procedure is that people will want to see the textual evidence that was the basis for an NLP-derived/chart abstracted data point. Most real world, historical pathology findings data are stuck in clinical text. For oncology, this will be very important.

But maybe a better approach would be

  • The pathology procedure belongs in the ‘PROCEDURE_OCCURRENCE’ table.
  • The pathology report that records the pathology findings of the pathology procedure belongs in the ‘NOTE’ table.
  • The pathology report in the ‘NOTE’ table should be related to the pathology procedure in the ‘PROCEDURE_OCCURRENCE’ table via the note_event_id/note_event_field_concept_id (or via FACT_RELATIONSHIP in CDM 5.X).
  • The pathology findings (like anatomic site, histology, grade, staging and lymphatic invasion, etc.) belong in the ‘MEASUREMENT’ domain/table.
  • The pathology findings in the ‘MEASUREMENT’ domain/table should be related to the pathology report via the addition of two new fields to the NOTE_NLP table:
    note_nlp_event_id
    note_nlp_event_field_concept_id

These two new fields would replace ‘note_nlp_concept_id’. ‘note_nlp_concept_id’ is already deficient in that it can only represent non-EAV structures like ‘CONDITION_OCCURRENCE’ or ‘PROCEDURE_OCCURRENCE’, not ‘MEASUREMENT’ or ‘OBSERVATION’.
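
A rough sketch of how those two proposed fields could be used to trace a finding back to its textual evidence. Note the assumptions: `note_nlp_event_id` and `note_nlp_event_field_concept_id` are only the proposal described here, not part of the released CDM; the two field concept IDs are placeholders; and `NoteNlpRow` and `evidence_for_measurement` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical placeholders for the concepts that would identify the target field.
MEASUREMENT_ID_FIELD = 1
OBSERVATION_ID_FIELD = 2


@dataclass
class NoteNlpRow:
    note_nlp_id: int
    note_id: int
    snippet: str
    note_nlp_event_id: int                 # proposed field, not in the released CDM
    note_nlp_event_field_concept_id: int   # proposed field, not in the released CDM


def evidence_for_measurement(measurement_id: int, rows: List[NoteNlpRow]) -> List[NoteNlpRow]:
    """Return the NOTE_NLP rows that back a given MEASUREMENT row."""
    return [
        r for r in rows
        if r.note_nlp_event_field_concept_id == MEASUREMENT_ID_FIELD
        and r.note_nlp_event_id == measurement_id
    ]


rows = [
    NoteNlpRow(1, 501, "T1N0MX", note_nlp_event_id=10,
               note_nlp_event_field_concept_id=MEASUREMENT_ID_FIELD),
    NoteNlpRow(2, 501, "PHQ-9 total 13", note_nlp_event_id=10,
               note_nlp_event_field_concept_id=OBSERVATION_ID_FIELD),
]
evidence = evidence_for_measurement(10, rows)
```

Because the field concept says which table the event ID points into, the same ID value (10 here) can refer to a MEASUREMENT row in one NOTE_NLP record and an OBSERVATION row in another without ambiguity.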

@Andrew the undocumented convention of NOTE_NLP only representing the ‘Condition’ domain seems very limiting.

There already is a type concept of ‘NLP Derived’, so the ‘measurement_type_concept_id’ could be ‘NLP Derived’. All NLP-related metadata could continue to live within the NOTE_NLP table. But all the analytical and UI tools could begin using NLP data today. Instead of it being stranded in the NOTE_NLP table. Folks that don’t trust NLP could filter out any ‘NLP Derived’ data using the type concept.
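
For example, filtering by type concept could look like the sketch below. Both type concept IDs are placeholders I made up for illustration (look up the actual ‘NLP Derived’ concept in your vocabulary release), and `exclude_nlp_derived` is a hypothetical helper.

```python
from typing import Dict, List

# Hypothetical placeholders: replace with the real type concept IDs from your vocabulary.
NLP_DERIVED_TYPE_CONCEPT_ID = -1
EHR_TYPE_CONCEPT_ID = -2  # stands in for any non-NLP provenance


def exclude_nlp_derived(rows: List[Dict]) -> List[Dict]:
    """Keep only MEASUREMENT rows that were not produced by NLP."""
    return [
        r for r in rows
        if r["measurement_type_concept_id"] != NLP_DERIVED_TYPE_CONCEPT_ID
    ]


measurements = [
    {"measurement_id": 1, "measurement_type_concept_id": EHR_TYPE_CONCEPT_ID},
    {"measurement_id": 2, "measurement_type_concept_id": NLP_DERIVED_TYPE_CONCEPT_ID},
]
trusted = exclude_nlp_derived(measurements)
```

Analysts who do trust the NLP output simply skip the filter and use all rows.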

Friends:

Let’s ping the NLP folks here: @HuaXu, @jon_duke, @nigam, @noemie. Can you be so kind and comment on

  • Whether or not the NOTE_NLP table must only contain Conditions
  • What you do with EAV type situation, where you have a MEASUREMENT_CONCEPT_ID and a VALUE_AS_CONCEPT_ID combination

Absolutely true. Will start on that.

Here is my understanding of the original intent and I think the current design (although I welcome corrections because I have been away a bit).

note_nlp_concept_id can be any domain, on purpose.

note_nlp_value was in the original proposal, but it was decided to stick that information in term_modifiers instead.

note_nlp is basically a temporary table to be used while we learn what we really need. There are a very large number of potential fields, and we did not want to create a gigantic table, 90% of which never got used. term_modifiers takes anything not encoded in the other columns.

If you trust the output of your NLP, then put the note in the note table, the primary parse in the note_nlp table, and a streamlined copy of the information in the correct domain table.

We could adopt a convention about how to store value in term_modifiers while we collect other mandatory fields and then add the new columns in a future iteration.

What you do with EAV type situation, where you have a MEASUREMENT_CONCEPT_ID and a VALUE_AS_CONCEPT_ID combination

Here’s how we handle it. We have an NLP pipeline that extracts TNM staging data from surgical pathology reports. So it’ll show up in the note as “Lorem ipsum dolor sit amet T1N0MX consectetur adipiscing elit. Nulla rutrum facilisis…” and our pipeline will extract a row for the report with a “Tstage” “Nstage” and “Mstage” column - in this case, the values of those would be 1, 0, and X.
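
This is not our actual pipeline, but the extraction step can be illustrated with a deliberately simplified regular expression. It assumes the stage appears as one contiguous token such as “T1N0MX” and ignores prefixes (p, c, y), subcategories like T1a, and Tis; `extract_tnm` is a hypothetical name.

```python
import re
from typing import Dict, Optional

# Simplified pattern: T0-T4 or TX, N0-N3 or NX, M0, M1 or MX, written as one token.
TNM_PATTERN = re.compile(r"\bT([0-4X])N([0-3X])M([01X])\b", re.IGNORECASE)


def extract_tnm(note_text: str) -> Optional[Dict[str, str]]:
    """Return the first T/N/M stage found in the text, or None if there is none."""
    match = TNM_PATTERN.search(note_text)
    if match is None:
        return None
    t_stage, n_stage, m_stage = (group.upper() for group in match.groups())
    return {"Tstage": t_stage, "Nstage": n_stage, "Mstage": m_stage}


row = extract_tnm(
    "Lorem ipsum dolor sit amet T1N0MX consectetur adipiscing elit."
)
# row == {"Tstage": "1", "Nstage": "0", "Mstage": "X"}
```

A production pipeline would need to handle far more notation variants than this, so treat the pattern as a starting point only.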

In instances where there’s a standard concept for the NLP-extracted value, we use that as the note_nlp_concept_id - so in the above, the note_nlp_concept_id is 40481057 (SNOMED for “pT1a category.”)

In instances where we need the EAV structure because the value doesn’t exist as a CONCEPT_ID, we use the lexical_variant column to store the results. Take PHQ-9 (a screening instrument for depression), for example. We set the note_nlp_concept_id to 3042932 (LOINC for “Patient Health Questionnaire 9 item (PHQ-9) total score [Reported]”) and then put the actual score (e.g. 13, 17, 21) in lexical_variant. This is probably not ideal, but it’s the only way we could think of to allow for the EAV-ish structure you’re describing here.
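
As a small illustration of the two cases, with dictionaries standing in for NOTE_NLP rows. The two concept IDs are the ones quoted above; the row IDs, the sample scores, the “T1N0MX” lexical_variant for the staging row, and the `phq9_scores` helper are assumptions made for the example.

```python
from typing import Dict, List

PT1A_CATEGORY = 40481057      # SNOMED concept quoted above
PHQ9_TOTAL_SCORE = 3042932    # LOINC concept quoted above

note_nlp_rows: List[Dict] = [
    # Case 1: the extracted value is itself a standard concept.
    {"note_nlp_id": 1, "note_id": 100,
     "note_nlp_concept_id": PT1A_CATEGORY, "lexical_variant": "T1N0MX"},
    # Case 2: EAV-ish. The concept names the attribute, lexical_variant holds the value.
    {"note_nlp_id": 2, "note_id": 200,
     "note_nlp_concept_id": PHQ9_TOTAL_SCORE, "lexical_variant": "13"},
    {"note_nlp_id": 3, "note_id": 201,
     "note_nlp_concept_id": PHQ9_TOTAL_SCORE, "lexical_variant": "17"},
]


def phq9_scores(rows: List[Dict]) -> List[int]:
    """Read PHQ-9 total scores back out of the lexical_variant column."""
    return [
        int(r["lexical_variant"])
        for r in rows
        if r["note_nlp_concept_id"] == PHQ9_TOTAL_SCORE
    ]


scores = phq9_scores(note_nlp_rows)
```

The cost of this convention shows up in the reader: anyone consuming the table has to know, per concept, whether lexical_variant holds raw text or a value to be parsed.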

  • Whether or not the NOTE_NLP table must only contain Conditions

As you can doubtless see from the above, we would argue that it shouldn’t only contain conditions, because sometimes what you’re extracting is a measurement. There are plenty of other good use cases for representing NLP-derived data that maps to other domains (NLP-extracted ejection fraction data from echocardiography reports, etc). Ultimately, I’d vote for what @mgurley describes above as the best way to move forward:

There already is a type concept of ‘NLP Derived’, so the ‘measurement_type_concept_id’ could be ‘NLP Derived’. All NLP-related metadata could continue to live within the NOTE_NLP table. But all the analytical and UI tools could begin using NLP data today. Instead of it being stranded in the NOTE_NLP table. Folks that don’t trust NLP could filter out any ‘NLP Derived’ data using the type concept.

@Christian_Reich Yes, but how? Should I share a list of the ‘item names’ from our pathological examinations? They are available only in Chinese.

Looks good, I will try. But regarding the last point, there are no Chinese NLP tools I can use to process my free-text data. What did you do to convert your free-text data into concepts?

@qiongwang

A point of clarification. The following has not yet been implemented in the OMOP CDM:

The pathology findings in the ‘MEASUREMENT’ domain/table should be related to the pathology report via the addition of two new fields to the NOTE_NLP table:
note_nlp_event_id
note_nlp_event_field_concept_id

But I will be bringing this idea to the NLP working group to see if there is interest in backing this idea as a proposal.

As for NLP tools that can handle free-text data, have you looked at the Stanford NLP work on ‘Chinese Natural Language Processing and Speech Processing’? I have used Stanford NLP for English text. See here:

Also, it looks like there is work underway to add Chinese language support to spaCy:

You should attend the NLP working group and see what recommendations you get from folks there:

https://www.ohdsi.org/web/wiki/doku.php?id=projects:workgroups:nlp-wg#upcoming_meeting_dates

Regarding:

Should I share a list of the ‘item names’ from our pathological examinations? They are available only in Chinese.

That is OK. Share a small subset of your highest value concept names and concept values in Chinese. I am sure we can find somebody in the community that can begin helping with the mapping process. Out of curiosity, are the pathology findings you are attempting to extract cancer-related?
