
Note_nlp table discussion

I have a question for all users of the ‘note’ and ‘note_nlp’ tables. In addition to raising it in upcoming NLP WG meetings, we would like to survey the community on how they use the note_nlp table and on the use cases in which extracted terms are used in cohort definitions that rely on NLPed data.

I am distinguishing scenario A, where the relevant event tables (measurement or condition_occurrence) are populated with information extracted during NLP processing of notes, under a type concept indicating that the source of those rows is NLP over notes (e.g., a note says the ejection fraction is 45%, so this information goes into the measurement table),

from scenario B, where NLPed data stay in the note_nlp table, perhaps because putting them into event tables would confuse other CDM users who do not carefully check all possible type concepts when they query the measurement or condition_occurrence tables (e.g., a note says the ejection fraction is 45%, but this information stays in note_nlp and does NOT become a row in the measurement table).
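A minimal Python sketch of the difference between the two scenarios, assuming an LVEF mention has already been extracted into a note_nlp row. Under scenario A, a hypothetical `to_measurement` helper copies the value into a measurement-shaped row tagged with an NLP type concept; under scenario B the helper is simply never called and the data stay in note_nlp. The concept IDs are placeholders, not real OMOP vocabulary IDs, and pulling the number out of `snippet` with a regex is an illustrative assumption (real extraction would happen inside the NLP pipeline).

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Placeholder concept IDs (hypothetical); look up real IDs in the OMOP vocabulary.
LVEF_CONCEPT_ID = 2_000_000_001      # stands in for an LVEF measurement concept
NLP_TYPE_CONCEPT_ID = 2_000_000_002  # stands in for a "derived by NLP over notes" type concept

# Naive pattern: first number followed by a percent sign, e.g. "EF 45%" or "45 %".
PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s*%")


@dataclass
class NoteNlpRow:
    """Subset of note_nlp columns used in this sketch."""
    note_nlp_id: int
    note_id: int
    note_nlp_concept_id: int
    snippet: str


@dataclass
class MeasurementRow:
    """Subset of measurement columns used in this sketch."""
    measurement_id: int
    person_id: int
    measurement_concept_id: int
    measurement_date: str
    measurement_type_concept_id: int
    value_as_number: float


def to_measurement(row: NoteNlpRow, person_id: int, note_date: str,
                   measurement_id: int) -> Optional[MeasurementRow]:
    """Scenario A: turn an NLP-extracted LVEF mention into a measurement row.

    person_id and note_date are assumed to be looked up from the note table
    via note_id; the event date is the note date, not the date NLP ran.
    Returns None if the row is not LVEF or no percentage is found.
    """
    if row.note_nlp_concept_id != LVEF_CONCEPT_ID:
        return None
    match = PERCENT.search(row.snippet)
    if match is None:
        return None
    return MeasurementRow(
        measurement_id=measurement_id,
        person_id=person_id,
        measurement_concept_id=LVEF_CONCEPT_ID,
        measurement_date=note_date,
        measurement_type_concept_id=NLP_TYPE_CONCEPT_ID,
        value_as_number=float(match.group(1)),
    )


def exclude_nlp(rows: Iterable[MeasurementRow]) -> List[MeasurementRow]:
    """What a careful consumer must do under scenario A: filter on the type concept."""
    return [r for r in rows if r.measurement_type_concept_id != NLP_TYPE_CONCEPT_ID]
```

A consumer that skips a type-concept filter like `exclude_nlp` would silently mix NLP-derived values with structured ones, which is the confusion scenario B avoids by keeping the data in note_nlp.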

We are considering things like Achilles characterization of the columns note_nlp.section_concept_id, note_nlp.lexical_variant, and note_nlp.note_nlp_concept_id.
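To make the Achilles idea more concrete, here is a minimal Python/SQLite sketch of the kind of per-column frequency counts such a characterization might produce: number of rows and distinct notes per value of note_nlp_concept_id, section_concept_id, or lexical_variant. It is not how Achilles implements its analyses; the `characterize` helper, the `min_count` cutoff (a rough stand-in for small-cell suppression), and the demo rows and concept IDs are all hypothetical.

```python
import sqlite3

# Columns this sketch is willing to characterize (whitelisted to keep the SQL safe).
COLUMNS = ("note_nlp_concept_id", "section_concept_id", "lexical_variant")


def characterize(conn, column, min_count=1):
    """Return (value, n_rows, n_notes) per distinct value of a note_nlp column."""
    if column not in COLUMNS:
        raise ValueError(f"unsupported column: {column}")
    sql = (
        f"SELECT {column}, COUNT(*) AS n_rows, COUNT(DISTINCT note_id) AS n_notes "
        f"FROM note_nlp GROUP BY {column} "
        f"HAVING COUNT(*) >= ? "
        f"ORDER BY n_rows DESC, {column}"
    )
    return conn.execute(sql, (min_count,)).fetchall()


def demo_connection():
    """In-memory note_nlp subset with made-up rows and fake concept IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE note_nlp ("
        "note_nlp_id INTEGER PRIMARY KEY, note_id INTEGER, "
        "section_concept_id INTEGER, lexical_variant TEXT, "
        "note_nlp_concept_id INTEGER)"
    )
    rows = [
        (1, 10, 500, "EF", 900),
        (2, 10, 500, "ejection fraction", 900),
        (3, 11, 501, "EF", 900),
        (4, 12, 501, "dyspnea", 901),
    ]
    conn.executemany("INSERT INTO note_nlp VALUES (?, ?, ?, ?, ?)", rows)
    return conn
```

For example, `characterize(demo_connection(), "lexical_variant")` shows which surface forms map most often, which is one way to spot noisy or unexpected NLP output.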

We are also looking at how an OHDSI toolset could/should support working with data stored in note_nlp. Please reply to this thread if you have data in those tables and would be interested to discuss it more.

(My proposal to extend the Achilles precomputations for the note_nlp table is here: “define analyses relevant for data in note_nlp table”, Issue #764, OHDSI/Achilles on GitHub.)

We also want to consider situations where, due to PHI concerns, a user may only have access to the note_nlp table (with processing done by a third party) and no access to the actual notes: the note table would have zero rows or contain only some columns, without the actual text in note.note_text. We would like to have community conventions for how to handle this context.

A related older thread (one of many):

Again, please reply to this thread, message me here, or email me at vojtech.huser @ odysseusinc dot com if you are populating the note and/or note_nlp tables.

The University of Colorado populates the Note and Note NLP tables. At this time, the latter is only used for the N3C project. I don’t know how the Note NLP table is used by the N3C project. At the University, we are not using it to construct cohorts.


My team and I are interested in this topic as it relates to “pre-NLP’ed” data received from data vendors. In a perfect world we could bring this data into our CDM and query it like any other data source, but of course we need to take into account all the limitations and additional metadata that come with it. Unfortunately, even though it’s possible to link events back to their source NOTE_NLP rows, the structure of the NOTE_NLP data doesn’t support the level of analysis we’d like to do in order to make NLP data useful in a safe and reliable way.

We’ve kicked around some ideas for CDM changes/extensions but haven’t spent a ton of time on this yet. We’re open to discussing it, as we’d only want to pursue a widely accepted change that could ultimately be supported by the broader OHDSI ecosystem.


At the VA, we have so far only used NLP data in the actual CDM under scenario A as you described it (specifically for LVEF).

@mgurley shared the NOTE_NLP table modification proposal during the NLP WG meeting on 2024-06-12 (join the WG for access to today’s NLP WG recording), and I’m passing it along to this thread as well in case anyone reading is interested:

NOTE NLP Proposal POC · OHDSI/NLPTools Wiki (github.com)