
Note_nlp questions

(Nicolas Paris) #1

Hi there

I have several questions about the new note_nlp table

  1. The offset field is described as an integer in the Google document (https://docs.google.com/document/d/1ykYVJTQ5MuI7eh_Nk7xzt44EzNjVs71nq2LIsC_RlOg/edit#). However, it is described and implemented as a character(250) everywhere else. It would make sense for it to be an “integer”.
  2. Does offset mean the index of the extracted part? If so, why not replace it with “begin” and “end” fields that hold the beginning and end indexes of the extracted part? Tools such as UIMA should work with this kind of information. A single offset is not sufficient.
  3. The lexical_variant field is only 250 characters long.
  4. What about note sections? Are they supposed to be stored in the note_nlp table or in the note table? FHIR shares notes as Composition resources, and it stores only sections (https://www.hl7.org/fhir/valueset-doc-section-codes.html). But splitting notes into sections is an NLP task, which is why I ask the question.
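
To make question 2 concrete, here is a minimal Python sketch of how a begin/end pair locates an extracted span inside a note. The note text, the `Span` class and its `begin`/`end` field names are hypothetical and chosen only for illustration; they are not part of the CDM specification.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical annotation with explicit character offsets (end exclusive)."""
    begin: int
    end: int

    def text(self, note_text: str) -> str:
        # Slice the covered text back out of the original note.
        return note_text[self.begin:self.end]


note_text = "Conclusion: no evidence of pneumonia."
start = note_text.index("pneumonia")
span = Span(begin=start, end=start + len("pneumonia"))
covered = span.text(note_text)  # the extracted term, recovered from the note
```

A single start index on its own would not say where the span ends.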


(Nicolas Paris) #2

Hi there

I would like to add another point I do not understand:

5. The note_type_concept_id contains the standard coding for note types (CDO). But I can’t find a corresponding note_type_concept_source_id that would represent the local type coding. Am I missing something?


(Christian Reich) #3


You are right. The concepts exist, but we need to re-assign them the right domain. Next release.

Do you have an active NLP going?

(Nicolas Paris) #4

Hi @Christian_Reich
Thanks for the answer

You are right. The concepts exist, but we need to re-assign them the right domain. Next release.

I am not sure I understand. I meant that a column is missing, not a concept. Can you clarify, so I can tell whether I am right and understand why?

Do you have an active NLP going?

Definitely. Right now I suspect I need to extend the structure of both note & note_nlp tables a bit.

(Christian Reich) #5


Well, we need both: We need the Concepts, and the Concepts need to have the types.

Regarding source concepts: We don’t provide those in the Standardized Vocabulary unless there exists a standard coding scheme like ICD10 or so. Hence we also didn’t set up a field for this.

(Nicolas Paris) #6

Here are some other remarks about the table:

  • offset column problems:

      1. offset is a SQL keyword. This leads to difficulties when writing SQL statements.
      2. Since it is a varchar field, there is no easy, standardized way to describe the offset: “1;4” or “1-4” or “1to4”. Two integer fields, offset_begin and offset_end, would be more standardized, more optimized, and more SQL compliant than that “offset” varchar field.
  • A note_nlp_parent_id field looks useful to me. It would allow NLP subtasks to refer to a previously extracted part of the text. For example, if the extracted row is a lexical_variant containing the section “conclusion”, then other rows could refer to that row and extract information from it, instead of directly from the note table.
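
To illustrate the varchar problem, here is a rough Python sketch of what a consumer has to do when the offset is free text. The three separators are the ones mentioned above; the `parse_offset` helper is hypothetical and not part of any OHDSI tooling.

```python
import re
from typing import Tuple

# Accepts the three spellings mentioned above: "1;4", "1-4" and "1to4".
_OFFSET_PATTERN = re.compile(r"^\s*(\d+)\s*(?:;|-|to)\s*(\d+)\s*$")


def parse_offset(raw: str) -> Tuple[int, int]:
    """Parse a free-text offset into a (begin, end) pair of integers."""
    match = _OFFSET_PATTERN.match(raw)
    if match is None:
        raise ValueError(f"unrecognized offset format: {raw!r}")
    return int(match.group(1)), int(match.group(2))


parsed = [parse_offset(value) for value in ("1;4", "1-4", "1to4")]
```

Any spelling the parser does not anticipate raises an error, which two integer columns would avoid entirely.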

(Nicolas Paris) #7

The note_nlp table should have these columns:

  • person_id
  • visit_occurrence_id
  • visit_detail_id

This is consistent with the design of the CDM. Joining to the note table again and again is problematic.
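
As a small illustration, here is a Python sketch using an in-memory SQLite database. The two tables are a hypothetical, heavily reduced subset of note and note_nlp, and the inserted rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE note (
        note_id INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL,
        visit_occurrence_id INTEGER
    );
    CREATE TABLE note_nlp (
        note_nlp_id INTEGER PRIMARY KEY,
        note_id INTEGER NOT NULL REFERENCES note (note_id),
        lexical_variant TEXT
    );
    INSERT INTO note VALUES (10, 1, 100);
    INSERT INTO note_nlp VALUES (1, 10, 'pneumonia');
    """
)

# Without person_id on note_nlp, every patient-level query needs this join.
rows = conn.execute(
    """
    SELECT n.person_id, n.visit_occurrence_id, nlp.lexical_variant
    FROM note_nlp AS nlp
    JOIN note AS n ON n.note_id = nlp.note_id
    """
).fetchall()
conn.close()
```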

BTW, could I make a proposal on all these aspects, and if so, how?


(Christian Reich) #8


Please come to the CDM WG. Put your name in the member list, get invited, create proposals for change and have it done.

(Ambuj) #9

I tried to run the DQ tests provided by PEDSnet for OMOP CDM v5.3 in PostgreSQL and found that there are many discrepancies in the project.
A few important ones:
It includes the visit_detail table, which is in OMOP CDM V6.
Queries cannot be run on the Note_NLP schema because of the “offset” column: offset is a keyword, which prevents the query from running.
There are some others as well. I am posting these details here so they can be redirected to the appropriate person.
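
One possible workaround, sketched in Python: standard SQL allows a reserved word as a column name when it is written as a double-quoted identifier, and PostgreSQL follows this. The `quote_identifier` helper is hypothetical and not part of the PEDSnet tooling, and the sketch assumes the column was created with the lowercase name offset.

```python
def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


columns = ["note_nlp_id", "note_id", "offset"]
select_list = ", ".join(quote_identifier(column) for column in columns)

# Quoting every column keeps the reserved word from being parsed as a keyword.
query = f"SELECT {select_list} FROM note_nlp"
```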

Thank you