Does offset mean the index of the extracted part? Then why not replace it with “begin” and “end”, representing the beginning and end indices of the extracted part? Tools such as UIMA work with that information; a single offset is not sufficient.
lexical_variant field: it is only 250 characters long.
What about note sections? Are they supposed to be stored in the note_nlp table or in the note table? FHIR shares notes as a Composition resource, and it stores only sections (https://www.hl7.org/fhir/valueset-doc-section-codes.html). But splitting notes into sections is an NLP task, which is why I ask the question.
The note_type_concept_id contains the standard coding for note types (CDO). But I can’t find a related note_type_concept_source_id that would represent the local type coding. Am I missing something?
Well, we need both: We need the Concepts, and the Concepts need to have the types.
Regarding source concepts: We don’t provide those in the Standardized Vocabulary unless there exists a standard coding scheme like ICD10 or so. Hence we also didn’t set up a field for this.
offset is a SQL keyword. This causes difficulties when writing SQL statements.
Since it is a varchar field, there is no easy, standardized way to describe the offset: “1;4”, “1-4”, or “1to4”. Two integer fields, offset_begin and offset_end, would be more standardized, more optimized, and more SQL-compliant than the “offset” varchar field.
A note_nlp_parent_id field looks useful to me. It would allow NLP subtasks to refer to a previously extracted part of the text. E.g., if the extracted row is a lexical_variant containing the section “conclusion”, then other rows could refer to that row and extract information from it, instead of directly from the note table.
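To make the two proposals above concrete, here is a minimal sketch in Python, using the standard-library sqlite3 module only as a convenient stand-in database. The table name note_nlp_sketch, the helper parse_offset, and the sample note text are all made up for illustration; the columns offset_begin, offset_end and note_nlp_parent_id are the ones suggested in this post, not fields of the actual CDM. I also assume that child offsets are counted from the start of the note, not from the start of the parent row.

```python
import re
import sqlite3

# Hypothetical helper: accepts the three spellings mentioned above
# ("1;4", "1-4", "1to4") and returns (begin, end) as integers.
_OFFSET_RE = re.compile(r"^\s*(\d+)\s*(?:;|-|to)\s*(\d+)\s*$")


def parse_offset(raw):
    match = _OFFSET_RE.match(raw)
    if match is None:
        raise ValueError(f"unrecognized offset format: {raw!r}")
    begin, end = int(match.group(1)), int(match.group(2))
    if end < begin:
        raise ValueError(f"end before begin in offset: {raw!r}")
    return begin, end


conn = sqlite3.connect(":memory:")
# Illustrative table only; note_nlp_parent_id points back at the same table.
conn.execute(
    """
    CREATE TABLE note_nlp_sketch (
        note_nlp_id        INTEGER PRIMARY KEY,
        note_id            INTEGER NOT NULL,
        note_nlp_parent_id INTEGER REFERENCES note_nlp_sketch (note_nlp_id),
        offset_begin       INTEGER NOT NULL,
        offset_end         INTEGER NOT NULL,
        lexical_variant    TEXT NOT NULL
    )
    """
)

# Made-up note text, used only to compute consistent offsets.
note_text = "History: none. Conclusion: no fracture seen."
section_begin = note_text.index("Conclusion")
term_begin = note_text.index("fracture")

insert_sql = "INSERT INTO note_nlp_sketch VALUES (?, ?, ?, ?, ?, ?)"
# Row 1: the section itself, extracted by a first NLP pass (no parent).
conn.execute(insert_sql, (1, 1, None, section_begin, len(note_text), "Conclusion"))
# Row 2: a term found inside that section, pointing at row 1 as its parent.
conn.execute(
    insert_sql,
    (2, 1, 1, term_begin, term_begin + len("fracture"), "fracture"),
)

# Which extracted terms belong to which previously extracted section?
rows = conn.execute(
    """
    SELECT child.lexical_variant, parent.lexical_variant
    FROM note_nlp_sketch AS child
    JOIN note_nlp_sketch AS parent
      ON child.note_nlp_parent_id = parent.note_nlp_id
    """
).fetchall()
```

With two integer columns, a range filter such as `offset_begin >= ?` is a plain comparison, whereas the varchar form needs something like parse_offset for every row before it can be compared.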
I tried to run the DQ tests provided by PEDSnet for OMOP CDM v5.3 in PostgreSQL and found that there are many discrepancies in the project.
A few important ones:
It includes the visit_detail table, which is in OMOP CDM v6.
Queries cannot be run against the Note_NLP table because of the “offset” column: offset is a keyword, which prevents the query from running.
There are some others; I am posting these details here so they can be redirected to the appropriate person.
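As a possible workaround for the “offset” problem until the column is renamed: OFFSET is a reserved word in PostgreSQL, but standard SQL allows a reserved word to be used as a column name when it is written as a delimited identifier in double quotes. The sketch below only shows the quoting pattern. It uses Python's built-in sqlite3 module as a stand-in because it needs no server; SQLite is more lenient about keywords than PostgreSQL, so it will not reproduce the original error. The table name note_nlp_demo and the helper quote_identifier are made up for illustration.

```python
import sqlite3


def quote_identifier(name):
    """Wrap an identifier in double quotes, doubling any embedded quote."""
    return '"' + name.replace('"', '""') + '"'


conn = sqlite3.connect(":memory:")
col = quote_identifier("offset")  # becomes "offset", including the quotes

# Every statement that touches the column uses the quoted form.
conn.execute(f"CREATE TABLE note_nlp_demo (note_nlp_id INTEGER, {col} TEXT)")
conn.execute(
    f"INSERT INTO note_nlp_demo (note_nlp_id, {col}) VALUES (?, ?)",
    (1, "1-4"),
)
rows = conn.execute(f"SELECT note_nlp_id, {col} FROM note_nlp_demo").fetchall()
```

The same double-quoted form can be written by hand in the DQ queries, e.g. `SELECT "offset" FROM note_nlp`. Whether the PEDSnet scripts can be patched this way is something I have not checked.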