
ETL: Semi-structured clinical notes to Note table

(Gil Frenkel) #1

Hi everyone!

I have clinical notes (admission report, surgeon report, anesthesiologist report, etc.) in a semi-structured format, specifically XML files, written in Hebrew.

Some of the fields are:

  • Well defined - such as a description of a specific procedure, along with the closest possible procedure code, for example in a surgeon note: "<CPT4>60254</CPT4>"

  • Not well defined - such as complete free text describing the events that occurred during the operation, for example in a surgeon note:

    "<PROCESS>The patient is lying on his side, … injecting 0.35mg of … </PROCESS>"

There are a lot of other fields, but of course the free text in the last example is very important to analyze, and I want to use NLP algorithms on it.
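To illustrate the two kinds of fields described above, here is a minimal Python sketch that separates coded fields from free text before any NLP step. The sample XML is an abbreviated placeholder (the real files are in Hebrew and have many more fields), and the function name `split_fields` and the `free_text_tags` parameter are hypothetical names introduced only for this example.

```python
import xml.etree.ElementTree as ET

# Abbreviated placeholder document; the tag names follow the examples
# in this post, the content is shortened and not real data.
SAMPLE_XML = """<SURGEON_REPORT>
  <CPT4>60254</CPT4>
  <PROCESS>The patient is lying on his side.</PROCESS>
</SURGEON_REPORT>"""


def split_fields(xml_string, free_text_tags=("PROCESS",)):
    """Split the direct children of the root into coded and free-text fields.

    Assumption: each field is a direct child of the root element and
    holds plain text only (no nested elements).
    """
    root = ET.fromstring(xml_string)
    structured, free_text = {}, {}
    for child in root:
        text = (child.text or "").strip()
        if child.tag in free_text_tags:
            free_text[child.tag] = text   # candidate input for NLP
        else:
            structured[child.tag] = text  # e.g. a procedure code
    return root.tag, structured, free_text


title, structured, free_text = split_fields(SAMPLE_XML)
```

With this split, `structured` holds fields such as the CPT4 code, while `free_text` holds the narrative text that would go to an NLP pipeline.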

I know that:

  • The note_title is the root element (field/tag name) - such as “SURGEON_REPORT” for the surgeon note above.

  • note_class_concept_id - can be either 706599 or 42527098.

  • language_concept_id - 4180047

I have several questions and would love to get some ideas/clarifications:

  1. How do I populate the note_text field with the XML content above?
    Is note_text the verbatim content of the file? I.e. the string:

  2. Does every record in the NOTE table represent a unique document/report/note? So in my case, does each record represent the whole XML file?
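For concreteness, here is a small Python sketch of one possible reading of these two questions. It is not an answer confirmed in this thread: it assumes one NOTE record per XML file, with note_text holding the file content verbatim and note_title taken from the root tag. The function name `build_note_record`, the sample XML, and the note_id and person_id values are hypothetical; the concept IDs are the ones listed above.

```python
import xml.etree.ElementTree as ET

# Abbreviated placeholder document, not real data.
SAMPLE_XML = "<SURGEON_REPORT><CPT4>60254</CPT4></SURGEON_REPORT>"


def build_note_record(xml_string, note_id, person_id,
                      note_class_concept_id=706599):
    """Build one NOTE row (as a dict) from one XML file.

    Assumptions: one row per file, and note_text stores the XML verbatim.
    Only the columns discussed in this post are filled in; a real NOTE
    row has further columns (dates, note type, encoding, and so on).
    """
    root = ET.fromstring(xml_string)  # parsed only to read the root tag
    return {
        "note_id": note_id,            # hypothetical surrogate key
        "person_id": person_id,        # hypothetical patient key
        "note_title": root.tag,        # e.g. "SURGEON_REPORT"
        "note_text": xml_string,       # verbatim file content (assumption)
        "note_class_concept_id": note_class_concept_id,  # 706599 or 42527098
        "language_concept_id": 4180047,
    }


record = build_note_record(SAMPLE_XML, note_id=1, person_id=42)
```

If the answer turns out to be that only the free-text fields belong in note_text, only the `note_text` line would need to change.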


(Christian Reich) #2



That makes sense for longer texts. For short free-text snippets, manual mapping is still the most effective approach.

Good luck.

(Gil Frenkel) #3

Thank you @Christian_Reich!