
ETL: Semi-structured clinical notes to Note table

Hi everyone!

I have clinical notes (admission report, surgeon report, anesthesiologist report, etc.) in a semi-structured format, specifically XML files, written in Hebrew.

Some of the fields are:

  • Well defined - such as a description of a specific procedure, along with the closest possible procedure code, for example in a surgeon note: "<CPT4>60254</CPT4>"

  • Not well defined - such as complete free text describing the events that occurred during the operation, for example in a surgeon note:

    "<PROCESS>The patient is lying on his side, … injecting 0.35mg of … </PROCESS>"

There are a lot of other fields, but of course the free text is very important to analyze, and I want to use NLP algorithms on it.

I know that:

  • The note_title is the root element (field/tag name) - such as “SURGEON_REPORT”.

  • note_class_concept_id - can be either 706599 or 42527098.

  • language_concept_id - 4180047

I have several questions, and I would love to get some ideas/clarifications.

My questions are:

  1. How do I populate the note_text field with the XML content above?
    Is note_text the verbatim content of the file? I.e. the string:
    “<SURGEON_REPORT>…</SURGEON_REPORT>”

  2. Does every record in the NOTE table represent a unique document/report/note? So in my case, does each record represent the whole XML file?

Thanks!

Yes.

Yes.

NLP makes sense for longer texts. For short free-text snippets, manual mapping is still the most effective approach.
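To make the two answers concrete, here is a minimal Python sketch of one way the ETL step could look, assuming each XML file becomes exactly one NOTE row and note_text holds the file content verbatim. The function names (xml_to_note_record, free_text), the dict layout, and the assumption that PROCESS is a direct child of the root element are illustrative only and not taken from the thread; the concept IDs are simply the ones quoted in the question, and the other NOTE columns (dates, note type, encoding, and so on) are left out.

```python
import xml.etree.ElementTree as ET

# Concept IDs as quoted in the question; check them against your vocabulary.
NOTE_CLASS_CONCEPT_ID = 706599
LANGUAGE_CONCEPT_ID = 4180047


def xml_to_note_record(xml_text, note_id, person_id):
    """Build one NOTE row (as a plain dict) from one XML report.

    Hypothetical helper: one file in, one record out. Remaining NOTE
    columns are omitted here and must be filled in by the real ETL.
    """
    root = ET.fromstring(xml_text)
    return {
        "note_id": note_id,
        "person_id": person_id,
        "note_title": root.tag,        # root element name, e.g. SURGEON_REPORT
        "note_class_concept_id": NOTE_CLASS_CONCEPT_ID,
        "language_concept_id": LANGUAGE_CONCEPT_ID,
        "note_text": xml_text,         # verbatim content of the file
    }


def free_text(xml_text, tag="PROCESS"):
    """Return the free text of one field, e.g. as input for an NLP step.

    Assumes the field is a direct child of the root element.
    """
    node = ET.fromstring(xml_text).find(tag)
    if node is None or node.text is None:
        return ""
    return node.text.strip()


# Made-up sample report, shortened for illustration.
sample = (
    "<SURGEON_REPORT>"
    "<CPT4>60254</CPT4>"
    "<PROCESS>The patient is lying on his side</PROCESS>"
    "</SURGEON_REPORT>"
)
record = xml_to_note_record(sample, note_id=1, person_id=42)
print(record["note_title"])
print(free_text(sample))
```

Keeping note_text verbatim means the original document can always be re-parsed later, while the extracted free text can be passed to whatever NLP tooling you choose.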

Good luck.

Thank you @Christian_Reich!
