Hi everyone!
I have clinical notes (admission report, surgeon report, anesthesiologist report, etc) in a semi-structured format, specifically XML files, written in Hebrew.
Some of the fields are:
Well defined - such as a description of a specific procedure, along with the closest possible procedure code, for example in a surgeon note: "<CPT4>60254</CPT4>"
Not well defined - such as free text describing the events that occurred during the operation, for example in a surgeon note:
"<PROCESS>The patient is lying on his side, … injecting 0.35mg of … </PROCESS>"
There are a lot of other fields, but of course the free text is very important to analyze, and I want to run NLP algorithms on it.
I know that:
The note_title is the root element (field/tag name) - such as "SURGEON_REPORT" above.
language_concept_id - 4180047
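To make this concrete, here is a rough Python sketch of how I currently imagine building one record. It is only an illustration of my understanding, not a confirmed mapping: the SURGEON_REPORT wrapper around the CPT4 and PROCESS fields, the choice to put the whole XML string into note_text, and the extra process_text key (which is not a NOTE column, just the text I would pass to NLP) are all my assumptions.

```python
import xml.etree.ElementTree as ET

# Value quoted in this post for language_concept_id.
LANGUAGE_CONCEPT_ID = 4180047


def build_note_record(xml_string):
    """Build a dict with NOTE-like fields from one XML report (sketch only)."""
    root = ET.fromstring(xml_string)
    return {
        # Root tag name of the report, e.g. "SURGEON_REPORT".
        "note_title": root.tag,
        "language_concept_id": LANGUAGE_CONCEPT_ID,
        # Assumption: store the file content verbatim.
        "note_text": xml_string,
        # Hypothetical helper field (not part of NOTE): free text for NLP.
        "process_text": root.findtext("PROCESS"),
    }


# Assumed layout of a surgeon note; the elided text is left elided.
sample = (
    "<SURGEON_REPORT>"
    "<CPT4>60254</CPT4>"
    "<PROCESS>The patient is lying on his side, …</PROCESS>"
    "</SURGEON_REPORT>"
)
record = build_note_record(sample)
print(record["note_title"])
```

If note_text should instead hold only the free text, the "note_text" line would change to use the PROCESS content, which is part of what I am asking below.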
I have several questions, and I would love to get some ideas/clarifications.
My questions are:
How do I populate the note_text field with the XML content above?
Is note_text the verbatim content of the file? I.e. the string:
Does every record in the NOTE table represent a unique document/report/note? So in my case, does each record represent the whole XML file?