Hi everyone!
I have clinical notes (admission reports, surgeon reports, anesthesiologist reports, etc.) in a semi-structured format, specifically XML files, written in Hebrew.
Some of the fields are:
- Well defined: for example, a description of a specific procedure along with the closest possible procedure code. In a surgeon note: <CPT4>60254</CPT4>
- Not well defined: for example, free text describing the events that occurred during the operation. In a surgeon note:
<PROCESS>The patient is lying on his side, … injecting 0.35mg of …</PROCESS>
There are many other fields as well, but this last free-text field is especially important to analyze, and I want to apply NLP algorithms to it.
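To make the distinction concrete, here is a rough Python sketch (standard library only) of how I picture pulling the two kinds of fields apart. The sample content is made up; only the tag names SURGEON_REPORT, CPT4 and PROCESS come from my files, and `extract_fields` is just a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample report: the tag names come from my files,
# the values are placeholders (the real text is in Hebrew).
SAMPLE_XML = """<SURGEON_REPORT>
  <CPT4>60254</CPT4>
  <PROCESS>The patient is lying on his side.</PROCESS>
</SURGEON_REPORT>"""


def extract_fields(xml_text: str) -> dict:
    """Separate the structured code from the free text meant for NLP."""
    root = ET.fromstring(xml_text)
    # findtext returns the default when the tag is missing
    cpt4 = root.findtext("CPT4", default="").strip()
    process = root.findtext("PROCESS", default="").strip()
    return {"report_type": root.tag, "cpt4": cpt4, "process_text": process}


fields = extract_fields(SAMPLE_XML)
print(fields["cpt4"], "|", fields["process_text"])
```

The well-defined field maps naturally to a code, while `process_text` is the part I would pass to the NLP pipeline.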
I know that:
- note_title is the root element (tag) name, such as “SURGEON_REPORT”.
- language_concept_id is 4180047.
I have several questions and would love to get some ideas or clarifications:
- How do I populate the note_text field with the XML content above? Is note_text the verbatim content of the file, i.e. the string “<SURGEON_REPORT>…</SURGEON_REPORT>”?
- Does every record in the NOTE table represent a unique document/report/note? So in my case, would each record represent a whole XML file?
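For reference, here is a minimal sketch of the mapping I currently have in mind, assuming (and this is exactly what I am asking) that note_text holds the verbatim XML and that each file becomes one NOTE record. Only note_title, note_text and language_concept_id are shown; `NoteRow`, `xml_file_to_note` and `notes_from_directory` are hypothetical names, the other NOTE columns are left out, and the files are assumed to be UTF-8.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from pathlib import Path

LANGUAGE_CONCEPT_ID = 4180047  # value from my notes above


@dataclass
class NoteRow:
    """Hypothetical subset of NOTE columns; the remaining columns are omitted."""
    note_title: str
    note_text: str
    language_concept_id: int


def xml_file_to_note(xml_text: str) -> NoteRow:
    root = ET.fromstring(xml_text)  # parse only to read the root tag name
    return NoteRow(
        note_title=root.tag,           # e.g. "SURGEON_REPORT"
        note_text=xml_text,            # assumption: the verbatim XML string
        language_concept_id=LANGUAGE_CONCEPT_ID,
    )


def notes_from_directory(directory: str) -> list:
    # Assumption: one NOTE record per XML file, files encoded as UTF-8
    return [xml_file_to_note(p.read_text(encoding="utf-8"))
            for p in sorted(Path(directory).glob("*.xml"))]


report = "<SURGEON_REPORT><CPT4>60254</CPT4></SURGEON_REPORT>"
row = xml_file_to_note(report)
print(row.note_title, row.language_concept_id)
```

If note_text is instead supposed to hold only the free text, `note_text` would be filled from the PROCESS element rather than the whole file.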
Thanks!