ETL: Semi-structured clinical notes to Note table

gil.frenkel · October 22, 2020, 1:06pm

Hi everyone!

I have clinical notes (admission report, surgeon report, anesthesiologist report, etc.) in a semi-structured format, specifically XML files, written in Hebrew.

Some of the fields are:

Well defined - such as a description of a specific procedure, along with the closest procedure code possible, for example in a surgeon note: "<CPT4>60254</CPT4>"
Not well defined - such as complete free text describing the events that occurred during the operation, for example in a surgeon note:

"<PROCESS>The patient is lying on his side, … injecting 0.35mg of … </PROCESS>"

There are a lot of other fields, but of course the free text in the last example is very important to analyze, and I want to use NLP algorithms on it.

I know that:

The note_title is the root element (field/tag name) - such as “SURGEON_REPORT”.
note_class_concept_id - can be either 706599 or 42527098.
language_concept_id - 4180047

I have several questions, and I would love to get some ideas/clarifications.

My questions are:

How do I populate the note_text field with the XML content above?
Is note_text the verbatim content of the file? I.e. the string:
“<SURGEON_REPORT>…</SURGEON_REPORT>”
Does every record in the NOTE table represent a unique document/report/note? So in my case, does each record represent the whole XML file?

Thanks!

Christian_Reich · October 25, 2020, 1:28pm

Yes.

Makes sense for longer texts. For free-text snippets, the most effective way is still manual mapping.

Good luck.
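
For readers following along, here is a minimal Python sketch of what this could look like, assuming the "Yes" above means that each XML file becomes one NOTE record and that note_text holds the file content verbatim. The sample document and the function names xml_to_note_record and extract_free_text are hypothetical. Only the fields mentioned in this thread are filled in; the other NOTE columns (note_id, person_id, note_date, and so on) are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the tag names follow the examples in this thread,
# and the free text is a shortened placeholder, not real data.
SAMPLE_XML = (
    "<SURGEON_REPORT>"
    "<CPT4>60254</CPT4>"
    "<PROCESS>The patient is lying on his side.</PROCESS>"
    "</SURGEON_REPORT>"
)


def xml_to_note_record(xml_string, note_class_concept_id=706599):
    """Build a partial NOTE record (as a dict) from one XML document.

    Assumption: one XML file maps to one NOTE row, and note_text stores
    the XML string unchanged.
    """
    root = ET.fromstring(xml_string)
    return {
        "note_title": root.tag,              # root tag name, e.g. SURGEON_REPORT
        "note_text": xml_string,             # verbatim content of the file
        "note_class_concept_id": note_class_concept_id,  # 706599 or 42527098 per the post
        "language_concept_id": 4180047,      # value quoted in the post
    }


def extract_free_text(xml_string, tag="PROCESS"):
    """Return the text of the first direct child with the given tag, or None."""
    root = ET.fromstring(xml_string)
    element = root.find(tag)
    return element.text if element is not None else None


record = xml_to_note_record(SAMPLE_XML)
free_text = extract_free_text(SAMPLE_XML)
```

Storing the XML unchanged in note_text keeps the original document recoverable, while a helper like extract_free_text can hand only the free-text field to an NLP step.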

gil.frenkel · October 25, 2020, 2:51pm

Thank you @Christian_Reich!