
NLP Workgroup Discussion Thread

(Vojtech Huser) #1

Today the NLP workgroup met.
There was an email discussion calling for a schema to capture NLP results.

I would like to initiate a discussion on such a schema. The proposal is at the wiki page below:


Please edit the wiki page to add your input to this schema or use the forum to discuss it.

(Vojtech Huser) #2

An earlier schema discussion is available at this link

The presentation suggests supporting multiple types of NLP-detected entities.

These would be:

Anatomical Site

Perhaps we can pick one of the domains (e.g., Disease/Disorder) and further discuss the columns for this domain. Will we need multiple note_nlp_xxxxx tables then?
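To make the "multiple tables" question concrete, here is a minimal sketch of the alternative: a single table in which a `domain` column distinguishes the entity types. The table name `note_nlp_sketch`, its columns, and the sample rows are all hypothetical and made up for illustration; they are not the proposed schema from the wiki page.

```python
import sqlite3

# Hypothetical layout: one table for all entity types, distinguished by a
# "domain" column. Names and rows are illustrative, not an agreed schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE note_nlp_sketch (
        note_nlp_id  INTEGER PRIMARY KEY,
        note_id      INTEGER NOT NULL,
        domain       TEXT    NOT NULL,  -- e.g. 'Disease/Disorder', 'Anatomical Site'
        snippet      TEXT    NOT NULL,  -- text span found by the NLP tool
        start_offset INTEGER NOT NULL   -- character offset of the span in the note
    )
    """
)

# Made-up sample rows, only to exercise the query below.
rows = [
    (1, 100, "Disease/Disorder", "pneumonia", 42),
    (2, 100, "Anatomical Site", "left lower lobe", 55),
    (3, 101, "Disease/Disorder", "hypertension", 10),
]
conn.executemany("INSERT INTO note_nlp_sketch VALUES (?, ?, ?, ?, ?)", rows)


def entities_for_domain(domain):
    """Return (note_id, snippet) pairs for one entity type."""
    cur = conn.execute(
        "SELECT note_id, snippet FROM note_nlp_sketch "
        "WHERE domain = ? ORDER BY note_nlp_id",
        (domain,),
    )
    return cur.fetchall()


print(entities_for_domain("Disease/Disorder"))
```

With this layout, domain-specific columns would have to be nullable or moved to a side table, which is the trade-off against separate note_nlp_xxxxx tables.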

(James Wiggins) #3


I’m pursuing some ideas utilizing the OMOP CDM ‘NOTE’ and ‘NOTE_NLP’ tables, but I don’t have any sample data. Does anyone know of any publicly available sample data sets for these tables?



(Hua Xu) #4


You can use MTSamples - http://mtsamples.com/, which is a collection of fake notes. We have annotated some MTSamples notes and can share them with you too. Thanks.


(Selva Muthu Kumaran Sathappan) #5

Is this group active? Do you have WG calls?

(Hua Xu) #6

Yes, check out the WG info here: https://www.ohdsi.org/web/wiki/doku.php?id=projects:workgroups:nlp-wg

(Tarun Shah) #7


I would also like to contribute and learn. How can I join the NLP work group?
If you can, please add me - tmshah@ismnet.com

Tarun Shah

(Selva Muthu Kumaran Sathappan) #8

Hello, I would like to be part of this WG as well. My email is selvasathappan36@gmail.com

(Alexander Sivura) #9

Hi Hua. Could you share the annotated samples with me too? I joined the OHDSI community a few days ago and I’m thinking of joining the NLP working group. My email is asivura@icloud.com

(Selva Muthu Kumaran Sathappan) #10


I am interested in learning NLP and am a beginner. If there are any tasks I can volunteer for, please let me know. Do you have any opportunities in your project?

(Hua Xu) #11

We have the dataset at OHDSI NLP GitHub: https://github.com/OHDSI/NLPTools.

Here is a link to CLAMP outputs, which are similar to the annotated data: https://github.com/OHDSI/NLPTools/tree/master/clamp-wrapper/output_xmi

(Hua Xu) #12

Sure. Welcome! Why don’t you join our monthly meeting and see what is going on here? Then you can decide which project you can contribute to. The call-in information can be found here: https://www.ohdsi.org/web/wiki/doku.php?id=projects:workgroups:nlp-wg (at the bottom of the page). Thanks.


(Selva Muthu Kumaran Sathappan) #13

Sure, Hua Xu, thanks. I will join the Sep 11th call without fail.

(Alexander Sivura) #14

Thanks for sharing. But it’s the output of the CLAMP pipeline, isn’t it? It would be great to have some manually checked data to use as ground truth.

I also have a question about tooling for labeling data manually. Could you recommend anything?