
NLP Workgroup Discussion Thread

(Vojtech Huser) #1

Today the NLP workgroup met.
There was an email discussion calling for a schema to capture NLP results.

I would like to initiate a discussion on such a schema. The proposal is on the wiki page below:


Please edit the wiki page to add your input to this schema or use the forum to discuss it.

(Vojtech Huser) #2

An earlier schema discussion is at this link.

The presentation suggests supporting multiple types of NLP-detected entities.

These would be:

Anatomical Site

Perhaps we can pick one of the domains (e.g., Disease/Disorder) and further discuss the columns for this domain. Will we need multiple note_nlp_xxxxx tables then?
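To make the "one table or many" question concrete, here is a purely illustrative Python sketch: each detected entity carries a domain label, and grouping by that label yields the row sets that separate note_nlp_xxxxx tables would hold. The NlpEntity class, its fields, and the sample domain strings are hypothetical and are not taken from the wiki proposal.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class NlpEntity:
    """Hypothetical record for one NLP-detected entity in a note."""
    note_id: int
    domain: str                 # e.g. "Disease/Disorder" or "Anatomical Site"
    lexical_variant: str        # the text as it appears in the note
    offset: int                 # character offset of the mention in the note
    concept_id: Optional[int]   # None when the mention is not mapped to a concept


def split_by_domain(entities: List[NlpEntity]) -> Dict[str, List[NlpEntity]]:
    """Group entities by domain, i.e. the rows that per-domain tables would hold."""
    tables: Dict[str, List[NlpEntity]] = defaultdict(list)
    for entity in entities:
        tables[entity.domain].append(entity)
    return dict(tables)


# Made-up entities for illustration only.
SAMPLE_ENTITIES = [
    NlpEntity(1, "Disease/Disorder", "pneumonia", 10, None),
    NlpEntity(1, "Anatomical Site", "left lower lobe", 24, None),
    NlpEntity(2, "Disease/Disorder", "asthma", 0, None),
]

for domain, rows in split_by_domain(SAMPLE_ENTITIES).items():
    print(domain, len(rows))
```

Both views hold the same rows; the choice is whether the domain lives in a column of one table or in the table name.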

(James Wiggins) #3


I’m pursuing some ideas utilizing the OMOP CDM ‘NOTE’ and ‘NOTE_NLP’ tables, but I don’t have any sample data. Does anyone know of any publicly available sample data sets for these tables?
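Until real data is available, a tiny synthetic note can stand in for testing. The Python sketch below uses the standard sqlite3 module to create a simplified subset of the NOTE and NOTE_NLP tables (column names follow OMOP CDM v5.3 as we understand it; many required columns are omitted), loads one invented note with two mentions, and joins them. The note text, IDs, dates, and the nlp_system name are made up.

```python
import sqlite3

# Invented note text for illustration only.
NOTE_TEXT = "Patient denies chest pain. Started metformin 500 mg daily."


def build_toy_db() -> sqlite3.Connection:
    """Create a simplified NOTE / NOTE_NLP pair in memory and load one toy note."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE note (
        note_id    INTEGER PRIMARY KEY,
        person_id  INTEGER NOT NULL,
        note_date  TEXT NOT NULL,
        note_text  TEXT NOT NULL
    );
    CREATE TABLE note_nlp (
        note_nlp_id         INTEGER PRIMARY KEY,
        note_id             INTEGER NOT NULL REFERENCES note(note_id),
        "offset"            TEXT,  -- quoted because OFFSET is an SQL keyword
        lexical_variant     TEXT NOT NULL,
        note_nlp_concept_id INTEGER,
        nlp_system          TEXT,
        nlp_date            TEXT NOT NULL,
        term_exists         TEXT
    );
    """)
    conn.execute("INSERT INTO note VALUES (?, ?, ?, ?)",
                 (1, 101, "2000-01-01", NOTE_TEXT))
    # (mention text, term_exists): "chest pain" is negated, "metformin" is asserted.
    mentions = [("chest pain", "N"), ("metformin", "Y")]
    for nlp_id, (text, exists) in enumerate(mentions, start=1):
        conn.execute(
            'INSERT INTO note_nlp (note_nlp_id, note_id, "offset", lexical_variant, '
            "note_nlp_concept_id, nlp_system, nlp_date, term_exists) "
            "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            (nlp_id, 1, str(NOTE_TEXT.find(text)), text, 0, "toy-tagger",
             "2000-01-01", exists),
        )
    conn.commit()
    return conn


conn = build_toy_db()
rows = conn.execute("""
    SELECT nn.lexical_variant, nn."offset", nn.term_exists
    FROM note n JOIN note_nlp nn ON nn.note_id = n.note_id
    ORDER BY nn.note_nlp_id
""").fetchall()
for row in rows:
    print(row)
```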



(Hua Xu) #4


You can use MTSamples (http://mtsamples.com/), a collection of fake notes. We have annotated some MTSamples notes and can share them with you too. Thanks.


(Selva) #5

Is this group active? Do you have WG calls?

(Hua Xu) #6

Yes, check out the WG info here: https://www.ohdsi.org/web/wiki/doku.php?id=projects:workgroups:nlp-wg

(Tarun Shah) #7


I would also like to contribute and learn. How can I join the NLP workgroup?
If you can, please add me: tmshah@ismnet.com

Tarun Shah

(Selva) #8

Hello, I would like to be part of this WG as well. My email is selvasathappan36@gmail.com

(Alexander Sivura) #9

Hi Hua. Could you share the annotated samples with me too? I joined the OHDSI community a few days ago and I’m thinking of joining the NLP working group. My email is asivura@icloud.com

(Selva) #10


I am interested in learning NLP; I’m a beginner. If there are any tasks I can volunteer for, please let me know. Do you have any opportunities in your project?

(Hua Xu) #11

The dataset is on the OHDSI NLP GitHub: https://github.com/OHDSI/NLPTools.

Here is a link to CLAMP outputs, which are similar to the annotated data: https://github.com/OHDSI/NLPTools/tree/master/clamp-wrapper/output_xmi
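Judging by the output_xmi folder name, these outputs are UIMA XMI files. Assuming the standard UIMA XMI layout (a cas:Sofa element holding the document text in its sofaString attribute, and annotation elements carrying sofa, begin and end attributes), a minimal Python sketch with xml.etree.ElementTree can list every annotated span and its covered text. No CLAMP-specific type names or attributes are assumed; the sample document and its ex: namespace are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

CAS_NS = "http:///uima/cas.ecore"
XMI_NS = "http://www.omg.org/XMI"


def read_xmi_annotations(xmi_text) -> List[Tuple[str, int, int, str]]:
    """Return (type_name, begin, end, covered_text) for each span annotation."""
    root = ET.fromstring(xmi_text)
    # Map each Sofa's xmi:id to the document text it holds.
    sofas = {
        el.get(f"{{{XMI_NS}}}id"): el.get("sofaString", "")
        for el in root.iter(f"{{{CAS_NS}}}Sofa")
    }
    spans = []
    for el in root:  # feature structures are direct children of xmi:XMI
        if el.get("begin") is None or el.get("end") is None:
            continue  # not a span annotation (e.g. cas:NULL, cas:Sofa)
        begin, end = int(el.get("begin")), int(el.get("end"))
        text = sofas.get(el.get("sofa"), "")
        type_name = el.tag.rsplit("}", 1)[-1]  # drop the namespace URI
        spans.append((type_name, begin, end, text[begin:end]))
    return spans


# Hypothetical minimal XMI document (not real CLAMP output).
SAMPLE_XMI = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" xmlns:cas="http:///uima/cas.ecore"
         xmlns:ex="http:///example/typesystem.ecore" xmi:version="2.0">
  <cas:NULL xmi:id="0"/>
  <ex:Entity xmi:id="2" sofa="1" begin="15" end="25"/>
  <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" mimeType="text"
            sofaString="Patient denies chest pain."/>
</xmi:XMI>"""

print(read_xmi_annotations(SAMPLE_XMI))
```

For a file on disk, passing its raw bytes (`open(path, "rb").read()`) lets the parser honour the file's own encoding declaration.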

(Hua Xu) #12

Sure. Welcome! Why don’t you join our monthly meeting and see what is going on here? Then you can decide which project you can contribute to. The call-in information can be found here: https://www.ohdsi.org/web/wiki/doku.php?id=projects:workgroups:nlp-wg (at the bottom of the page). Thanks.


(Selva) #13

Sure, Hua Xu, thanks. I will join the Sep 11th call without fail.

(Alexander Sivura) #14

Thanks for sharing. But that’s output of the CLAMP pipeline, isn’t it? It would be great to have some manually checked data to use as ground truth.

I also have a question about tools for labeling data manually. Could you recommend any?
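Once some notes have been checked by hand, pipeline output can be scored against them. A minimal Python sketch, assuming both sides are reduced to (begin, end, label) tuples and compared with exact span matching (lenient, overlap-based matching is a common alternative); the spans and labels below are hypothetical.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int, str]  # (begin, end, label)


def span_prf(gold: Iterable[Span], predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 of predicted spans against gold spans."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical gold and system spans for one note.
GOLD = {(15, 25, "problem"), (35, 44, "drug")}
PREDICTED = {(15, 25, "problem"), (27, 34, "drug")}
print(span_prf(GOLD, PREDICTED))
```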

(Kate Weber) #15

I’ve learned a lot listening in on the working group calls but have a fundamental question:

I cannot, for the life of me, work out how to get the CLAMP wrapper configured and running.

I’m labeling a messy problem-history data set. I think I would like to use Usagi to identify consistent concepts associated with my headers, but I will want to use CLAMP to pick out concepts in the free text of the subsequent “explain” parts.

My tentative development plan was to have human annotators mark up a gold-standard set of our problem lists in CLAMP and then use its machine-learning tooling to refine the stock OHDSI pipeline. Will this approach work?

I’m also very interested in practical ways to use the NLP objects you’ve designed. My first instinct was that when I discovered (for example) a specific medication, I would write a row to the DRUG_EXPOSURE table. But am I straying too far from the intent of the NLP work?
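Whether NLP-derived mentions belong in DRUG_EXPOSURE is the open question here; if one did go that way, the mechanics might look like the minimal Python sketch below. It skips mentions whose term_exists flag is not "Y" (for example, negated ones) and mentions without a mapped concept, then fills a subset of DRUG_EXPOSURE columns. Reusing the note date as both start and end date is a crude placeholder, and the type concept ID is left to the caller because the appropriate NLP-derived type concept should be looked up in the CDM conventions. The function name and the concept ID in the example are hypothetical.

```python
from datetime import date
from typing import Any, Dict, Optional


def nlp_to_drug_exposure(
    nlp_row: Dict[str, Any],
    note_row: Dict[str, Any],
    drug_exposure_id: int,
    type_concept_id: int,
) -> Optional[Dict[str, Any]]:
    """Turn one NOTE_NLP medication mention into a DRUG_EXPOSURE-like row, or None."""
    # Keep only mentions asserted as present and mapped to a concept.
    if nlp_row.get("term_exists") != "Y" or not nlp_row.get("note_nlp_concept_id"):
        return None
    return {
        "drug_exposure_id": drug_exposure_id,
        "person_id": note_row["person_id"],
        "drug_concept_id": nlp_row["note_nlp_concept_id"],
        # Crude placeholder: the note date stands in for both start and end dates.
        "drug_exposure_start_date": note_row["note_date"],
        "drug_exposure_end_date": note_row["note_date"],
        # Supplied by the caller; look up the appropriate NLP-derived type concept.
        "drug_type_concept_id": type_concept_id,
        "drug_source_value": nlp_row["lexical_variant"],
    }


# Usage with made-up values (999001 is a hypothetical concept ID, 0 a placeholder type).
note = {"person_id": 101, "note_date": date(2000, 1, 1)}
mention = {"term_exists": "Y", "note_nlp_concept_id": 999001, "lexical_variant": "metformin"}
print(nlp_to_drug_exposure(mention, note, drug_exposure_id=1, type_concept_id=0))
```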

These may be remedial questions - but I’d be grateful for a little help coming up to speed with ways to operationalize this tooling.

UM School of Dentistry

(Álvaro) #16

Hi, is this group still active, holding calls, etc.? I’d really like to contribute, since I’m making extensive use of NLP with the OMOP CDM and have several possible improvements I’d love to discuss.