NCI thesaurus refresh

Hey there! I downloaded a vocabulary from Athena containing only the NCI Thesaurus, and the result was only about 2,500 concepts. But when I downloaded the thesaurus directly from the NCI, it contained about 166k concepts. How do I go about requesting that the NCI Thesaurus be updated in OMOP? Thanks!


Currently, the OMOP NCIt vocabulary is intended to hold only the terms pertaining to cancer staging and, to a small extent, protein variants. If you need all the NCIt terms to be present in the OMOP vocabulary, you would normally need to provide use cases.

This reply comes a very long time later, but I have a use case! The NCI now provides a clinical trials API that lets you supply an NCI trial identifier (NCI-YYYY-NNNNNN format) and receive back a JSON file that contains heaps of metadata about that trial, including inclusion and exclusion criteria. They are also working on NLP and other methods to get concepts coded. The problem is that they are using NCI Thesaurus codes.

I am working on a Stand Up To Cancer supplement grant where we are trying to match patients to clinical trials. Identifying potential patients from our OMOP instance is challenging but doable. But really well-coded information about research protocols basically doesn’t exist. Most people scrape clinicaltrials.gov or something equivalent. In order to match OMOP concept codes that describe patient eligibility to eligibility concepts about trials, I need the NCI Thesaurus in Athena. Even more important would be NCIt to SNOMED concept mappings :slight_smile:

So does this meet the need for a use case?

Hello @blm14, thank you for bringing this up. I think this can be discussed in the Vocabulary work group with @mik.

@blm14 We at Northwestern are working on using EHR data to match patients to cancer clinical trials. I would love to hear more about the clinical trials API that exposes coded data. Is it publicly available?

Regarding better support for mapping of cancer concepts in OMOP, the Oncology WG Vocabulary Subgroup meets every other Thursday at 1:00 PM EST, and we meet this Thursday. If you bring a list of example concepts, we can help push the mappings with the vocabulary team. I am sure they would align with our larger goals.

It does, @blm14, and I am happy you stated the use case. The next problem is resources. As @janice said, this is good input for the Vocab WG. The next one meets on the 25th of October.

Do you have an idea of the usage of the NCIt? I am sure not every concept out of the 166k is used.

Hi @blm14 - I agree with @Christian_Reich, this does sound like a valid use case to me. So, essentially you want to build a tool that consumes a clinical trial definition, extracts inclusion and exclusion criteria (in NCIt codes), and applies those as a filter (or phenotype) to figure out if you have matches in your OMOP CDM?
If you can now find some more allies like @mgurley who support your case, then create a GitHub issue for tracking it and come to the Vocabulary WG (you might want to sign up to the Common Data Model Teams channel) with a convincing statement and some backup from your new allies :grinning:.

The NCI requires that all centers that receive the CCSG P30 submit clinical trial data through a portal called CTRP. The API is public and you can sign up and request an API key here: CTS Developer Accounts

I warn you though, the call that you use to obtain the trial data delivers it as a GZIPPED JSON file binary stream :slight_smile:
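For anyone hitting the same thing, here is a minimal sketch of decoding that kind of payload in Python, assuming you already have the raw response bytes in hand. The function name `decode_trial_payload` and the sample field `nci_id` are made up for illustration; this is not the official client, and the real response layout may differ.

```python
import gzip
import json


def decode_trial_payload(raw: bytes) -> dict:
    """Decode a response body that may be gzip-compressed JSON."""
    # Gzip streams start with the magic bytes 0x1f 0x8b; anything else is
    # treated as plain JSON text.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


# Simulated payload: the identifier and field name are placeholders,
# not real API output.
sample = gzip.compress(json.dumps({"nci_id": "NCI-0000-000000"}).encode("utf-8"))
trial = decode_trial_payload(sample)
```

Checking the magic bytes first means the same function also works if an HTTP library has already decompressed the body for you.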

You could probably get a list of trial identifiers and see which codes appear most. But the NCI Thesaurus also has its own lineage (ancestors, descendants, etc.), so it would probably behoove us to have the whole thing.
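A rough sketch of that tally, assuming each trial has already been parsed into a dict with its NCIt codes collected in a list. The field name `ncit_codes` and the code values below are hypothetical placeholders, not the real API schema or real NCIt codes.

```python
from collections import Counter
from typing import Iterable


def count_ncit_codes(trials: Iterable[dict]) -> Counter:
    """Count in how many trials each NCIt code appears."""
    counts: Counter = Counter()
    for trial in trials:
        # "ncit_codes" is an assumed field name. Using a set means a code
        # repeated within one trial is counted once for that trial.
        counts.update(set(trial.get("ncit_codes", [])))
    return counts


# Placeholder trials and codes, for illustration only.
trials = [
    {"nci_id": "trial-1", "ncit_codes": ["CODE_A", "CODE_B"]},
    {"nci_id": "trial-2", "ncit_codes": ["CODE_A"]},
]
usage = count_ncit_codes(trials)
```

`usage.most_common(n)` then gives the n most used codes. A flat count like this ignores the hierarchy, so a rarely used code may still matter as the ancestor of frequently used ones.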

That is 100% what I am trying to do: take metadata from clinical trial JSONs from the NCI and map eligibility from NCIt to something more useful (LOINC codes, RxNorm codes, SNOMED codes?) so that we can do case identification from OMOP EHR data.
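To make the mapping step concrete, here is a small sketch that assumes a crosswalk from NCIt codes to target vocabulary codes is available as a plain dict. No such crosswalk is provided in this thread; the `crosswalk` contents, the function name `map_ncit_codes`, and all codes are hypothetical placeholders.

```python
from typing import Dict, Iterable, List, Tuple

Target = Tuple[str, str]  # (vocabulary_id, concept_code)


def map_ncit_codes(
    codes: Iterable[str], crosswalk: Dict[str, List[Target]]
) -> Tuple[Dict[str, List[Target]], List[str]]:
    """Split NCIt codes into those with a known mapping and those without."""
    mapped: Dict[str, List[Target]] = {}
    unmapped: List[str] = []
    for code in codes:
        targets = crosswalk.get(code)
        if targets:
            mapped[code] = targets
        else:
            # Keep track of gaps so they can be raised with the vocabulary team.
            unmapped.append(code)
    return mapped, unmapped


# Hypothetical crosswalk; every code here is a placeholder.
crosswalk = {
    "NCIT_X": [("SNOMED", "SNOMED_X")],
    "NCIT_Y": [("LOINC", "LOINC_Y")],
}
mapped, unmapped = map_ncit_codes(["NCIT_X", "NCIT_Z"], crosswalk)
```

The `unmapped` list doubles as the list of example concepts that could be brought to the working group.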