
DrugBank

On the LAERTES project we are trying to bring the VigiBase data into the evidence base. VigiBase uses WHO Drug, and there are licensing concerns to work around. However, there seems to be a way that we could use DrugBank to get around the issue. Many other projects would also benefit from having DrugBank drug IDs loaded into the Standard Vocabulary. This has been discussed before, and I think it was decided to move forward, but the work was then set aside because of other priorities. Here I am officially requesting that the DrugBank drug IDs be loaded, and also suggesting how.

My lab has worked with DrugBank extensively on our Linked SPLs project (https://dbmi-icode-01.dbmi.pitt.edu/linkedSPLs/). As a result, we have a high-quality mapping of DrugBank drug IDs to RxNorm, created using the FDA’s UNII system, InChI keys, and some heuristics. A mapping created with DrugBank 4.1 (current DrugBank is 4.3) is available in the LAERTES GitHub repository:

The process can be more or less automated so that it runs whenever DrugBank provides updated XML files. I can provide the details to help get this process going.

best,
-R

Rich:

You got it. We’ll put it in. We’ll probably have a few questions when we start.

@rkboyce

Actually, here is the first one: it looks like you only published the final mapping tables, not the scripts. True? If so, they will inevitably get stale. Usually, we publish on GitHub all the scripts that make the mapping tables, so they can be re-run. Sometimes there are manual steps in them (adding new mappings between certain things). Thoughts?

I posted the link to the old Google Code repository with the mapping scripts. I will ask my programmer to filter that down to the essential components and post them on GitHub. Would you like them under https://github.com/OHDSI/Vocabulary-v5.0 ?

@rkboyce:

Got it. We need to figure out how to incorporate that into the vocab building process. How? Reverse engineering is not popular. :smile:

Where is your process documented? It will probably be quicker for me to learn your process and port our scripts than to have your team reverse-engineer them.

@Christian_Reich - Well, I felt bad for about a second when I did not get a reply and thought to myself, “maybe Christian does not want to answer because it is clear on the Vocab GitHub site how the vocab is generated…” So, I checked the link to the process on the GitHub page and was greeted with “You’ve followed a link to a topic that doesn’t exist yet.” :smile:

So, scanning a few sub-folders in the Vocab repository tells me that your preferred approach is heavily SQL-driven and requires that a new vocab source be loadable into a DBMS and then mapped through SQL commands. Our current process for creating the RxNorm-to-DrugBank mapping is not hard (see below).
I could imagine that, to get into the Vocab, the same process could be used with an additional step that loads the file output of step 4 into the Vocab and then uses the RxNorm codes in the table to map to DrugBank IDs via OMOP concept source codes:

  1. Get the DrugBank update - it is provided as a large XML file: http://www.drugbank.ca/system/downloads/current/drugbank.xml.zip

  2. Get the FDA Substance Registration System UNII table update - it is provided as a CSV file: http://fdasis.nlm.nih.gov/srs/jsp/srs/uniiListDownload.jsp

  3. Using the following Python script, we load the UNII data and DrugBank data and then create the mapping using InChI keys and exact string matches. The script can be run from the command line with an option to create mappings using drug name synonyms, or not: http://swat-4-med-safety.googlecode.com/svn/trunk/linkedSPLs/ChEBI-DrugBank-bio2rdf-mapping/scripts/parseDBIdBySynsInchiName.py

  4. Some simple Linux command-line parsing produces the final CSV data file; e.g., see http://swat-4-med-safety.googlecode.com/svn/trunk/linkedSPLs/ChEBI-DrugBank-bio2rdf-mapping/UNII-data/README
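For readers who want a feel for the matching in step 3, here is a minimal sketch of the idea (this is NOT the actual parseDBIdBySynsInchiName.py script; the XML layout, CSV column names, and sample rows below are simplified assumptions for illustration, and the real DrugBank schema and UNII file are much richer):

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for drugbank.xml: one <drug> per compound with its
# DrugBank ID, preferred name, and InChIKey (the real schema differs).
DRUGBANK_XML = """<drugbank>
  <drug>
    <drugbank-id>DB00316</drugbank-id>
    <name>Acetaminophen</name>
    <inchikey>RZVAJINKPMORJF-UHFFFAOYSA-N</inchikey>
  </drug>
  <drug>
    <drugbank-id>DB01050</drugbank-id>
    <name>Ibuprofen</name>
    <inchikey>HEFNNWSXXWATRW-UHFFFAOYSA-N</inchikey>
  </drug>
</drugbank>"""

# Hypothetical stand-in for the FDA SRS UNII list (UNII, preferred term,
# InChIKey); real column names in the download may differ.
UNII_CSV = """UNII,PT,INCHIKEY
362O9ITL9D,ACETAMINOPHEN,RZVAJINKPMORJF-UHFFFAOYSA-N
WK2XYI10QM,IBUPROFEN,HEFNNWSXXWATRW-UHFFFAOYSA-N
"""

def map_drugbank_to_unii(drugbank_xml, unii_csv):
    """Return {drugbank_id: unii}, preferring InChIKey matches and
    falling back to case-insensitive exact name matches."""
    by_inchikey, by_name = {}, {}
    for row in csv.DictReader(io.StringIO(unii_csv)):
        by_inchikey[row["INCHIKEY"]] = row["UNII"]
        by_name[row["PT"].lower()] = row["UNII"]

    mapping = {}
    for drug in ET.fromstring(drugbank_xml).iter("drug"):
        db_id = drug.findtext("drugbank-id")
        inchikey = drug.findtext("inchikey")
        name = (drug.findtext("name") or "").lower()
        # InChIKey match first; exact name match as a fallback.
        unii = by_inchikey.get(inchikey) or by_name.get(name)
        if unii is not None:
            mapping[db_id] = unii
    return mapping

print(map_drugbank_to_unii(DRUGBANK_XML, UNII_CSV))
# → {'DB00316': '362O9ITL9D', 'DB01050': 'WK2XYI10QM'}
```

Once DrugBank IDs are keyed to UNIIs (and from there to RxNorm), step 4 is just flattening that dictionary into the final CSV.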

There you have it. What are your thoughts about how we should proceed?

thanks,
-R

@rkboyce:

Yikes. The page moved. But where? :slight_smile: I’ll get back to you.

Christian was just travelling around like nuts. But this page actually is pretty clean and does explain how to do it.

Actually, let us take a look. You are right, the current process is heavily SQL-based. Reason is that we use a ton of standard scripts to do the housekeeping: proper life cycle (deprecation, upgrades), mapping rules, referential integrity of various kinds. If we had to redo this for every vocab we would drown. So, I think we should take a look at what you are doing and then talk about options.
