
Building a Python package to batch map using the OMOP dictionary

We want to build a Python package to do batch translations from one vocabulary to another using the OMOP vocabulary resources. This open-source tool could help those who want to federate databases.

The package requires standardized vocabulary files downloaded from the OHDSI website:
https://www.ohdsi.org/analytic-tools/athena-standardized-vocabularies/

We wondered about the most suitable option to access and use the dictionaries.

  • Is there an API to select and download desired vocabularies?
  • Are the links generated and provided after requesting dictionaries persistent?
  • If neither of the previous options is available, would you mind if we host a copy of all the dictionaries on PhysioNet or GitHub?

Thank you,

Xavi

Hi Xavi,

I have had to write Python packages to do exactly that, and I think this would be a great initiative. I will answer some of your questions and tag @mik for the rest.

Athena (athena.ohdsi.org) provides a download link, and you can use any convenient package to download the zip file. However, there is currently no functionality to select which vocabularies to download or to authenticate to the Athena server with an API key. As far as I know, selecting the vocabularies to download is still a manual process.
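For reference, a minimal sketch of that download step, assuming the user has already copied the temporary link that Athena generated for their request (the function name, directory, and URL here are placeholders):

```python
import zipfile
from pathlib import Path

import requests


def download_vocab_bundle(url: str, target_dir: str = "vocabulary") -> Path:
    """Download the Athena vocabulary zip from a user-supplied link and unpack it.

    The link is the temporary one Athena generates for a download request,
    so this has to run while the link is still valid.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    zip_path = target / "vocabulary_download.zip"

    # Stream to disk: the bundle can be several gigabytes.
    with requests.get(url, stream=True, timeout=600) as response:
        response.raise_for_status()
        with open(zip_path, "wb") as fh:
            for chunk in response.iter_content(chunk_size=1 << 20):
                fh.write(chunk)

    # The zip contains tab-delimited files such as CONCEPT.csv and
    # CONCEPT_RELATIONSHIP.csv.
    with zipfile.ZipFile(zip_path) as bundle:
        bundle.extractall(target)
    return target


# Example (placeholder URL):
# download_vocab_bundle("https://athena.ohdsi.org/...", "vocabulary")
```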

The links are not persistent and expire after a certain time. I am not sure what the time limit is.

This heavily depends on which vocabularies you are going to store. If they are vocabularies that do not require a license and the resource is clearly linked to OMOP, I believe it is fair game, according to a response I saw from @Christian_Reich. For vocabularies that require a license, there is a process to verify that the requestor owns the license. For UMLS there is an automatic verification already implemented: you pass your credentials or API key to a jar file that comes with the zip download from Athena. If you want to use those, you may need to replicate such processes.
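If the package ever needs to drive that jar-based step programmatically, a rough sketch could look like the one below. The jar file name and argument format are assumptions here; the exact command should be taken from the instructions shipped inside the actual Athena download.

```python
import subprocess
from pathlib import Path


def run_umls_verification(vocab_dir: str, umls_api_key: str) -> None:
    """Invoke the verification jar shipped inside the Athena bundle.

    NOTE: "cpt4.jar" and the argument list are placeholders; check the
    README that accompanies your own download for the exact invocation.
    """
    jar_path = Path(vocab_dir) / "cpt4.jar"  # assumed file name
    subprocess.run(
        ["java", "-jar", str(jar_path), umls_api_key],  # placeholder arguments
        check=True,
        cwd=vocab_dir,
    )
```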

my 2 cents of unsolicited opinion :wink:

I think it would be better to create a package that assumes the user provides the URL, and to build the rest of the tooling around that.
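To illustrate what "the rest of the tooling" could look like once the files are unpacked, here is a rough batch-mapping sketch with pandas. The file and column names come from the standard vocabulary tables (tab-delimited CONCEPT.csv and CONCEPT_RELATIONSHIP.csv, with the "Maps to" relationship pointing at the standard concept); the function and parameter names are just placeholders.

```python
import pandas as pd


def map_codes_to_standard(
    vocab_dir: str, codes: list[str], source_vocabulary: str
) -> pd.DataFrame:
    """Batch-map source codes to standard OMOP concepts via 'Maps to'."""
    # Athena files are tab-delimited and unquoted.
    concept = pd.read_csv(
        f"{vocab_dir}/CONCEPT.csv", sep="\t", dtype=str, quoting=3
    )
    relationship = pd.read_csv(
        f"{vocab_dir}/CONCEPT_RELATIONSHIP.csv", sep="\t", dtype=str, quoting=3
    )

    # Source codes -> source concept_ids in the requested vocabulary.
    source = concept[
        (concept["vocabulary_id"] == source_vocabulary)
        & (concept["concept_code"].isin(codes))
    ][["concept_id", "concept_code"]]

    # Follow the 'Maps to' relationship to the standard concept.
    maps_to = relationship[relationship["relationship_id"] == "Maps to"]
    mapped = source.merge(
        maps_to, left_on="concept_id", right_on="concept_id_1"
    ).merge(
        concept.add_prefix("standard_"),
        left_on="concept_id_2",
        right_on="standard_concept_id",
    )
    return mapped[
        ["concept_code", "standard_concept_id",
         "standard_concept_name", "standard_vocabulary_id"]
    ]


# Hypothetical example: map two ICD10CM codes to standard concepts.
# print(map_codes_to_standard("vocabulary", ["E11.9", "I10"], "ICD10CM"))
```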

Hopefully, this is useful to you

Thank you so much, Jose, for this comprehensive answer! As you suggested, we will start prompting the user for the URL.

@xborrat and friends:

Couple of points:

  1. We need an API. Obviously. Yes, you could write a web scraper to imitate a user using Athena, but really, we should just make that available programmatically. Of course the question is who writes that thing? Got resources?

  2. The choice of vocabularies is manual, but that’s ok: Athena remembers your choice once it is made, and each time you press the download button it will pick the vocabularies you chose. So, that would not be a concern.

  3. The proprietary vocabularies are part of that choice linked to your account. You’d have to produce evidence of having a license, and then the proprietary ones will be part of your package.

  4. Hosting the vocabularies: Not possible. Reason is that you need a redistribution license for all vocabularies that are not fully Open Source. OHDSI has a long list of such redistribution licenses, and some were very hard to get. So, no, sorry, no hosting outside Athena.


Hi @Christian_Reich,

Thank you for the response. I was not sure about the extent of the licenses we have. Thankfully I tagged you :relieved:

It is good to know that the API is a good idea in need of an owner.

Thank you, @Christian_Reich and @jposada, for your valuable insights. We will go with using the URL, as @jposada suggested.

Thanks,
xavi

@jposada and crew, I am relatively new to OHDSI (I have participated in a few CDM projects), but with an introduction to someone who understands the scope of what the API would involve, I’d volunteer to assist with documenting the requirements and then with the implementation. I will be at the symposium tomorrow if you could refer me to anybody who would like to meet there and tell me that’s crazy.

t