
Shiny App for OMOP Query Library

Excellent summary. Achilles developers watch with anticipation :slight_smile:

I want to draw out a point covered in your excellent summary, @Rijnbeek and in @rtmill 's reply. Beyond making it easier to run code that already exists - this may present an opportunity to:

  1. Promote more SQL code contributions by community members
  2. Make the validity of contributed code transparent

To achieve 1 there would need to be an easy way to upload SQL in the appropriate form with the appropriate metadata about purpose, etc.
To achieve 2 there would need to be an easy way to upload metadata on the validity of the code. That might include attestations by people who have run it in different environments, automated code validation software output, etc.

A solution to those goals could strike an attractive balance between ease of code sharing and confidence that code does what it purports to do. Eventually it might make sense to bring that balance to other sets of OHDSI SQL scripts, e.g. for Achilles. For now, I agree that it makes good sense to start with this use case.

I am agnostic about whether that functionality should be provided via files that only live on GitHub or are ingested by a local tool. I draw out the above points to encourage us to keep those goals in mind in defining the approach to take because I think they are potentially of wide-ranging importance.

Sorry I’m late to the game! If this reply is better-suited to its own post, please let me know and I’m happy to post elsewhere. It’s a different approach to what seems like a similar problem.

About @rtmill’s points, and a ‘vocabulary’ of queries as mentioned in @Rijnbeek’s update: In addition to the proposed interfaces to query a CDM, the OMOP CDM might be a good candidate for a domain-specific language. There’s a data model around which to work, with the focus of the model being a particular unit (‘person’), and several queries could form a domain-specific vocabulary that users could then compose into more complex queries.

As an example, I put together some code (happy to share, premature as it may be) to achieve the following (using dbplyr and an OMOP CDM on PostgreSQL, but it could be suited to the parameterized SQL used in SqlRender):

everyone %who_took% aspirin_concept_ids
anyone %first_took% aspirin_concept_ids
selected_pop <- everyone %who_had% diabetes_concept_ids
selected_pop %first_took% aspirin_concept_ids

To explain, the ‘anyone’/‘everyone’ arguments on the left-hand side (lhs) are not evaluated and are instead interpreted as a ‘WHERE TRUE’ clause. If the user provides a left-hand-side dataframe containing a ‘person_id’ column, then the statement is instead interpreted as a ‘WHERE person_id IN {lhs person_ids}’ clause. %who_took% and %who_had% return the DRUG_EXPOSURE and CONDITION_OCCURRENCE tables, respectively, and the person_id-based table filtering means that these statements may be composed.
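For concreteness, the filtering logic described above could be sketched roughly like this (a hypothetical dbplyr-based sketch, not the actual implementation; the connection object `cdm_con` and the sentinel objects are my assumptions):

```r
library(dplyr)
library(dbplyr)

# Sentinel class so that bare 'everyone'/'anyone' mean "no person filter"
everyone <- anyone <- structure(list(), class = "cdm_all_persons")

`%who_took%` <- function(lhs, concept_ids) {
  # cdm_con is an assumed DBI connection to an OMOP CDM on PostgreSQL
  drug <- tbl(cdm_con, "drug_exposure") %>%
    filter(drug_concept_id %in% !!concept_ids)
  if (inherits(lhs, "cdm_all_persons")) {
    drug                                   # interpreted as WHERE TRUE
  } else {
    # lhs has a person_id column: WHERE person_id IN {lhs person_ids}
    semi_join(drug, lhs, by = "person_id")
  }
}
```

Because the result again carries a person_id column, the output of one such operator can feed the lhs of the next, which is what makes the statements composable.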

Additionally, I could imagine a ‘+’ operator that would do something like bind_rows if both the lhs and rhs are of the class cdm_person, but performs a left-join by person_id if the lhs is of class cdm_person and the right-hand side is a different cdm class (e.g. cdm_drug_exposure). Each table would have its own class.
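Sketched with S3 dispatch, that ‘+’ behaviour might look as follows (class and method names are assumed from the description above, not existing code):

```r
# Hypothetical '+' method for the assumed cdm_person class
`+.cdm_person` <- function(lhs, rhs) {
  if (inherits(rhs, "cdm_person")) {
    dplyr::bind_rows(lhs, rhs)           # same class: stack the rows
  } else {
    # other cdm class (e.g. cdm_drug_exposure): enrich persons with that table
    dplyr::left_join(lhs, rhs, by = "person_id")
  }
}
```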

Unfortunately, increasing user-friendliness by abstracting away SQL may not be so helpful as an educational tool, which is definitely a drawback as far as @MaximMoinat’s point about helping people learn the CDM. At the same time, maybe it could be a good gateway (especially if users inspect the underlying SQL)?

Again, a DSL is perhaps too high-level, and is not a Shiny App, so forgive me if this is off-topic. Going back to @rtmill’s post, such a wide scope could very well be overkill. However, it strikes me as a great chance to leverage R and metaprogramming, because the ‘standard’ queries are in a great position to be functioned away (with well-defined inputs/outputs and SqlRender–thanks @schuemie!).

Have a great weekend!

P.S.
Each of who_took and who_had is implemented as an infix function with the % on either side, but also as a function to be called using parenthetical notation (e.g. who_had(...)). The who_had function has an argument first_only that is used to create first_had; this is also used for the drug exposure function who_took to implement first_took. Alternatively, the first_took function can be recapitulated as first_time(who_took(...), 'drug_exposure'), which is possible because the first_time function works on a table regardless of where it is implemented. ‘drug_exposure’ or ‘condition’ is then pasted into a string that is used for the query to extract the earliest occurrence of e.g. drug_exposure_start_datetime, grouped by person_id and whatever extra arguments are provided as part of ... (e.g. drug_concept_id).
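As a sketch of how such a generic first_time() could work under those assumptions (column handling and names are illustrative, not the actual code):

```r
library(dplyr)
library(rlang)

# Keep only each person's earliest record in the given domain table.
# 'domain' is e.g. "drug_exposure", so the start column becomes
# drug_exposure_start_datetime; extra grouping columns come in via ...
first_time <- function(tbl, domain, ...) {
  start_col <- sym(paste0(domain, "_start_datetime"))
  tbl %>%
    group_by(person_id, ...) %>%
    filter(!!start_col == min(!!start_col, na.rm = TRUE)) %>%
    ungroup()
}
```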

The concept_id objects are made more user-friendly by sanitizing the input, allowing functions to accept e.g. a vector of concept ids, a data.frame of concept ids, and associated user-defined concept id names (if provided as a named list, dataframe, etc.).

Under the hood, there is a set of evaluators. Right now it uses S3; ideally, these would use multiple dispatch (with the S4 system). Or maybe it would use the more standard-OO-style R6 system; I am still working that out. I’ve been working mainly from T. Mailund’s recent work on DSLs in R.

Hi Andrew,

I like the idea of having the community contribute code. We do need to have a process to validate that code first.
Uploading directly in an app may not be the preferred solution for this (without that process). Adding metadata on the validity is an interesting concept we could indeed think about, but I suggest we start simple and have people add new scripts through an issue-type approach which gets validated (how to decide?) before it is added to the app.

Peter

Thanks @MaximMoinat,

I agree with your approach of starting simple and scaling up if driven by use cases. The first use case of having the frequently used SQL queries in an App is a good start. I suggest we lay down the functionality for that using the markdown approach and extend from there.

Execution against your database is a nice-to-have, I agree, and we can make that an option in the App so you can deploy it with (locally) and without (centrally).

I would personally not link up the central one with a database.

Peter

@mustafaascha,

Thanks for your input.

I had a discussion about DSL earlier with some people. Personally, I think the best way to train people on how to use the CDM and the Vocabularies is to use SQL as this is the most expressive way. The SQL library will make the queries accessible for more people (you can replace the drug of interest etc) and moreover we have tools like Atlas that produces general results for users without SQL knowledge.

However, I am not a DSL expert nor user, so it may be valuable for others. I do think for the tool we are talking about now it is a bit of an overkill indeed.

Peter


Hi All,

Hope you enjoyed the OHDSI symposium. Unfortunately, I could not be there due to family circumstances.

I did make some time to create a first version of the Shiny app. See screenshot below and the following repo: GitHub - thehyve/OMOP-Queries at shinyApp.

To test, clone the ‘shinyApp’ branch and open an R console in the main folder. Then execute:

library(shiny)
runApp('omopQueryApp')

It should open a Shiny window with the G01 OMOP Query pre-selected. If shiny is not installed yet on your system, execute install.packages('shiny') and install.packages('shinydashboard') first.

All the app does now is render the markdowns in the OMOP-Query repo and extract the sql part for translation. For that, I have blatantly copied the SqlDeveloper shiny app by Martijn.

Any feedback is welcome! Please do note that this is a first version. Especially the file selection box has to be replaced with a filterable table, with metadata from the markdown files (like domain and table target).


Hello Peter,

This is a nice introduction to the community! I am joining now and I can feel the energy…and that the community is also vibrant with shinyApps too. I would love to contribute to this. Just let me know where things stand and what needs to be done :-)

cheers, Isaiah

Hi Maxim,

Thanks for moving this forward. Just returned from US, I will have a look at this on Monday and will give my feedback. Would be great to get the input from the others as well.

Peter

Hi Isaiah,

Welcome to the community. Can you send me a private message to explain a bit what your expertise is and why/how you would like to contribute to this specific topic?

Thanks,

Peter

Hi All,

Worked a bit on this today starting from your version @MaximMoinat.

https://mi-erasmusmc.shinyapps.io/QueryLibrary/

What I did:

  1. Added a table that is automatically filled by all the markdown query files. Type = subfolder name. It allows for filtering and search. The description is taken from the markdown filename (replace _ by space, and remove extension).
  2. Automatically render an HTML version of the query you select in the table, using SqlRender to translate the sql in the markdown to the dialect selected in the configuration tab (pretty cool). This is displayed on the right. This allows for automatic syntax coloring of the sql.
  3. Added a configuration tab with schemas, server details etc. If @cdm or @vocab is in the markdown sql, it is replaced on the fly. I do not think we need more parameters in the sql (like is done automatically in the SqlDeveloper App).
  4. Execute tab added, but functionality still needs to be added. Ideally I would like the query to be editable so you can change the input parameters before you run it. The table below it will show the results of the query.

To be discussed:

  1. We need to standardize the markdown files better: some have missing content, a different order of elements, etc.
  2. It may be nice to allow for a user folder that you can specify in the configuration. All files in this folder will then also be automatically added to the table, i.e. you can add your own content on top of the validated standard library content if you want.

Any other thoughts?

Screen shot:

I noticed that Chrome on my machine has some .js issues (I have not yet checked whether this is a local issue); the app works fine for me in Safari.

Hey @Rijnbeek

  • Would it be possible to create new queries and modify existing ones?
  • Could we add parameterized queries that can be rendered with SqlRender?
  • It would be nice to have API-like functionality that gives fully rendered sql back to another app, e.g. another app may want to report the number of people by gender in a cohort, and the age distribution of the cohort.

This looks great! (although for some reason I get a lot of ‘disconnected from server’. May need to debug some more on Chrome).

For me and others like me who need to do things that are not supported by existing tools I think this is just right: I can quickly find queries for everyday tasks, and perhaps modify them as needed. My end goal would be to create some custom SQL based on these examples that may end up in one of my own study packages.

I think others like @Gowtham_Rao have very different use cases that I’m still struggling to understand. Here are my 2 cents:

Query sharing: If you have queries that are generic (like the ones already in this library) that you think are useful for a wide audience in a wide variety of settings, then I would say these should be added to this library. However, if the queries are specific to one study, they should just be in their own study package.

API and parameterized versions of the queries: I’m not sure I understand the use case here. To my point above: if you’re actually looking for an easy way to implement studies that you can distribute to data partners, I suggest creating a study package.

Thanks Martijn.

I will have a look at the Chrome issue, not sure why that happens.

Yes, I agree. The purpose of the library is to provide access to often-used queries for re-use and also for training purposes, e.g. in a training we can refer to query G04. If there are frequently encountered questions on how to perform a certain query on the CDM, we can add these to the library through a Github issue/pull request. We can add a validation procedure by the package maintainer.

Parameterized SQL is already supported in this version, but I have limited this to @cdm and @vocab in the queries, which are rendered on the fly using the settings in the configuration tab (see the visit query for an example). It could be made flexible with a query-dependent parameter box, but for now I think this is a bit of an overkill.
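For readers unfamiliar with SqlRender, the @cdm substitution plus dialect translation boils down to two calls (a minimal example using render()/translate() from recent SqlRender versions; older versions use renderSql()/translateSql(), and the schema name here is made up):

```r
library(SqlRender)

sql <- "SELECT COUNT(*) FROM @cdm.person;"

# Replace the @cdm parameter with the value from the configuration tab
rendered <- render(sql, cdm = "my_cdm_schema")

# Translate the OHDSI SQL to the dialect selected in the configuration tab
translated <- translate(rendered, targetDialect = "postgresql")
```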

If you run it locally we could potentially add the option to define a user folder that adds your own queries on top of the ones provided by OHDSI, but that is not a first priority now.

I am not so sure about making add/edit query options available in the App itself (definitely not on the community version; that would mean we need to add user accounts etc., which is a bridge too far I think).

The API idea I also do not understand, @Gowtham_Rao.

That looks awesome, Peter.

In general, this is already a useful resource as is. More functionality is not strictly necessary. Two small things we can improve: 1) add more metadata to the queries, so they can be found more easily, for example the tables being queried; 2) an additional cleanup of the sql files. For instance ‘sysdate’ is not translated properly (in OMOP SQL it is GETDATE()).

This is great @Rijnbeek.
I wish this had existed (or I had known about it) when I first started learning SQL and writing queries for the CDM.

One addition that would be useful would be a filter to select queries for a specific type (e.g., to see all the queries for drug exposures) without having to scroll through the whole list. Especially if this list begins to expand or the types become more granular.

Hi @Dave_Kern thanks.

That functionality is already implemented: you can click on the filter box above the column and select.

Peter

Hi all,

A quick update. I have created a new Github repo https://github.com/OHDSI/QueryLibrary
that contains an R package you can install, after which you can run the shinyApp. I had to move this to a new repo because the OMOP-Queries name does not follow best practices (no dash in a package name) and I think QueryLibrary is a better name.

I have recorded a short video that demonstrates the app, following @anthonysena’s good example when we were working on other stuff (sorry for the somewhat low resolution, but you get the idea).

Would be nice to get some feedback from all of you in this thread and especially @ericaVoss, @clairblacketer, @Patrick_Ryan.

To be clear this is work in progress.


Hi Peter, how far is your team with adding the other sql scripts? In other words, for which queries do you still need help?

Last week when we spoke in Brussels, you mentioned a tool to check the rendered queries against multiple databases (also used for SqlRender). It would be nice to build an automated query validator. Where can we find this tool?
