OHDSI Home | Forums | Wiki | Github

Shiny App for OMOP Query Library


#7

Hi Peter, this sounds like an interesting project, I’d like to help.


(Michael Wichers) #8

Hello, I would enjoy to get involved with this too. Please include me on this. Thank you!


(Martijn Schuemie) #9

Making the app is easy. I propose creating a new Shiny app that borrows heavily from the SqlRender app. The app can live in a subfolder of the OMOP-Queries repo, from where it can be deployed to the OHDSI Shiny server.

The hard part will be to transform the OMOP-Queries repo into something machine-readable. Currently, the repo contains a list of file folders with Microsoft Word documents with uninformative names (e.g. General\G01.docx). My proposal for reformatting this is:

  1. Change the name of the repo to adhere to developer guidelines, e.g. ‘OmopQueries’. While we’re at it, maybe give it a more meaningful name, like ‘StandardCdmQueries’, or ‘SharedCdmQueries’?

  2. Reorganize the folder structure to match domain names. So instead of having the ‘condition’, ‘condition era’, and ‘condition occurrence’ folders, just have a ‘condition’ folder. Also, I prefer to get rid of spaces, because this might cause trouble on some platforms, so ‘observation_period’ instead of ‘Observation Period’.

  3. Choose some standard machine-readable representation for the content. Ideal for a machine would be json, although markdown has the advantage of being machine-readable and will render nicely in Github as well. If we go with markdown, we’ll simply have to specify standard sections that all files will need to adhere to, like ‘Title’, ‘Description’, ‘SQL’, ‘Input’, ‘Output’, ‘Sample output record’. Instead of having a single file per domain as already done here, I would have one index file per domain, and separate files per query.

  4. Represent the content in our format of choice.

  5. Clean up the queries a bit. For example, the first query here uses the BETWEEN operator which is not recommended since it is interpreted differently across SQL dialects, and the SQL doesn’t adhere to the code style guide for SQL.


(Peter Rijnbeek) #10

Thanks all for the great feedback so far. Nice to see there is interest for a tool like this.

I had a chat with @rtmill today and he showed me what he has made so far. In the last week he added a first version of a ShinyApp on top of this Query database and it looked already very nice and includes SqlRender functionality.

We discussed that it would be nice @schuemie could make a repository for him so he can share what he has made so far. Martijn he will contact you about this.

What I think would be good is to have a quick call with the people in this thread that are interested in thinking about / working on this idea. Robert can then show what he has done so far and the others can give their input. Robert is dedicated to work on this already but it would of course be nice if others could help out in prototyping this.

Let me know which time works for you:

https://doodle.com/poll/sbg6mptuw78gprss


(Peter Rijnbeek) #11

@gregk could you fill in the Doodle to fix a call? I like to fix the slot. Thanks


(Gregory Klebanov) #12

Ah, sorry Peter - missed your Doodle post… Done now


(Peter Rijnbeek) #13

Dear all,

We will discuss this Monday 13th 18:00 CET, 12:00 EST. I have send @gregk, @MaximMoinat, @rtmill, @Andrew, @schuemie an agenda item. Robert will demonstrate what he made so far and we can brainstorm together how to move this forward.

@MichaelWichers, @chris.wallace i do not have your email to send you an invite so if you can send it in private message I will. The gotomeeting details are as follows for anyone who likes to join:

Please join my meeting from your computer, tablet or smartphone.
https://global.gotomeeting.com/join/240576989

You can also dial in using your phone.
United States: +1 (571) 317-3129

Access Code: 240-576-989

More phone numbers
Australia: +61 2 8355 1040
Austria: +43 7 2088 0034
Belgium: +32 28 93 7018
Canada: +1 (647) 497-9391
Denmark: +45 69 91 88 64
Finland: +358 923 17 0568
France: +33 170 950 592
Germany: +49 692 5736 7210
Ireland: +353 14 845 976
Italy: +39 0 247 92 12 39
Netherlands: +31 208 080 379
New Zealand: +64 9 909 7888
Norway: +47 21 03 58 96
Spain: +34 911 82 9782
Sweden: +46 853 527 836
Switzerland: +41 225 3314 51
United Kingdom: +44 20 3535 0621

First GoToMeeting? Let’s do a quick system check:https://link.gotomeeting.com/system-check


(Peter Rijnbeek) #14

Thanks All for joining the TC yesterday.

It was good to hear that all of you see the value of a Shiny App containing the Query Library. Thanks @rtmill for sharing the nice work you already did in building a Shiny App and thanks @MaximMoinat for sharing the work you already did in standardising the OMOP queries in Markdown and setting up a folder structure for this.

During our call it became clear that we have to decide on a couple of things:

  1. What is the scope of this tool? I could hear different opinions (correct me if I am wrong). I think @rtmill and @Andrew like to make this broader than only “OMOP queries+”. They like to add for example the Achilles queries, future DQA, CDM DDL, Themis Adherence tests, and even ETLs etc (see also the post fo @rtmill above). @schuemie, @gregk, @MaximMoinat and @Rijnbeek see the Query App as used for the queries per domain as on the old OMOP website but then supported by SQLRender functionality and search functionality for the queries. It may well be there are good reasons to implement both approach because they have different aim i think.

  2. We had a discussion on whether we should put the queries in a database structure of stick to txt files in Github that are read and parsed by the app. I agree with @schuemie that the Github approach has advantages: we could leverage the version control functionality and could have someone take responsibility of the queries contributed by the community, e.g. through pull requests, validation steps etc. It is crucial that the queries in the library are thoroughly tested. Using markdown and fixed format we have readable format we can parse. @MaximMoinat already showed some work he recently did on this which i think looked promising and I hope you will push further on this and pull into the OHDSI Github.
    @rtmill explained that he sees the query library more as like the vocabulary. You can download a version in a data schema. I can also see the advantages of this approach, but a this moment i would prefer a standalone tool that does not require a database and could for example easily be installed on a Shiny Server, but of course open to further discussion.

  3. We discussed if we would like the tool to be able to run against the CDM itself as @rtmill implemented already in his app. I think we agreed that this would be a nice option but that would require a way to present the results to the user. I could imagine that if you install the package in R you either return the results in the R console, write to file, or show the results in a window in the App. This is of course not possible if we make a public Shiny app not connected to database.

We decided that we would continue this discussion on the Forum with this post. If I missed anything in the summary above please add or correct.

Finally, we wondered who is the current owner of https://github.com/OHDSI/OMOP-Queries it is not completely clear who will deal with pull requests, e.g. from Maxim and others that like to contribute. @MichaelWichers are you that person?

Look forward to further discussion on this topic.

Peter


(Robert Miller) #15

@Rijnbeek Sorry for the delay.

Perhaps it would be best to elaborate a bit on what I’ve been thinking to see if we can marry the ideas.

Issues to be addressed:
A ) SQL queries for OHDSI scattered among formats, repos, SQL flavors
B ) It’s difficult for those without programming background to implement and contribute to OHDSI
C ) Community developed code is rarely shared causing an unneccesary replication of work

While moving the ‘OMOP-Queries’ repo into a standard format and making it easily accessible would be a step in the right direction, it only touches on issue A and ignores the other two. If we develop a standard format and structure for parameterized SQL, an intuitive way for community members to share their own queries, and a local tool to run the queries, then the group of people who can contribute to and leverage OHDSI would be much less exclusive.

For this to work I think there are two distinct pieces:

  • a centralized library of SQL queries with a webapp on top
  • a local tool to ingest and run the libraries against your local dbs

I agree it makes sense to focus on the former piece as we have been. Having a validated and protected set of official OHDSI queries seems definite, but can we also incorporate the sharing of user generated queries into the same tool? Perhaps this sandbox of user queries routinely gets mined and added to the ‘standard’ queries? Not sure.

This all boils down to scope. If the focus of this effort is to produce a webapp to be able to copy and paste the ‘OMOP-Queries’ then surely what I am proposing is overkill.


(Maxim Moinat) #16

Thanks @rtmill and @Rijnbeek. We should definitely decide on the scope and also the target user.

For the initial scope, as mentioned in the call, I suggest a central app with a simple query library of independently runnable sample queries. The target users are data scientists new to OHDSI; i.e. they have experience with sql, but lack OHDSI/OMOP specific knowledge. They will use the app to learn to work with their newly setup OMOP CDM, which they can easily access through their favourite database tool.

I agree that an app that supports query execution against your own local database would be very useful. However, this will require a local installation of the app, which makes the tool less accessible.

@schuemie Would it be possible to make the Shiny App run against the ohdsi.org database? Or would this give all kinds of security issues?


(Taha Abdul-Basser) #17

Excellent summary. Achilles developers watch with anticipation :slight_smile:


(Andrew Williams) #18

I want to draw out a point covered in your excellent summary, @Rijnbeek and in @rtmill 's reply. Beyond making it easier to run code that already exists - this may present an opportunity to:

  1. Promote more SQL code contributions by community members
  2. Make the validity of contributed code transparent

To achieve 1 there would need to be an easy way to upload SQL in the appropriate form with the appropriate metadata about purpose, etc.
To achieve 2 there would need to be an easy way to upload metadata on the validty of the code. That might include attestations by people that have run it in different environments, automated code validation software output, etc.

A solution to those goals could strike an attractive the balance between ease of code sharing and confidence that code does what it purports to do. Eventually that balance might make sense to bring to other sets of OHDSI SQL scripts, e.g. for Achillies etc., For now, I agree that it makes good sense to start with this use case.

I am agnostic about whether that functionality should be provided via files that only live on GitHub or are ingested by a local tool. I draw out the above points to encourage us to keep those goals in mind in defining the approach to take because I think they are potentially of wide-ranging importance.


(Mustafa) #19

Sorry I’m late to the game! If this reply is better-suited to its own post, please let me know and I’m happy to post elsewhere. It’s a different approach to what seems like a similar problem.

About @rtmill’s points, and a ‘vocabulary’ of queries as mentioned in @Rijnbeek’s update: In addition to the proposed interfaces to query a CDM, the OMOP CDM might be a good candidate for a domain-specific language. There’s a data model around which to work, with the focus of the model being a particular unit (‘person’), and several queries could form a domain-specific-vocabulary that users could then compose into more complex queries.

As an example, I put together some code (happy to share, premature as it may be) to achieve the following (using dbplyr and an OMOP CDM on PostgreSQL, but it could be suited to the parameterized SQL used in SqlRender):

everyone %who_took% aspirin_concept_ids
anyone %first_took% aspirin_concept_ids
selected_pop <- everyone %who_had% diabetes_concept_ids
selected_pop %first_took% aspirin_concept_ids

To explain, the ‘anyone’/‘everyone’ arguments on the left-hand side (lhs) are not evaluated and instead interpreted as a ‘WHERE TRUE’ clause. If the user provides a left-hand side dataframe containing a ‘person_id’ column, then the statement is instead interpreted as a ‘WHERE person_id IN {lhs person_ids}’ clause. Each of %who_took% and %who_had% return the DRUG_EXPOSURE and CONDITION_OCCURRENCE tables, respectively, but the person_id-based table filtering means that these statements may be composed.

Additionally, I could imagine a ‘+’ operator that would do something like bind_rows if both the lhs and rhs are of the class cdm_person, but performs a left-join by person_id if the lhs is of class cdm_person and the right-hand side is a different cdm class (e.g. cdm_drug_exposure). Each table would have its own class.

Unfortunately, increasing user-friendliness by abstracting away SQL may not be so helpful as an educational tool, which is definitely a drawback as far as @MaximMoinat’s point about helping people learn the CDM. At the same time, maybe it could be a good gateway (especially if users inspect the underlying SQL)?

Again, a DSL is perhaps too high-level, and is not a Shiny App, so forgive if this is off-topic. Going back to @rtmill’s post, such wide scope could very well be overkill. However, it strikes me as a great chance to leverage R and metaprogramming because the ‘standard’ queries are in a great position to be functioned away (with well-defined inputs/outputs and SqlRender–thanks @schuemie!).

Have a great weekend!

P.S.
Each of who_took and who_had are implemented as infix functions with the % on either side, but also as functions to be called using parenthetical notation (e.g. who_had(...)). The who_had function has an argument first_only that is used to create first_had, this is also used for the drug exposure function who_took to implement first_took. Alternatively, the first_took function can be recapitulated as first_time(who_took(...), 'drug_exposure'), which is possible because the first_time function works on a table regardless of where it is implemented. ‘drug_exposure’ or ‘condition’ is then pasted into a string that is used for the query to extract the earliest occurrence of e.g. drug_exposure_start_datetime, grouped by person_id and whatever extra arguments are provided as part of ... (e.g. drug_concept_id).

The concept_id objects are made more user-friendly by sanitizing the input, allowing functions to accept a e.g. a vector of concept ids, data.frame of concept ids, and associated user-defined concept id names (if provided as a named list, dataframe, etc).

Under the hood, there is a set of evaluators. Right now it uses S3, ideally, these would use multiple dispatch (with the S4 system). Or maybe it would use the more standard-OO-style R6 system, still working that out. I’ve been working mainly from T. Mailund’s recent work on DSLs in R


(Peter Rijnbeek) #20

Hi Andrew,

I like the idea of have the community contribute code. We do need to have a process to validate that code first.
Uploading directly in an app i think may not be the preferred solution for this (without that process). Adding metadata on the validity is an interesting concept we could indeed think about, but i suggest to start simple first and have people add new scripts through an issue type approach which gets validated (how to decide) before it is added in the app.

Peter


(Peter Rijnbeek) #21

Thanks @MaximMoinat,

I agree with your approach of starting simple and scale up if driven by use cases. The first use case of having the frequently used SQL queries in an App is a good start. I suggest we lay down the functionality for that using the markdown approach and extend from there.

Execution against your database is a nice to have thing I agree and we can make that an option in the App so you can deploy it with (locally) and without (centrally).

I would personally not link up the central one with a database.

Peter


(Peter Rijnbeek) #22

@mustafaascha,

Thanks for your input.

I had a discussion about DSL earlier with some people. Personally, I think the best way to train people on how to use the CDM and the Vocabularies is to use SQL as this is the most expressive way. The SQL library will make the queries accessible for more people (you can replace the drug of interest etc) and moreover we have tools like Atlas that produces general results for users without SQL knowledge.

However, I am not a DSL expert nor user, so it may be valuable for others. I do think for the tool we are talking about now it is bit off an overkill indeed.

Peter


(Maxim Moinat) #23

Hi All,

Hope you enjoyed the OHDSI symposium. Unfortunately, I could not be there due to family circumstances.

I did make some time to create a first version of the Shiny app. See screenshot below and the following repo: https://github.com/thehyve/omop-queries/tree/shinyApp.

To test, clone the ‘shinyApp’ branch and open an R console in the main folder. Then execute:

library(shiny)
runApp('omopQueryApp')

It should open a Shiny window with the G01 OMOP Query pre-selected. If shiny is not installed yet on your system, execute install.packages('shiny') and install.packages('shinydashboard') first.

All that the app now does, is rendering the markdowns in the OMOP-Query repo and extracting the sql part for translation. For that, I have blatantly copied the SqlDeveloper shiny app by Martijn.

Any feedback is welcome! Please do note that this is a first version. Especially the file selection box has to be replaced with a filterable table, with metadata from the markdown files (like domain and table target).


(Isaiah Nyabuto) #24

Hello Peter,

This is a nice introductory to the community! Am joining now and I can feel the energy…and that the community is also vibrant with shinyApps too. I would love to contribute on this. Just let me know where things stand and what needs to be done:-)

cheers, Isaiah


(Peter Rijnbeek) #25

Hi Maxim,

Thanks for moving this forward. Just returned from US, I will have a look at this on Monday and will give my feedback. Would be great to get the input from the others as well.

Peter


(Peter Rijnbeek) #26

Hi Isaiah,

Welcome to the community. Can you send me a private message to explain a bit what your expertise is etc and why/how you like to contribute to this specific topic.

Thanks,

Peter


t