
Shiny App for OMOP Query Library

Hi All,

Since we will be doing a lot of training on the use of the CDM in the upcoming years, and many more people across the world will be running and inventing queries, I think an easily accessible and manageable CDM query library is of great value.

Thanks to @mvanzandt, Michael Wichers and others we now have all the old V4 queries from the OMOP times (http://cdmqueries.omop.org/) converted to V5 on GitHub: https://github.com/OHDSI/OMOP-Queries. This is a good start, but I think it is not yet very accessible for all our stakeholders.

What would be ideal I think is the following:

We have the nice Shiny app that Martijn made to render SQL: http://data.ohdsi.org/SqlDeveloper/. This could easily be extended with a SQL library per domain. What would be nice is if we could standardise the queries in, for example, JSON, so we have all the extra information such as ID, description, example, field description, and output description in a format the R code can process. The app would automatically load all the JSON files. I also suggest parameterising the SQL so people can set the CDM schema in the app (Martijn already implemented that these parameters automatically generate an input box).
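To make the JSON idea concrete, a query file could look something like this (a sketch only: the field names, the example query, and the @cdm_schema parameter name are all illustrative, not an agreed format):

```json
{
  "id": "D01",
  "domain": "drug",
  "description": "Count drug exposure records per drug_concept_id.",
  "sql": "SELECT drug_concept_id, COUNT(*) AS n FROM @cdm_schema.drug_exposure GROUP BY drug_concept_id;",
  "parameters": [
    { "name": "cdm_schema", "description": "Schema holding the CDM tables" }
  ],
  "output": [
    { "field": "drug_concept_id", "description": "Standard drug concept" },
    { "field": "n", "description": "Number of exposure records" }
  ]
}
```

Because each file carries its own parameter list, the app could generate the input boxes directly from the `parameters` array.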

We could make a dropdown to select the domain, which creates a searchable list of queries (possibly with a hover-over effect to see the description text). If the user clicks on a query, it is added to the SQL box and rendered according to the settings. Below the SQL boxes we could print the explanation of the query.

The current app already allows you to copy paste or download the query.

I have noticed in many meetings with potential and current users that building queries is not easy and requires a very deep understanding of the Vocabularies. Sometimes we build an advanced query upon their request, but the rest of the community is not aware of it. I think a Shiny app that runs on the web and is linked from ohdsi.org could help avoid errors in query building. We could allow for requests through the issue section of the tool. Someone needs to be responsible for the quality of the queries, of course.

Anyone who would like to work on this? Other thoughts?

Peter


@Rijnbeek - great idea, Peter / Martijn. These can also be used as a set of test cases when releasing OMOP CDM databases or OHDSI tools.

Sign us up, please - Odysseus will contribute.


Thanks Greg,

I think this is very easy to implement and will be very useful in EHDEN for training etc.

Peter

Hello Peter, that would be an awesome extension of the SqlRender app. I would be very willing to work on this.

This is great to see. I agree that standardizing and centralizing collections of OHDSI queries would be a huge benefit to the community.

I’ve been mulling a similar idea over the past couple months and started working on a proof of concept that pushes things a little further. If we have:

  1. Vocabularies of queries, formatted in SqlRender parameterized SQL, stored in a database
  2. A central tool, similar to Athena, to navigate, share, and download the query vocabularies (which are ingested into a local DB, similar to concept vocabularies)
  3. A local GUI tool (R, currently), leveraging SqlRender and DatabaseConnector, to navigate and run the queries

The potential applications are extensive, allowing the user to run any sort of database query (that doesn’t require dynamic user input or visualizations) with just a click.

Some examples:

  • Implementation (CDM DDL, constraints, indexing, vocabulary import)
  • DQA (achilles, custom/new tests, source data tests)
  • CDM versioning/updates
  • Metrics/Metadata generation
  • THEMIS adherence testing
  • Derived table population (e.g. _era tables)
  • Study protocols
  • ETLs
  • Query logs/schedules/metrics
  • …
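Most of these applications rely on the SqlRender-style parameterized SQL from point 1. As a toy illustration of just the parameter-substitution step, here is a Python sketch (the real SqlRender package also supports defaults, conditional blocks, and translation between SQL dialects):

```python
import re

def render(sql: str, **params: str) -> str:
    # Toy stand-in for SqlRender's parameter substitution: every
    # @name token in the SQL is replaced by the supplied value.
    for name, value in params.items():
        sql = re.sub(r"@" + re.escape(name) + r"\b", str(value), sql)
    return sql

query = "SELECT COUNT(*) AS n FROM @cdm_schema.person;"
print(render(query, cdm_schema="cdm_v5"))
# SELECT COUNT(*) AS n FROM cdm_v5.person;
```

With a library of such queries, a local tool would only need a database connection and the parameter values to run any of them.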

The database structure needs to be thought through and expanded to incorporate more logic (e.g. dependencies, hierarchies, errors, logs) but right now it looks like this:

Thus far I’ve only been using it to run individual Achilles tests and stand up new OMOP databases.

@Rijnbeek Is this along the lines of what you were thinking? Any feedback on the approach is welcome.

Hi Peter,

It’s a great idea. May I ask what technologies you are going to use? Also, is the major work providing a front-end interface to accommodate the set of V4 CDM queries, or actually extending SqlRender? Thank you!

Hi Peter, this sounds like an interesting project, I’d like to help.

Hello, I would enjoy getting involved with this too. Please include me on this. Thank you!

Making the app is easy. I propose creating a new Shiny app that borrows heavily from the SqlRender app. The app can live in a subfolder of the OMOP-Queries repo, from where it can be deployed to the OHDSI Shiny server.

The hard part will be to transform the OMOP-Queries repo into something machine-readable. Currently, the repo contains a list of file folders with Microsoft Word documents with uninformative names (e.g. General\G01.docx). My proposal for reformatting this is:

  1. Change the name of the repo to adhere to developer guidelines, e.g. ‘OmopQueries’. While we’re at it, maybe give it a more meaningful name, like ‘StandardCdmQueries’, or ‘SharedCdmQueries’?

  2. Reorganize the folder structure to match domain names. So instead of having the ‘condition’, ‘condition era’, and ‘condition occurrence’ folders, just have a ‘condition’ folder. Also, I prefer to get rid of spaces, because this might cause trouble on some platforms, so ‘observation_period’ instead of ‘Observation Period’.

  3. Choose some standard machine-readable representation for the content. Ideal for a machine would be JSON, although markdown has the advantage of being machine-readable and rendering nicely on GitHub as well. If we go with markdown, we simply have to specify standard sections that all files need to adhere to, like ‘Title’, ‘Description’, ‘SQL’, ‘Input’, ‘Output’, ‘Sample output record’. Instead of having a single file per domain as is done here, I would have one index file per domain and separate files per query.

  4. Represent the content in our format of choice.

  5. Clean up the queries a bit. For example, the first query here uses the BETWEEN operator, which is not recommended since it is interpreted differently across SQL dialects, and the SQL doesn’t adhere to the code style guide for SQL.
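Under the markdown option from point 3, a single query file could look roughly like this (purely illustrative content; the query deliberately uses explicit comparisons instead of BETWEEN, in line with point 5):

```markdown
## Title
Persons with an observation period overlapping 2017

## Description
Counts the distinct persons whose observation period overlaps calendar year 2017.

## SQL
SELECT COUNT(DISTINCT person_id) AS person_count
FROM @cdm_schema.observation_period
WHERE observation_period_start_date <= '2017-12-31'
  AND observation_period_end_date >= '2017-01-01';

## Input
| Parameter  | Description                   |
| ---------- | ----------------------------- |
| cdm_schema | Schema holding the CDM tables |

## Output
| Field        | Description                                  |
| ------------ | -------------------------------------------- |
| person_count | Number of persons with an overlapping period |
```

Files like this render legibly on GitHub while remaining trivially parseable by the app.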


Thanks all for the great feedback so far. Nice to see there is interest for a tool like this.

I had a chat with @rtmill today and he showed me what he has made so far. In the last week he added a first version of a Shiny app on top of this query database; it already looked very nice and includes SqlRender functionality.

We discussed that it would be nice if @schuemie could make a repository for him so he can share what he has made so far. Martijn, he will contact you about this.

What I think would be good is to have a quick call with the people in this thread who are interested in thinking about or working on this idea. Robert can then show what he has done so far and the others can give their input. Robert is already committed to working on this, but it would of course be nice if others could help out in prototyping.

Let me know which time works for you:

https://doodle.com/poll/sbg6mptuw78gprss

@gregk, could you fill in the Doodle so we can fix a call? I would like to confirm the slot. Thanks

Ah, sorry Peter - missed your Doodle post… Done now

Dear all,

We will discuss this on Monday the 13th, 18:00 CET / 12:00 EST. I have sent @gregk, @MaximMoinat, @rtmill, @Andrew and @schuemie an agenda item. Robert will demonstrate what he has made so far and we can brainstorm together how to move this forward.

@MichaelWichers, @chris.wallace I do not have your email addresses to send you an invite, so if you send them to me in a private message I will. The GoToMeeting details are as follows for anyone who would like to join:

Please join my meeting from your computer, tablet or smartphone.
https://global.gotomeeting.com/join/240576989

You can also dial in using your phone.
United States: +1 (571) 317-3129

Access Code: 240-576-989

More phone numbers
Australia: +61 2 8355 1040
Austria: +43 7 2088 0034
Belgium: +32 28 93 7018
Canada: +1 (647) 497-9391
Denmark: +45 69 91 88 64
Finland: +358 923 17 0568
France: +33 170 950 592
Germany: +49 692 5736 7210
Ireland: +353 14 845 976
Italy: +39 0 247 92 12 39
Netherlands: +31 208 080 379
New Zealand: +64 9 909 7888
Norway: +47 21 03 58 96
Spain: +34 911 82 9782
Sweden: +46 853 527 836
Switzerland: +41 225 3314 51
United Kingdom: +44 20 3535 0621

First GoToMeeting? Let’s do a quick system check: https://link.gotomeeting.com/system-check

Thanks All for joining the TC yesterday.

It was good to hear that all of you see the value of a Shiny App containing the Query Library. Thanks @rtmill for sharing the nice work you already did in building a Shiny App and thanks @MaximMoinat for sharing the work you already did in standardising the OMOP queries in Markdown and setting up a folder structure for this.

During our call it became clear that we have to decide on a couple of things:

  1. What is the scope of this tool? I heard different opinions (correct me if I am wrong). I think @rtmill and @Andrew would like to make this broader than only “OMOP queries+”. They would like to add, for example, the Achilles queries, future DQA, the CDM DDL, THEMIS adherence tests, and even ETLs (see also the post of @rtmill above). @schuemie, @gregk, @MaximMoinat and @Rijnbeek see the Query App as serving the queries per domain as on the old OMOP website, but supported by SqlRender functionality and search functionality for the queries. It may well be that there are good reasons to implement both approaches, because they have different aims, I think.

  2. We had a discussion on whether we should put the queries in a database structure or stick to text files on GitHub that are read and parsed by the app. I agree with @schuemie that the GitHub approach has advantages: we could leverage the version control functionality and have someone take responsibility for the queries contributed by the community, e.g. through pull requests, validation steps, etc. It is crucial that the queries in the library are thoroughly tested. Using markdown and a fixed format gives us a readable format we can parse. @MaximMoinat already showed some work he recently did on this, which I think looked promising, and I hope you will push further on it and pull it into the OHDSI GitHub.
    @rtmill explained that he sees the query library more like the vocabulary: you can download a version into a data schema. I can also see the advantages of this approach, but at this moment I would prefer a standalone tool that does not require a database and could, for example, easily be installed on a Shiny server. Of course I am open to further discussion.

  3. We discussed whether we would like the tool to be able to run against the CDM itself, as @rtmill already implemented in his app. I think we agreed that this would be a nice option, but it would require a way to present the results to the user. I could imagine that if you install the package in R, you either return the results in the R console, write them to file, or show them in a window in the app. This is of course not possible if we make a public Shiny app that is not connected to a database.
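As a concrete sketch of the ‘parse markdown files’ option in point 2, a minimal parser could split a fixed-format query file into its sections like this (Python for illustration; the heading names follow the sections proposed earlier in the thread, but the exact format is still open):

```python
def parse_query_file(text: str) -> dict:
    # Split a fixed-format markdown query file into its sections.
    # Assumes each standard section ('Title', 'Description', 'SQL', ...)
    # starts with an H2 heading; everything until the next heading
    # belongs to that section's body.
    sections = {}
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}

example = """## Title
Count persons
## SQL
SELECT COUNT(*) AS n FROM @cdm_schema.person;
"""
parsed = parse_query_file(example)
print(parsed["Title"])   # Count persons
```

The app would then feed the ‘SQL’ section into SqlRender and display the other sections as documentation.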

We decided that we would continue this discussion on the Forum with this post. If I missed anything in the summary above please add or correct.

Finally, we wondered who the current owner of https://github.com/OHDSI/OMOP-Queries is; it is not completely clear who will deal with pull requests, e.g. from Maxim and others who would like to contribute. @MichaelWichers, are you that person?

Look forward to further discussion on this topic.

Peter

@Rijnbeek Sorry for the delay.

Perhaps it would be best to elaborate a bit on what I’ve been thinking to see if we can marry the ideas.

Issues to be addressed:
A) SQL queries for OHDSI are scattered among formats, repos, and SQL flavors
B) It is difficult for those without a programming background to implement and contribute to OHDSI
C) Community-developed code is rarely shared, causing unnecessary replication of work

While moving the ‘OMOP-Queries’ repo into a standard format and making it easily accessible would be a step in the right direction, it only touches on issue A and ignores the other two. If we develop a standard format and structure for parameterized SQL, an intuitive way for community members to share their own queries, and a local tool to run the queries, then the group of people who can contribute to and leverage OHDSI would be much less exclusive.

For this to work I think there are two distinct pieces:

  • a centralized library of SQL queries with a webapp on top
  • a local tool to ingest and run the libraries against your local dbs

I agree it makes sense to focus on the former piece, as we have been. Having a validated and protected set of official OHDSI queries seems a given, but can we also incorporate the sharing of user-generated queries into the same tool? Perhaps the sandbox of user queries routinely gets mined and the best ones added to the ‘standard’ queries? Not sure.

This all boils down to scope. If the focus of this effort is to produce a webapp to be able to copy and paste the ‘OMOP-Queries’ then surely what I am proposing is overkill.

Thanks @rtmill and @Rijnbeek. We should definitely decide on the scope and also the target user.

For the initial scope, as mentioned in the call, I suggest a central app with a simple query library of independently runnable sample queries. The target users are data scientists new to OHDSI; i.e. they have experience with SQL, but lack OHDSI/OMOP-specific knowledge. They will use the app to learn to work with their newly set up OMOP CDM, which they can easily access through their favourite database tool.

I agree that an app that supports query execution against your own local database would be very useful. However, this will require a local installation of the app, which makes the tool less accessible.

@schuemie Would it be possible to make the Shiny App run against the ohdsi.org database? Or would this give all kinds of security issues?

Excellent summary. Achilles developers watch with anticipation 🙂

I want to draw out a point covered in your excellent summary, @Rijnbeek, and in @rtmill’s reply. Beyond making it easier to run code that already exists, this may present an opportunity to:

  1. Promote more SQL code contributions by community members
  2. Make the validity of contributed code transparent

To achieve 1 there would need to be an easy way to upload SQL in the appropriate form with the appropriate metadata about purpose, etc.
To achieve 2 there would need to be an easy way to upload metadata on the validity of the code. That might include attestations by people who have run it in different environments, automated code validation software output, etc.

A solution to those goals could strike an attractive balance between ease of code sharing and confidence that code does what it purports to do. Eventually it might make sense to bring that balance to other sets of OHDSI SQL scripts, e.g. for Achilles. For now, I agree that it makes good sense to start with this use case.

I am agnostic about whether that functionality should be provided via files that only live on GitHub or are ingested by a local tool. I draw out the above points to encourage us to keep those goals in mind in defining the approach to take because I think they are potentially of wide-ranging importance.

Sorry I’m late to the game! If this reply is better-suited to its own post, please let me know and I’m happy to post elsewhere. It’s a different approach to what seems like a similar problem.

About @rtmill’s points, and a ‘vocabulary’ of queries as mentioned in @Rijnbeek’s update: In addition to the proposed interfaces to query a CDM, the OMOP CDM might be a good candidate for a domain-specific language. There’s a data model around which to work, with the focus of the model being a particular unit (‘person’), and several queries could form a domain-specific-vocabulary that users could then compose into more complex queries.

As an example, I put together some code (happy to share, premature as it may be) to achieve the following (using dbplyr and an OMOP CDM on PostgreSQL, but it could be suited to the parameterized SQL used in SqlRender):

everyone %who_took% aspirin_concept_ids
anyone %first_took% aspirin_concept_ids
selected_pop <- everyone %who_had% diabetes_concept_ids
selected_pop %first_took% aspirin_concept_ids

To explain, the ‘anyone’/‘everyone’ arguments on the left-hand side (lhs) are not evaluated and are instead interpreted as a ‘WHERE TRUE’ clause. If the user provides a left-hand side dataframe containing a ‘person_id’ column, then the statement is instead interpreted as a ‘WHERE person_id IN {lhs person_ids}’ clause. %who_took% and %who_had% return the DRUG_EXPOSURE and CONDITION_OCCURRENCE tables, respectively, but the person_id-based table filtering means that these statements may be composed.

Additionally, I could imagine a ‘+’ operator that would do something like bind_rows if both the lhs and rhs are of the class cdm_person, but performs a left-join by person_id if the lhs is of class cdm_person and the right-hand side is a different cdm class (e.g. cdm_drug_exposure). Each table would have its own class.

Unfortunately, increasing user-friendliness by abstracting away SQL may not be so helpful as an educational tool, which is definitely a drawback regarding @MaximMoinat’s point about helping people learn the CDM. At the same time, maybe it could be a good gateway (especially if users inspect the underlying SQL)?

Again, a DSL is perhaps too high-level, and it is not a Shiny app, so forgive me if this is off-topic. Going back to @rtmill’s post, such a wide scope could very well be overkill. However, it strikes me as a great chance to leverage R and metaprogramming, because the ‘standard’ queries are in a great position to be functioned away (with well-defined inputs/outputs and SqlRender, thanks @schuemie!).

Have a great weekend!

P.S.
Each of who_took and who_had is implemented as an infix function with the % on either side, but also as a function to be called using parenthetical notation (e.g. who_had(...)). The who_had function has an argument first_only that is used to create first_had; this is also used for the drug exposure function who_took to implement first_took. Alternatively, the first_took function can be recapitulated as first_time(who_took(...), 'drug_exposure'), which is possible because the first_time function works on a table regardless of where it is implemented. ‘drug_exposure’ or ‘condition’ is then pasted into a string that is used for the query to extract the earliest occurrence of e.g. drug_exposure_start_datetime, grouped by person_id and whatever extra arguments are provided as part of ... (e.g. drug_concept_id).

The concept_id objects are made more user-friendly by sanitizing the input, allowing functions to accept e.g. a vector of concept IDs, a data.frame of concept IDs, and associated user-defined concept ID names (if provided as a named list, data.frame, etc.).

Under the hood, there is a set of evaluators. Right now it uses S3; ideally, these would use multiple dispatch (with the S4 system). Or maybe it would use the more standard OO-style R6 system; I am still working that out. I have been working mainly from T. Mailund’s recent work on DSLs in R.

Hi Andrew,

I like the idea of having the community contribute code. We do need a process to validate that code first.
Uploading directly in an app, I think, may not be the preferred solution for this (without that process). Adding metadata on the validity is an interesting concept we could indeed think about, but I suggest starting simple first and having people add new scripts through an issue-type approach, which gets validated (we need to decide how) before it is added to the app.

Peter
