
Queries for OHDSI on RDBMS for use on Hadoop

Team, we’ve started work to build out a draft of the OHDSI data model on Hadoop (Impala), based on the existing data models from RDBMS.

We thought it might be helpful to include views that reflect how people commonly use their data today. We are looking for popular sample queries from people who currently pull data sets from the relational model, so that we can build those views into Impala.

Can anyone share some common SELECT statements that you run today against the existing OHDSI Common Data Model?

Thanks,
Derek

Hi @dkane,

I don’t know if this is helpful to mention, but most people in the OHDSI community interact with their data in the Common Data Model (CDM) through open source applications. Some of these applications are interactive web-based tools such as Atlas and Achilles. Others are R packages such as CohortMethod and SelfControlledCaseSeries.

These applications use SQL, and you can find examples of that SQL here and here. The SQL uses some markup for parameterization, and is automatically translated to the SQL dialects we currently support (SQL Server, PostgreSQL, Oracle, Redshift, PDW) through a tool we developed called SqlRender.
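To give a feel for what parameterized SQL means here, below is a minimal Python sketch. It is not SqlRender itself (which is an R package) and does no dialect translation; it only mimics the idea of filling `@name` placeholders in a template. The `render` helper, the `cdm` schema name, and the `min_year` parameter are all made up for illustration.

```python
import re


def render(sql, **params):
    """Replace each @name placeholder with the matching keyword argument.

    Hypothetical helper for illustration only; it is not part of SqlRender.
    """
    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value supplied for @{name}")
        return str(params[name])

    return re.sub(r"@(\w+)", substitute, sql)


# Template with two placeholders; schema and threshold are assumed values.
template = (
    "SELECT COUNT(*) FROM @cdm_schema.person "
    "WHERE year_of_birth >= @min_year;"
)
rendered = render(template, cdm_schema="cdm", min_year=1950)
print(rendered)
```

Keeping the schema name as a parameter is what lets one query file run unchanged against differently named CDM schemas.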

Hope this helps.

Cheers,
Martijn

Martijn,
Thanks for the background info. I’m a bit new to this, but it sounds like I need to spend some time with SqlRender.

Derek

Good day,
With open source tools like Jupyter (IPython), one should be able to use the OHDSI R packages to create a standard OHDSI notebook and point it at different storage platforms (HDFS/Amazon S3), storage formats (text/Avro/Parquet), and other data sources (Hive/Impala/etc.).

Here’s a drug-era builder query in SQL. It’s kinda complicated: it uses window functions and nested queries. No temp tables on this one tho, so it might be a good test:
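As a rough illustration of what an era builder does (not the SQL referred to above), here is a small Python sketch of the merging step for one person and one drug. It assumes exposures closer together than a 30-day gap belong to the same era; the `build_eras` name, the `gap_days` default, and the sample dates are assumptions for the example.

```python
from datetime import date, timedelta


def build_eras(exposures, gap_days=30):
    """Merge (start, end) date pairs into eras, bridging gaps up to gap_days.

    Hypothetical helper: handles a single person and a single drug only.
    """
    eras = []
    for start, end in sorted(exposures):
        if eras and start <= eras[-1][1] + timedelta(days=gap_days):
            # Close enough to the current era: extend its end date if needed.
            eras[-1][1] = max(eras[-1][1], end)
        else:
            # Gap too large (or first exposure): open a new era.
            eras.append([start, end])
    return [tuple(era) for era in eras]


exposures = [
    (date(2015, 1, 1), date(2015, 1, 30)),
    (date(2015, 2, 10), date(2015, 3, 11)),  # 11 days after the previous end
    (date(2015, 6, 1), date(2015, 6, 30)),   # 82 days after the previous end
]
eras = build_eras(exposures)
for era_start, era_end in eras:
    print(era_start, era_end)
```

A SQL version has to express the same "sort, then compare with the previous row" step, which is where the window functions come in.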

And another one that calculates percentiles from the year of birth in the person table:
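A minimal sketch of this kind of percentile calculation, not the query originally posted, could look like the following. It uses Python's built-in `sqlite3` with an in-memory stand-in for the CDM `person` table (only `person_id` and `year_of_birth`), and the nearest-rank method via `ORDER BY ... LIMIT 1 OFFSET`; the `year_of_birth_percentile` helper and the sample years are assumptions for illustration.

```python
import sqlite3


def year_of_birth_percentile(conn, pct):
    """Nearest-rank percentile of person.year_of_birth, for pct in 1..100.

    Hypothetical helper; assumes the person table is not empty.
    """
    (n,) = conn.execute("SELECT COUNT(*) FROM person").fetchone()
    rank = max(1, -(-n * pct // 100))  # integer ceil(n * pct / 100)
    (value,) = conn.execute(
        "SELECT year_of_birth FROM person "
        "ORDER BY year_of_birth LIMIT 1 OFFSET ?",
        (rank - 1,),
    ).fetchone()
    return value


# Tiny in-memory stand-in for the CDM person table (sample data is made up).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER)"
)
years = [1930, 1935, 1940, 1945, 1950, 1955, 1960, 1965, 1970, 1975]
conn.executemany(
    "INSERT INTO person (year_of_birth) VALUES (?)", [(y,) for y in years]
)

for pct in (25, 50, 90):
    print(pct, year_of_birth_percentile(conn, pct))
```

On Impala or another warehouse engine, a window-function formulation would likely be the more natural fit; the nearest-rank lookup here is chosen only because it runs anywhere.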
