OHDSI Home | Forums | Wiki | Github

Short and quick way to describe your CDM dataset

During our first study (TxPathways), we needed a short way to communicate how large is a given CDM dataset and how large the input or output cohort is.

I understand that there is Achilles to providing a graphical view of many parameters (e.g., age, lenght of observation periods), but it is not possible to create a quick extract from Achiles to email a collaborator.

For example as IMEDs user, I can not see the Achilles JSON data (either as tables in database or as .JSON files)

I would like to propose an idea of a set of simple SQL code file that would output just a few selected simple parameters (either just plain output when running the query or output into a temp table in a database and listing that table)

for example: (this is real data for CCAE dataset in IMEDS)

MEASURE     RESULT     EXPLANATION
G1     141,805,491        count of patients
G2     20,328,289,601     count of events
D2     90,024,522         count of patients with at least 1 Dx and 1 Rx
D3     112,148,500        count of patients with at least 1 Dx and 1 Proc
D4     5,939,621          count of patients with at least 1 Obs, 1 Dx and 1 Rx
D5     277,975            count of deceased patients

My feeling is that perhaps not all CDM adopters/users made Achilles work (in an afternoon) and have a spare web server for viewing the results. They may be situations where Achilles may be too complex to use.

I would be interested in what people think of a quick few-numbers-snapshot of the data. (I have a tentative name (IRIS) for this approach.

We would have a first non-J&J [non-Regenstrief] software tool… (excluding terminology products here)

Great idea Vojtech! ACHILLES is definitely overkill if you only want these
few results. It’d be wonderful to see a new application that could be an
easier starting point for the uninitiated. Thanks for taking this on!

Hi Vojtech,

how about the below SQL code as a first draft? This is a postgresql version.

I wasn’t sure what the correct definition should be for the ‘events’ measure so I left out measure G2.

Please take a look and see if you get the expected results when you run it (I don’t have any data to test with). If you can provide the G2 measure SQL statement I can add it to the code. I wrote it in parameterized T-SQL so we can generate the SQL to run it on any of the OHDSI supported DBMSs.


– MEASURE RESULT EXPLANATION
– G1 n count of patients
– ** NEED TO ADD THIS ** G2 n count of events
– D2 n count of patients with at least 1 Dx and 1 Rx
– D3 n count of patients with at least 1 Dx and 1 Proc
– D4 n count of patients with at least 1 Obs, 1 Dx and 1 Rx
– D5 n count of deceased patients

/cdmSchema: CDMV5/
/resultsSchema: ohdsi/
/irisResults: iris_results/

SET search_path TO ohdsi;

DROP TABLE IF EXISTS iris_results;

– create a temp table to hold the results
CREATE TEMP TABLE iris_results
(
MEASURE VARCHAR(2),
RESULT NUMERIC,
EXPLANATION VARCHAR(255)
);

– count of patients
insert into iris_results
select ‘G1’ as MEASURE, COUNT(distinct person_id) as RESULT, ‘count of patients’ as EXPLANATION
from CDMV5.PERSON;

– count of patients with at least 1 Dx and 1 Rx
insert into iris_results
select ‘D2’ as MEASURE, COUNT(distinct p1.person_id) as RESULT, ‘count of patients with at least 1 Dx and 1 Rx’ as EXPLANATION
from CDMV5.PERSON p1
inner JOIN CDMV5.CONDITION_ERA ce
ON p1.person_id = ce.person_id
inner JOIN CDMV5.DRUG_ERA de
ON p1.person_id = de.person_id;

– count of patients with at least 1 Dx and 1 Proc
insert into iris_results
select ‘D3’ as MEASURE, COUNT(distinct p1.person_id) as RESULT, ‘count of patients with at least 1 Dx and 1 Proc’ as EXPLANATION
from CDMV5.PERSON p1
inner JOIN CDMV5.CONDITION_ERA ce
ON p1.person_id = ce.person_id
inner JOIN CDMV5.PROCEDURE_OCCURRENCE po
ON p1.person_id = po.person_id;

– count of patients with at least 1 Obs, 1 Dx and 1 Rx
insert into iris_results
select ‘D4’ as MEASURE, COUNT(distinct p1.person_id) as RESULT, ‘count of patients with at least 1 Obs, 1 Dx and 1 Rx’ as EXPLANATION
from CDMV5.PERSON p1
inner JOIN CDMV5.OBSERVATION_PERIOD op
ON p1.person_id = op.person_id
inner JOIN CDMV5.CONDITION_ERA ce
ON p1.person_id = ce.person_id
inner JOIN CDMV5.DRUG_ERA de
ON p1.person_id = de.person_id;

– count of deceased patients
insert into iris_results
select ‘D5’ as MEASURE, COUNT(distinct p1.person_id) as RESULT, ‘count of deceased patients’ as EXPLANATION
from CDMV5.PERSON p1
inner JOIN CDMV5.DEATH d
ON p1.person_id = d.person_id;

– display the results
select * from iris_results;

Lee, thanks for taking the initiative!. In fact, I already was piloting some paremtized SQL code and was using RenderSQL package to translate into dialects.

It is available here:

I had even a switch between v4 and v5 and your code is running on v5.
Another variable is whether the user has rights to create new tables so one version of the SQL does not not create any temp or interim tables.

(I had the code even prior posting the forum question, but I wanted to get some ideas on what people would like to have the tool do without providing the tool)

Cool. Actually I created parameterized SQL code too but I thought you were using Redshift and the postgresql translation would be the closest to that.

I used my sqlrenderweb web app that wraps around the sqlrender package to create the code, so I only spent 10 minutes on it. That app is still in beta and it’s not ready yet for general use but I’ll send you the url via your linkedin email in case you want to experiment with it.

It’s good that people now have a link to the github sandbox IRIS code in this discussion so they can contribute to the code or add to the requirements.

Thanks to Patrick (creating IRIS project) - IRIS beta version 0.1 is now more formally in GitHub

It was in sandbox previously.
Some of the SQL code was written very defensively (without the benefit of SqlRender web), so it may change to be simpler and include more measures. Please submit more measures that you would like to see.

t