Database-wide descriptive statistics sharing

Patrick_Ryan · May 25, 2016, 12:07pm

I really appreciate @vojtech_huser pushing on this topic, and thanks
@schuemie for keeping the momentum.

I’m not wedded to this idea, but I just want to throw it out there:

Right now, it seems the perception about ACHILLES is that ‘it’s all or
nothing’. That is, if you run ACHILLES out of the box, it provides a
fairly comprehensive summary of all the content across all the tables in
the CDM. On the one end, we’ve heard from folks, including @rijnbeek and
@vojtech_huser, that there are additional aggregate summary statistics
which would be desirable to add into ACHILLES and ACHILLES HEEL to better
characterize the population and evaluate data quality issues. On the
other end, we’ve got data holders who still aren’t comfortable with sharing
these granular aggregate summary statistics, potentially for a variety of
scientific and non-scientific reasons.

In fact, ACHILLES lets users choose which summary statistics to generate
(by passing the list of analysis_ids to run), and a user can also opt to
export only a subset of the JSON files that are used to render the web UI.
It is very reasonable (and technically already possible) for a data holder
to run the entire ACHILLES build for local viewing, and then expose only a
subset of the summary statistics in a version that is made publicly
available on the OHDSI website.
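
To make the "export only a subset" idea concrete, here is a minimal sketch of
how a site might copy just the reports it is comfortable sharing into a
separate public folder. This is not part of ACHILLES: the function name
`publish_subset`, the folder layout, and the allow-list of file names are all
assumptions for illustration, and a real export may organize its JSON files
differently.

```python
import shutil
from pathlib import Path

# Hypothetical allow-list: the report files this site is willing to share.
# The file names are assumptions, not a documented ACHILLES export layout.
SHAREABLE_REPORTS = {"person.json", "datadensity.json"}


def publish_subset(export_dir, public_dir, allowed=SHAREABLE_REPORTS):
    """Copy only allow-listed top-level JSON reports into public_dir.

    Returns the names of the files that were copied, in sorted order.
    """
    export_dir, public_dir = Path(export_dir), Path(public_dir)
    public_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(export_dir.glob("*.json")):
        if path.name in allowed:
            shutil.copy2(path, public_dir / path.name)
            copied.append(path.name)
    return copied
```

A site that later becomes comfortable sharing more would only need to extend
the allow-list and re-run the copy; anything not listed never leaves the local
export folder.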

I think @schuemie’s use case is an important one that we’ve now run into
several times already as a community. To satisfy this use case, we’d be
asking sites to expose the types of data they contain, but ideally we’d
also like to have the basic prevalence of each drug (like here:
http://www.ohdsi.org/web/achilles/#/OHDSI_Sample_Database/drugeras) and
condition (like here:
http://www.ohdsi.org/web/achilles/#/CMS_SYNPUF_synthetic_data/conditions).
(Thanks to @rwpark and @lee_evans for sharing content on the OHDSI
website).

Ultimately, this comes down to each data partner’s comfort with sharing.
Some may only be willing to share a little at the start, some may do more,
some may follow @rwpark’s excellent example and share it all. Ideally,
we’d have only one platform that would accommodate all sites at whatever
level of sharing they were willing to entertain. It seems like ACHILLES
already IS this platform. Rather than building a different reporting
structure in the wiki, why don’t we organize the reports in ACHILLES around
these principles of sharing? So, there could be a report that provides the
top-line information that @schuemie and @vojtech_huser propose, which
is essentially already in ACHILLES as the ‘Person’ report
(http://www.ohdsi.org/web/achilles/#/OHDSI_Sample_Database/person) plus the
‘Data density’ report
(http://www.ohdsi.org/web/achilles/#/OHDSI_Sample_Database/datadensity).
If we find that there’s something not in ACHILLES that we’d like to add
into a report (for example, IRIS statistics), then instead of building
a different mechanism for it, why not either build a new report inside of
ACHILLES or add the table of stats to an existing report? This way,
sites can opt to share as much or as little of the summary as they want,
and when an end user goes out to OHDSI.org/web/ACHILLES, they’ll see all
reports available, but the content will only show for the sites that opted
to share it (and will show up as blank for those that opt against sharing
for now).

Would sites be willing to expose some (if not all) of their ACHILLES
results under this approach?