To @schuemie's points to consider:
- I 100% agree we would need to define a 'small', 'medium', 'large'
version of summary statistic exports, and each site could choose what level
they were comfortable with (and of course, increasingly step up as they get
comfortable with the notion that sharing aggregate information is not a bad
thing but actually can move the science forward).
My proposal as a straw man would be the following ACHILLES reports:
'small' = 'proposed minimum set' = 'Dashboard' (CDM summary, population
by gender, age at first observation, cumulative observation, persons with
continuous observation by month), and 'Data density' ('Total rows per
domain', 'Records per person', 'Concepts per person'). It isn't
implemented in the ACHILLES web portion yet, but I'd also like to see
the table of IRIS statistics on the 'Data density' report.
'medium' = all reports without the concept-level drilldowns. So, for
example, 'Conditions' would allow you to see the treemap and table that
gives you prevalence and records per person, but you won't be able to click
down to see prevalence by age/gender/year, by month, or breakdown by type
or age. This would expose much more information than 'small', but
would still be extremely high level, and would represent very little
risk to an institution since all small cell counts would be scrubbed and it
wouldn't be possible to get down to any summary statistics that would be
generated off a low prevalence concept.
'large' = all reports with all drilldowns. We designed ACHILLES to be
low-risk for all content, and we should STRONGLY encourage the community to
follow @rwpark's tremendous example in sharing as broadly as possible.
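
To make the three levels concrete, here is a minimal sketch of how they could be written down as configuration. It is in Python purely for illustration (ACHILLES itself is an R package), and the names `ExportTier`, `TIERS`, and `is_exported` are hypothetical, not anything that exists in ACHILLES today. It encodes only what is proposed above: 'small' is the two named reports, 'medium' is every report without drilldowns, 'large' is everything.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ExportTier:
    """One sharing level. All names here are hypothetical."""
    name: str
    reports: Optional[FrozenSet[str]]  # None means "all reports"
    drilldowns: bool                   # concept-level drilldowns included?


TIERS = {
    "small": ExportTier("small", frozenset({"Dashboard", "Data density"}), False),
    "medium": ExportTier("medium", None, False),
    "large": ExportTier("large", None, True),
}


def is_exported(tier_name: str, report: str, is_drilldown: bool) -> bool:
    """Return True if this report (or its drilldown) is shared at this tier."""
    tier = TIERS[tier_name]
    if is_drilldown and not tier.drilldowns:
        return False
    return tier.reports is None or report in tier.reports
```

For example, `is_exported("medium", "Conditions", False)` is true (the treemap and table), while `is_exported("medium", "Conditions", True)` is false (the per-concept drilldown).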
I don't know how the community feels about the ACHILLES HEEL data quality
report. Clearly it's EXTREMELY valuable in understanding whether a
database is ready for research, but some could misinterpret it as exposing
the warts of a source in a way that is unflattering. That's why I've
relegated it to 'medium'/'large' even though I'd personally love to see it
2). Yes, I would propose we put CDM_SOURCE content on the ACHILLES
Dashboard page as well. If sites are following the OHDSI conventions with
V5, this table should be populated with the relevant meta-data to tell
others what the data contain in a useful way. I actually think exposing
the CDM_SOURCE content on all OHDSI apps will generally be a very good
idea.
3). I'd recommend going forward with the existing solution, rather than
modifying it, making people re-run, and then having to do new development on
ACHILLES. I'd only favor reducing the level of data if a site says they
are unwilling to share unless this was done (and I'd be fascinated to hear
why that would be the case).
4) We have a public ACHILLES already: http://ohdsi.org/web/ACHILLES.
To amend @jon_duke's recommendation, I would NOT suggest we export JSON
files and have them shared to the OHDSI central server. Rather, I suggest
we add an export to the ACHILLES R package that creates a .csv of the
underlying ACHILLES_RESULTS and ACHILLES_RESULTS_DIST tables, subset to
only the 'small', 'medium', or 'large' sets as requested by the user. In
this way, the .csv can be loaded into the OHDSI central database and the
JSON can be generated centrally. This will also help get around some of
the occasional performance issues some people run into on JSON export when
they don't have their vocabularies adequately indexed...
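
As a rough illustration of that export step, here is a minimal sketch. It is written in Python against SQLite only so it is self-contained; the real function would live in the ACHILLES R package and run against the site's own database. The function name `export_results_csv` and the idea that each level boils down to a set of `analysis_id` values are my assumptions, and the demo table is cut down to three columns rather than the full results schema.

```python
import csv
import io
import sqlite3

ALLOWED_TABLES = ("ACHILLES_RESULTS", "ACHILLES_RESULTS_DIST")


def export_results_csv(conn, table, analysis_ids, out):
    """Write the rows of `table` whose analysis_id is in `analysis_ids`
    to the file-like object `out` as CSV. Returns the number of data rows.
    Hypothetical helper, not part of the ACHILLES package."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table: {table}")
    ids = sorted(analysis_ids)
    if not ids:
        raise ValueError("no analysis ids requested")
    placeholders = ",".join("?" for _ in ids)
    cur = conn.execute(
        f"SELECT * FROM {table} "
        f"WHERE analysis_id IN ({placeholders}) ORDER BY analysis_id",
        ids,
    )
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cur.description])  # header row
    n = 0
    for row in cur:
        writer.writerow(row)
        n += 1
    return n


# Demo with a simplified table and made-up analysis ids and counts.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE ACHILLES_RESULTS "
    "(analysis_id INTEGER, stratum_1 TEXT, count_value INTEGER)"
)
demo.executemany(
    "INSERT INTO ACHILLES_RESULTS VALUES (?, ?, ?)",
    [(1, None, 1000), (2, "8507", 480), (400, "201826", 37)],
)
buf = io.StringIO()
export_results_csv(demo, "ACHILLES_RESULTS", {1, 2}, buf)
print(buf.getvalue())
```

Because only the selected analyses ever leave the site, the level a site picks is enforced at export time rather than at display time.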
So, assuming folks were legitimately willing to join the journey of
evidence sharing, the only three technical tasks left to do are: 1) add an
'exportToOhdsi' function in ACHILLES, 2) @lee_evans can stand up a secure
S3 bucket to host the results (just like we do for our OHDSI network
studies), and 3) we'd need to load the imported csv into a database and
have the ACHILLES ExportToJSON set up to kick out files for whatever
databases get sucked in...
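
For task 3, a minimal sketch of the central load could look like the following. Again this is Python and SQLite only for illustration. The table name `central_achilles_results`, the `source_key` column used to keep sites apart, and both function names are hypothetical, and the columns are a simplified subset of the results table.

```python
import csv
import io
import sqlite3


def create_central_table(conn):
    """Create a simplified central results table (hypothetical layout)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS central_achilles_results ("
        "source_key TEXT, analysis_id INTEGER, "
        "stratum_1 TEXT, count_value INTEGER)"
    )


def load_results_csv(conn, source_key, csv_file):
    """Load one site's exported CSV, tagging every row with `source_key`.
    Returns the number of rows loaded."""
    reader = csv.DictReader(csv_file)
    rows = [
        (
            source_key,
            int(r["analysis_id"]),
            r["stratum_1"] or None,  # empty CSV field becomes NULL
            int(r["count_value"]),
        )
        for r in reader
    ]
    conn.executemany(
        "INSERT INTO central_achilles_results "
        "(source_key, analysis_id, stratum_1, count_value) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)
```

With each site's rows keyed this way, the central JSON export could then be run once per `source_key`.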