To @schuemie's points to consider:
- I 100% agree we would need to define a "small", "medium", "large"
version of summary statistic exports, and each site could choose what level
they were comfortable with (and of course, increasingly step up as they get
comfortable with the notion that sharing aggregate information is not a bad
thing but actually can move the science forward).
My proposal as a straw man would be the following ACHILLES reports:
"small" = "proposed minimum set" = "Dashboard" (CDM summary, population
by gender, age at first observation, cumulative observation, persons with
continuous observation by month), and "Data density" ("Total rows per
domain", "Records per person", "concepts per person"). It's not
implemented in the ACHILLES web portion yet, but I'd argue for including
the table of IRIS statistics on the "Data density" report as well.
"medium" = all reports without the concept-level drilldowns. So, for
example, "Conditions" would allow you to see the treemap and table that
gives you prevalence and records per person, but you won't be able to click
down to see prevalence by age/gender/year, by month, or breakdown by type
or age. This would expose much more information than "small", but
would still be extremely high level, and would represent very little
risk to an institution since all small cell counts would be scrubbed and it
wouldn't be possible to get down to any summary statistics that would be
generated off a low prevalence concept.
"large" = all reports with all drilldowns. We designed ACHILLES to be
low-risk for all content, and we should STRONGLY encourage the community to
follow @rwpark's tremendous example in sharing as broadly as possible.
I don't know how the community feels about the ACHILLES HEEL data quality
report... clearly it's EXTREMELY valuable in understanding whether a
database is ready for research, but some could misinterpret it as exposing
the warts of a source in a way that is unflattering. That's why I've
relegated it to "medium"/"large" even though I'd personally love to see it
in "small".
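As a rough illustration of the straw man above, the three tiers could be written down as a small lookup that answers "is this report (or drilldown) shared at this tier?". This is only a sketch: the `ExportTier` class, the `is_shared` helper, and the exact report-name strings are hypothetical names chosen for the example, not anything that exists in ACHILLES today.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ExportTier:
    """Hypothetical description of one sharing level from the proposal."""
    name: str
    reports: Optional[FrozenSet[str]]  # None means "all reports"
    drilldowns: bool                   # concept-level drilldowns allowed?
    includes_heel: bool                # ACHILLES HEEL report shared?


# Straw-man tiers as described in the post; report names are illustrative.
TIERS = {
    "small": ExportTier("small", frozenset({"Dashboard", "Data density"}), False, False),
    "medium": ExportTier("medium", None, False, True),
    "large": ExportTier("large", None, True, True),
}


def is_shared(tier_name: str, report: str, drilldown: bool = False) -> bool:
    """Return True if the given report (or its drilldown) is shared at this tier."""
    tier = TIERS[tier_name]
    if report == "ACHILLES HEEL":
        return tier.includes_heel
    if tier.reports is not None and report not in tier.reports:
        return False
    # Top-level views are always allowed; drilldowns only where the tier permits.
    return tier.drilldowns or not drilldown
```

For example, `is_shared("medium", "Conditions")` is true for the treemap and table, while `is_shared("medium", "Conditions", drilldown=True)` is false.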
2). Yes, I would propose we put CDM_SOURCE content on the ACHILLES
Dashboard page as well. If sites are following the OHDSI conventions with
V5, this table should be populated with the relevant meta-data to tell
others what the data contain in a useful way. I'd actually think exposing
the CDM_SOURCE content on all OHDSI apps will generally be a very good
idea...
3). I'd recommend going forward with the existing solution, rather than
modifying it, making people re-run, and then having to do new development on
ACHILLES. I'd only favor reducing the level of data if a site says they
are unwilling to share unless this is done (and I'd be fascinated to hear
why that would be the case).
- We have a public ACHILLES already: http://ohdsi.org/web/ACHILLES.
To amend @jon_duke's recommendation, I would NOT suggest we export JSON
files and share them to the OHDSI central server. Rather, I suggest
we add an export to the ACHILLES R package that creates a .csv of the
underlying ACHILLES_RESULTS and ACHILLES_RESULTS_DIST tables, subset to
only the "small", "medium", or "large" sets as requested by the user. In
this way, the .csv can be loaded into the OHDSI central database and the
JSON can be generated centrally. This will also help get around some of
the occasional performance issues some people run into on JSON export when
they don't have their vocabularies adequately indexed...
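To make the export idea concrete, here is a minimal sketch of the subsetting step. It is written in Python purely for illustration (the real export would live in the ACHILLES R package). The analysis IDs assigned to each tier are placeholders, not an agreed mapping, and `export_results_csv` is a hypothetical name. The column list assumes the usual ACHILLES_RESULTS layout of `analysis_id`, `stratum_1` through `stratum_5`, and `count_value`.

```python
import csv
from typing import Dict, Iterable, Optional, Set, TextIO

# Assumed ACHILLES_RESULTS column layout.
RESULT_COLUMNS = [
    "analysis_id", "stratum_1", "stratum_2", "stratum_3",
    "stratum_4", "stratum_5", "count_value",
]

# Placeholder tier-to-analysis mapping; the real sets would need to be agreed on.
TIER_ANALYSIS_IDS: Dict[str, Optional[Set[int]]] = {
    "small": {1, 2, 3},
    "medium": {1, 2, 3, 400},
    "large": None,  # None means "export every analysis"
}


def export_results_csv(rows: Iterable[dict], tier: str, out: TextIO) -> int:
    """Write only the rows allowed for `tier` to `out`; return the row count."""
    allowed = TIER_ANALYSIS_IDS[tier]
    writer = csv.DictWriter(out, fieldnames=RESULT_COLUMNS, extrasaction="ignore")
    writer.writeheader()
    written = 0
    for row in rows:
        if allowed is None or int(row["analysis_id"]) in allowed:
            writer.writerow(row)  # missing strata are written as empty fields
            written += 1
    return written
```

The same filter would apply to ACHILLES_RESULTS_DIST with its own column list. Because only rows leave the site and no JSON is built locally, the vocabulary joins that slow down JSON export would happen centrally.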
So, assuming folks were legitimately willing to join the journey of
evidence sharing, the only three technical tasks left to do are: 1) add an
"exportToOhdsi" function in ACHILLES, 2) @lee_evans can stand up a secure
S3 bucket to host the results (just like we do for our OHDSI network
studies), and 3) we'd need to load the imported csv into a database and
have the ACHILLES ExportToJSON set up to kick out files for whatever
databases get sucked in...
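Task 3 could look roughly like the following on the central side. This is a sketch under several assumptions: it uses an in-process SQLite database as a stand-in for whatever the central OHDSI database would be, the table name `achilles_results_central` and the `source_key` column are made up for the example, and the CSV is assumed to carry the ACHILLES_RESULTS columns.

```python
import csv
import io
import sqlite3

STRATA = ["stratum_1", "stratum_2", "stratum_3", "stratum_4", "stratum_5"]


def load_site_csv(conn: sqlite3.Connection, source_key: str, csv_text: str) -> int:
    """Load one site's exported results, tagged with its source key."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS achilles_results_central (
            source_key  TEXT    NOT NULL,
            analysis_id INTEGER NOT NULL,
            stratum_1 TEXT, stratum_2 TEXT, stratum_3 TEXT,
            stratum_4 TEXT, stratum_5 TEXT,
            count_value INTEGER
        )
        """
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (
            source_key,
            int(r["analysis_id"]),
            *[(r.get(s) or None) for s in STRATA],  # empty strata become NULL
            int(r["count_value"]),
        )
        for r in reader
    ]
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT INTO achilles_results_central VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            rows,
        )
    return len(rows)
```

Keeping a per-site key on every row means the central JSON export can be run once per contributing database without the sites' results getting mixed together.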