As most of us know, data quality has become increasingly important as a factor that regulatory agencies use to determine whether a database can be considered ‘fit-for-use’ when it comes to informing decisions. To that end, I have volunteered to lead a new effort around creating an OHDSI data quality dashboard. The goal is to agree upon a set of data quality checks we would like to run against an OMOP CDM instance, on top of which can sit a dashboard of some type. I have an initial design that lays out some checks and a potential UI, all working within the Kahn framework, but there is still work to be done before implementation can begin (a rough sketch of one candidate check appears after the questions below). The remaining questions are:
Are the checks we have listed enough for a version 1 of the dashboard/tool?
Will there be consideration for trends over time?
Will there be a process to add new rules?
Will there be a data quality check for source mappings?
How will benchmark values be decided on?
How will the tool provide a way to drill down to the individual flagged rows?
How do we handle checks that are always red or always green?
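To make that discussion a bit more concrete, here is a rough sketch of how one candidate check might be expressed and run against an OMOP CDM instance. The SQL, the schema name, and the pass/fail threshold are purely illustrative and are not part of any agreed-upon design.

```r
# Illustrative only: a plausibility-style check in the spirit of the Kahn framework,
# counting PERSON records whose year_of_birth lies in the future.
library(DatabaseConnector)

connectionDetails <- createConnectionDetails(
  dbms     = "postgresql",
  server   = "localhost/cdm",   # placeholder connection details
  user     = "user",
  password = "password"
)

checkSql <- "
  SELECT COUNT(*) AS num_violated_rows
  FROM cdm.person               -- 'cdm' schema name is a placeholder
  WHERE year_of_birth > EXTRACT(YEAR FROM CURRENT_DATE)
"

connection <- connect(connectionDetails)
result <- querySql(connection, checkSql)
disconnect(connection)

# A hypothetical benchmark: flag the check if any rows violate it
numViolated <- result[[1]][1]
status <- if (numViolated > 0) "FAIL" else "PASS"
```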
Many in the community have already offered their expertise: @Andrew, Ajit Londhe, @davidcarnahan, @rtmill, @Vojtech_Huser, Mui Van Zandt, @Rijnbeek, Maxim Moinat, Mark Khayter, @DTorok, @cukarthik, Frank DeFalco, Christian Reich, @mgkahn, Clark Evans, Greg Klebanov, @Patrick_Ryan, @SCYou, and Tim Berquist. Please let me know if you would like to join - the plan is to meet next week to go over the current design and answer some of the questions above. Here is a link to a doodle poll to fill out: https://doodle.com/poll/evytwqhh7r3fw9cq. Once I figure out a good time for everyone I will post meeting information to this thread.
I am really excited about this effort and I am looking forward to everyone’s ideas!
Clair
Apparently I can only mention 10 users in a post, which is why not everyone is tagged.
Thank you to all who participated in the doodle poll. The time that works best for everyone is tomorrow, June 19, 2019 at 10am eastern. Here is the meeting link:
Clair, thank you for hosting this meeting. Given the urgency and priority of this initiative, I’m wondering if it’d be appropriate for it to have its own “Data Quality” category here in our forums? Second, I suggested that a quality control effort may want to have regression tests as the first development stage; this connects the work with the urgent need for a demo database, which @schuemie has started, so that we could validate that the checks are producing the sort of results we expect. Third, I suggested that rather than start with implementing, we could start by producing the expected output of the tool, in the expected output format. This way we could have a more concrete discussion of what the scope of the project is, and those who like to work on user interfaces would have something to target now, rather than waiting for a code drop. Having a tight, community-oriented feedback loop with regression tests and sample outputs is an important step toward a successful delivery, and it lets us get user feedback working before we even start to write code.
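As a strawman for that discussion, the expected output of a single check might look something like the record below. Every field name and value here is hypothetical; the point is only to give the UI and regression-test work something concrete to react to.

```r
# Hypothetical shape of one row of DQD output; none of these field names are final.
sampleResult <- data.frame(
  checkName        = "plausibleYearOfBirth",
  cdmTable         = "PERSON",
  kahnCategory     = "Plausibility",
  numViolatedRows  = 0,
  threshold        = 0,
  status           = "PASS",
  stringsAsFactors = FALSE
)

# A regression test against the demo database could then assert against such a record,
# e.g. with the testthat package:
# testthat::expect_equal(observedResult$status, sampleResult$status)
```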
We had a productive discussion on Wednesday and I appreciate everyone who could join and give feedback. We are working within a narrow scope of assessing data quality at the CDM specification level for v1. Below are the links to the documents we are using to describe the design of the dashboard from a ground-up perspective, as well as the proposed quality checks we would like to implement for phase 1. Please take a look and leave comments if the checks do not make sense or if there are any that seem out of scope. Additionally, if there are any you see that could fit into the empty Kahn categories, feel free to leave a comment about that as well and we will discuss it at our next meeting (TBD).
For anyone interested in being a part of the developers group:
I’ve created a Doodle poll to help us select a date and time for the Developers Kickoff call: https://doodle.com/poll/efcfv34fws27bav7. Please note that this is not mandatory for everyone, but just for those interested in the software development side of this project. Vote for the dates/times in which you can attend.
Hi Clair, thanks for your efforts here. Have there been any further thoughts on which CDM versions will be supported by the tool? CDM versions 6.0 and 5.3.1 were mentioned at the meeting as the most likely to be supported.
For now the target will be CDM v5.3.1. CDM v6.0 is available, but ATLAS and other tools do not support it yet, so adoption of that version has been slow. With that in mind, CDM v5.3.1 seems the best option for the DQ dashboard for now, with a plan to support CDM v6.0 in the future.
There will be a data quality check design meeting on Monday, July 8th at 12:00pm eastern. The goal for this meeting is to finalize the list of checks we would like performed so please come with questions on the existing checks (listed in the above google doc) or with any additional checks that should be added.
Clair
Note - I sent the meeting invitation out over email. If you did not get one and would like to attend please send me a direct message here or email me at mblacke@its.jnj.com
Thanks to everyone who joined the design meeting on Monday. We had a very productive discussion (recording available here) and decided to meet again this Friday, July 12th at 9am eastern. Invites have gone out but I am happy to add anyone who would like to join - see email address above.
@aldirjr yes, thank you for the reminder! We have moved everything to our GitHub repository, where we have the first version of the tool that will be demoed at this year’s symposium: https://github.com/OHDSI/dataqualitydashboard
A couple weeks ago we had our first DQD development meeting since the symposium. We brainstormed our goals for the upcoming year and tasks we need to accomplish to achieve those goals:
Goals and Objectives for Data Quality Dashboard (DQD)
- Use of the DQD to impact regulatory decision-making
  - Specifically, proving that we have done due diligence in investigating the quality of our data
- Domain-related quality assessment
  - Specifically, proving that we have done due diligence in investigating the quality of our data in relation to the clinical question being asked
- Evaluation of data sources prior to analysis or purchase
- Study feasibility assessment in network research
- Transparency of decisions around thresholds and choices made
- Temporal assessment of DQD results (change over time)
  - Within a source
  - Within a network
Tasks
- Persistence of the DQD results such that they can be built into a study (see the sketch after this list)
  - Minor change to the requirements of the runDQD call
  - Add to the skeleton study
- Testing of a cohort-level run of the DQD
- Addition of more rules
  - Vojtech to volunteer for this
- Goal #1 is dependent on the cohort DQD task
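For the persistence task, a minimal sketch of what this could look like is below, assuming a runDQD function along the lines discussed. The function signature, its arguments, and the JSON file name are all assumptions made for illustration; the real interface is still to be agreed.

```r
# Sketch only: runDQD's signature and the output structure are assumptions,
# not the agreed-upon interface.
library(jsonlite)

dqdResults <- runDQD(
  connectionDetails = connectionDetails,  # e.g. created with DatabaseConnector
  cdmDatabaseSchema = "cdm",
  cohortId          = NULL                # hypothetical argument for a cohort-level run
)

# Persist the results so a skeleton study package can bundle them with its other artifacts
write_json(dqdResults, "dqd_results.json", pretty = TRUE, auto_unbox = TRUE)
```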
We will be meeting every two weeks on Fridays at 3pm eastern. Please contact me if you would like to join the discussion.