
Data Quality Dashboard


(Clair Blacketer) #1

All,

As most of us know, data quality has become an increasingly important factor that regulatory agencies use to determine whether a database can be considered ‘fit-for-use’ for informing decisions. To that end, I have volunteered to lead a new effort to create an OHDSI data quality dashboard. The goal is to agree upon a set of data quality checks to run against an OMOP CDM instance, on top of which a dashboard of some type can sit. I have an initial design that lays out some checks and a potential UI, all working within the Kahn framework, but there is still work to be done before implementation can begin. The remaining questions are:

  • Are the checks we have listed enough for a version 1 of the dashboard/tool?
  • Will there be consideration for trends over time?
  • Will there be a process to add new rules?
  • Will there be a data quality check for source mappings?
  • How will benchmark values be decided on?
  • How will the tool provide a way to drill down to the individual flagged rows?
  • How do we handle checks that are always red or always green?
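To make the questions about benchmark values and drill-down more concrete, here is a minimal sketch of what a single check could look like. It is only an illustration, not the proposed design: the `run_check` function, the result fields, the 5% threshold, and the toy `person` data are all hypothetical, and an in-memory SQLite database stands in for a real OMOP CDM instance.

```python
import sqlite3


def run_check(conn, table, condition, threshold_pct):
    """Hypothetical check: flag rows matching `condition` and compare the
    percentage of flagged rows against a benchmark threshold."""
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    # Keep the flagged rows themselves so a UI could drill down to them.
    flagged = conn.execute(f"SELECT * FROM {table} WHERE {condition}").fetchall()
    pct = 100.0 * len(flagged) / total if total else 0.0
    return {
        "table": table,
        "total_rows": total,
        "violating_rows": len(flagged),
        "pct_violated": pct,
        "status": "FAIL" if pct > threshold_pct else "PASS",
        "flagged_rows": flagged,
    }


# Toy stand-in for a CDM PERSON table (only two columns, made-up values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_id INTEGER, year_of_birth INTEGER)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [(1, 1980), (2, None), (3, 2150), (4, 1955)],
)

# Assumed rule: year_of_birth must be present and not in the future.
result = run_check(
    conn,
    "person",
    "year_of_birth IS NULL OR year_of_birth > 2019",
    threshold_pct=5.0,  # assumed benchmark, not an agreed value
)
print(result["status"], result["pct_violated"], result["flagged_rows"])
```

Returning the flagged rows alongside the summary is one way to support drill-down. A check that is always red or always green would show up here as a threshold that never separates good data from bad, which is part of why the benchmark question matters.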

Many in the community have already offered their expertise: @Andrew, Ajit Londhe, @davidcarnahan, @rtmill, @Vojtech_Huser, Mui Van Zandt, @Rijnbeek, Maxim Moinat, Mark Khayter, @DTorok, @cukarthik, Frank DeFalco, Christian Reich, @mgkahn, Clark Evans, Greg Klebanov, @Patrick_Ryan, @SCYou, and Tim Berquist. Please let me know if you would like to join - the plan is to meet next week to go over the current design and answer some of the questions above; here is a link to a doodle poll to fill out https://doodle.com/poll/evytwqhh7r3fw9cq. Once I figure out a good time for everyone I will post meeting information to this thread.

I am really excited about this effort and I am looking forward to everyone’s ideas!

Clair

Apparently I can only mention 10 users in a post which is why not everyone is tagged :cry:


(Clair Blacketer) #2

Thank you to all who participated in the doodle poll. The time that works best for everyone is tomorrow, June 19, 2019 at 10am eastern. Here is the meeting link:

Join Skype Meeting

Trouble Joining? Try Skype Web App

Join by phone

Toll number: +1 (908) 316-2436,19719558# (Dial-in Number) English (United States)

Find a local number

Conference ID: 19719558


(Clark C. Evans) #3

Clair, thank you for hosting this meeting. Given the urgency and priority of this initiative, I’m wondering if it’d be appropriate for it to have its own “Data Quality” category here in our forums? Second, I suggested that a quality control effort may want to have regression tests as the first development stage. This connects the work with the urgent need for a demo database, which @schuemie has started, so that we could validate that the checks are producing the sort of results we expect. Third, I suggested that rather than start with implementation, we could start by producing the expected output of the tool, in the expected output format. This way we could have a more concrete discussion of what the scope of the project is, and those who like to work on user interfaces would have something to target now, rather than waiting for a code drop. Having a tight, community-oriented feedback loop with regression tests and sample outputs is an important step toward a successful delivery, and it lets us get user feedback before we even start to write code.


(Clair Blacketer) #4

All,

We had a productive discussion on Wednesday and I appreciate everyone who could join and give feedback. We are working within a narrow scope of assessing data quality at the CDM specification level for a v1. Below are the links to the documents we are using to describe the design of the dashboard from a ground-up perspective as well as the proposed quality checks we would like to implement for a phase 1. Please take a look and leave comments if the checks do not make sense or if there are any that seem out of scope. Additionally, if there are any you see that could fit into the empty Kahn categories feel free to leave a comment about that as well and we will discuss at our next meeting (TBD).

Design document: https://drive.google.com/file/d/12dQzvrTJIdj-2ioNYtn6gkoJJ3CP1usE/view?usp=sharing
Data Quality checks: https://drive.google.com/file/d/1L14HjaviGkCkb2g3yxtOX7sZkQSoRlCA/view?usp=sharing


(Ajit Londhe) #5

All,

For anyone interested in being a part of the developers group:

I’ve created a Doodle poll to help us select a date and time for the Developers Kickoff call: https://doodle.com/poll/efcfv34fws27bav7. Please note that this is not mandatory for everyone, but just for those interested in the software development side of this project. Vote for the dates/times in which you can attend.

Thanks,
Ajit


(Conor McGrath) #6

Hi Clair, thanks for your efforts here. Have there been any further thoughts on which CDM versions will be supported by the tool? CDM versions 6.0 and 5.3.1 were mentioned at the meeting as most likely to be supported.


(Clair Blacketer) #7

Hi Conor,

For now the target will be CDM v5.3.1. CDM v6.0 is available but ATLAS and other tools do not support it yet so there has been slow adoption of that version. With that in mind, CDM v5.3.1 seems the best option for now for the DQ dashboard with a look to support CDM v6.0 in the future.

Clair


(Jose Posada) #8

I just added my name to the Doodle poll. Looking forward to the meeting.


(Ajit Londhe) #9

All,

I just sent out the meeting invite for the Developers Kickoff meeting, which will be held on Monday, July 1 at 11 am Eastern.

If you have not received it, please email me or send me a direct message through the forum.

Thanks,
Ajit


(Clair Blacketer) #10

All,

There will be a data quality check design meeting on Monday, July 8th at 12:00pm eastern. The goal for this meeting is to finalize the list of checks we would like performed so please come with questions on the existing checks (listed in the above google doc) or with any additional checks that should be added.

Clair

Note - I sent the meeting invitation out over email. If you did not get one and would like to attend please send me a direct message here or email me at mblacke@its.jnj.com


(Clair Blacketer) #11

Thanks to everyone who joined the design meeting on Monday. We had a very productive discussion (recording available here) and decided to meet again this Friday, July 12th at 9am eastern. Invites have gone out but I am happy to add anyone who would like to join - see email address above.

Clair

