For those who want to see the email dialogue…
Feedback from @Christophe_Lambert:
I struggled with clarity on the difference between some of these courses, so I’ve sent a ranking with annotations requesting clarity:
Here are my rankings from most to least, skewed by my lack of clarity on the proposed course content:
· Population-level estimation
· The OHDSI process - how to lead/run an OHDSI study from start to finish [How is this different from Population-level estimation?]
· Phenotyping/Cohort Definitions [I think a simplified version of this could be folded into a full-day population-level estimation course]
· OHDSI tools - an overview of the OHDSI tool stack
· Patient-level prediction
· Data Quality [What specifically about data quality?]
· Intermediate course: students already understand tables and are learning to write SQL codes [I really don’t know what this is supposed to be]
Feedback from @Patrick_Ryan:
Hi Christophe and team:
I agree that there’s some overlap that would be helpful to clarify. I’ll provide my two cents on how I could imagine us delineating various tutorial options and how I would prioritize them.
- Introduction to OMOP Common Data Model and Standardized Vocabularies: @Christian_Reich / @mvanzandt / @ericaVoss / others have done a good job with this tutorial in the past, which is aimed at folks who are new to OHDSI and interested in learning more about the OMOP Common Data Model, particularly for the purposes of ETLing some patient-level dataset that they have into our community standards. I definitely agree that this course is an essential one to offer as one of our beginner courses, and I’m open to the idea previously proposed that this course may be offered twice if we get sufficient interest from the community. The only additional caveat that I will provide here is that the course, as delivered at last year’s symposium and this year’s Europe symposium, is geared primarily to a technical audience, those who actively touch data, are comfortable with installing/implementing technology, and like the idea of writing/executing SQL. While others have taken (and benefitted from) the course, it is probably not the same class we would deliver if the intended audience was purely non-technical, such as folks who lead departments who just want high-level awareness or researchers who want to design studies using OHDSI tools but aren’t going to do the implementation themselves.
- An overview of the OHDSI tool stack: This was a new tutorial that we offered for the first time at the OHDSI Europe Symposium; Rijnbeek / jennareps / schuemie / I delivered the course. I might de-prioritize this tutorial if we get negative feedback from the participants on the survey form, but at least based on the feedback directly after the course, it seemed very well received and the students were very engaged throughout. The tutorial video has been posted here for those who are interested: http://ohdsi-europe.org/index.php/symposium. Our description of the tutorial was: “This tutorial will in detail discuss the tool ecosystem of OHDSI with a focus on the impressive functionality in ATLAS, such as vocabulary browsing, cohort definitions, risk effect estimations, patient-level prediction, and more. Target Audience: Data holders, researchers, and regulators who want to learn more about the exciting tools developed by the OHDSI community.” Basically, we provided a high-level overview of various topics, doing 30-minute sessions covering principles and ATLAS implementation for: vocabulary, ACHILLES data characterization, cohort definition, cohort characterization, incidence rate summary, population-level estimation, and patient-level prediction. Each of these topics warrants its own full-day course and this class definitely wasn’t sufficient in any topic, but for those who were looking to get the breadth of OHDSI, without the full depth, I think this class was effective. I could imagine this being the other ‘beginner’ course offered, and I think it works for technical and non-technical audiences alike.
- Patient-level prediction: The course that Rijnbeek / jennareps / jswerdel / I gave last year was REALLY good, and it’s definitely an area that I would like to see our community really grow. I think what Peter and Jenna have built with the PLP package is a game-changer for healthcare analytics, and what is needed now are more analysts making use of these tools to answer real questions in healthcare and applying machine learning solutions to real problems in clinical practice. If we could get Peter and Jenna to again offer this course, I expect there would be a lot of interest from the community, and I’m 100% confident that those who took the course would get a huge amount of value. From last year’s course: “Target Audience: Researchers who want to design prediction studies for precision medicine and disease interception using the OHDSI tools and programmers who want to implement and execute prediction studies using the OHDSI methods library.”
- Cohort definition/phenotyping: Nigam / jon_duke / Chris_Knoll / I offered a course at the OHDSI Symposium two years ago, and I think there’s been consensus that we need to continue to develop better approaches for designing and implementing phenotypes, both rule-based heuristics and also increasing our use of probabilistic phenotypes. There’s also a HUGE need to develop solutions for evaluating phenotypes, including strategies for estimating misclassification (sensitivity, specificity, positive predictive value). In addition to the necessary methodological research in this space, there’s also the practical reality that we need more people to be able to properly specify and execute basic cohort definitions, and there have been a lot of requests for introductory instruction in how to use ATLAS for cohort definitions and concept set construction so that basic phenotypes (like ‘new users of a drug’ or ‘persons newly diagnosed with a condition’) can be much more efficiently generated against OMOP CDMs. From the 2016 course, “Learning objectives and technical competencies:
• Learn principles for cohort definition and evaluation
• Develop rule-based heuristics in ATLAS
• Apply cohort definitions to analytical use cases of: disease phenotyping, exposure definition, and clinical trial feasibility
• Design predictive model-based phenotype evaluation using APHRODITE”
Last year, I tried to give a ‘cohort definition’ class at Erasmus for EMIF and it was awful: I did a horrible job and completely underestimated the time requirements and the baseline knowledge required of the students. Based on that negative experience, I completely redid my course materials, and Jon and I gave a course at the FDA toward the end of last year which went better, but still not great; we didn’t get through all the content we prepared, we rushed through other parts, and there were still a lot of outstanding questions that we didn’t fully address, even with the interactive exercises we prepared. It’s a hard topic in which to cover both principles and implementation, because it really requires prerequisite knowledge of the CDM and vocabulary AND you also need to know the contents of the patient-level data you are working with. I’m sure others could do a better job than me in trying to make this course work, but I’d say that whatever team we get to take this on needs to recognize that there’s a challenge in front of them, based on our past experiences. But even though it’s hard, it’s also very important and foundational to our ability to conduct all the types of analyses we want, including clinical characterization, population-level effect estimation, and patient-level prediction. So I think it’s important to continue to try to make this work in our community.
- Population-level effect estimation: I think the course that schuemie / msuchard / Christophe_Lambert / I did last year is still a useful course, to cover the principles and OHDSI tools to support the design and implementation of comparative cohort studies for safety surveillance and comparative effectiveness. I put it lower on the list only because it’s the course we’ve delivered the most, though Jamie Weaver has offered to lead it this year, so if it makes our top list, we know we will have support for it. From last year’s Symposium: “Target Audience: Researchers who want to design estimation studies for safety surveillance and comparative effectiveness using the OHDSI tools and programmers who want to implement and execute estimation studies using the OHDSI methods library.” It still seems, at least in the epidemiology domain, that the use of a ‘propensity score-adjusted new user cohort design’ is a prevailing approach being advocated by various parties, from what I see in the literature and ISPE and FDA/CDER Sentinel studies, etc., and it still seems like there is a need to build capability within the OHDSI community, given the huge number of questions that need answering and the relatively small number of answers that are getting published each year. At some point, I would like to see our community begin to also expand the capability in designing and executing population-level effect estimation studies using other designs, including the self-controlled approaches that Marc and Martijn have developed. Perhaps at some point, Christophe would be interested in further promoting his Local Control method, once the methods can be shown to computationally scale to the large p, large n problems we’ve been wrestling with as a community.
In any case, I think the options on the table are to a) offer the same course previously provided, b) offer a new course around different population-level estimation methods, or c) revise the course to try to cover more content (either expand the methods represented, try to cover the principles and implementation of empirical calibration, and/or as Christophe suggested, try to cover more content around cohort definition).
- Data quality: I’m not sure what others had in mind here, but I think it would be useful to develop a course on clinical characterization, which would cover topics like how to implement ACHILLES and interpret database-level summary statistics for greater understanding of source data and evaluation of the data quality checks that are represented in ACHILLES HEEL. I could imagine a course extending beyond database-level quality attributes to cover topics around ‘fitness-for-use’ and study feasibility, which could include how to review and interpret cohort characterizations to determine if a source has the appropriate elements with sufficient quality to allow for generating reliable evidence. If we are thinking quality more generally, then other topics that could fit would include methodological evaluation (e.g. how do you know that your method is actually giving you correct answers?) and phenotype evaluation and even the topics of meta-data/annotation that Jon and Ajit have been trying to lead within the community. I do think there’s a LOT more work that can and should be done in the area of developing clinical characterization solutions, many of which could be used for data quality purposes but wouldn’t be limited to that, and I could imagine that one opportunity within a tutorial for data quality would be to identify the specific gaps seen across the community and develop a strategy to address them moving forward.
- OHDSI Process: how to lead/run an OHDSI study from start to finish. To Christophe’s question, I would imagine this is much broader than population-level effect estimation, because it could be any study of any type (including clinical characterization or patient-level prediction). I would anticipate what we would want to cover in such a course is how to design a study, write a protocol, elicit community review and participation, design and test analytical source code that implements the finalized protocol, and how to aggregate summary statistics as results from a network study. What’s a bit unclear is who the target audience for such a course would be, whether the non-technical researcher who has a good clinical question but wouldn’t implement it or the technical researcher who wants to implement good questions (theirs or others’), or both. Assuming we want some level of technical competence, then I would say that one specific technical skill to build would be how to use SqlRender, so that you can implement queries that can run across the OHDSI network independent of the source environment. I’d probably also like to see the basics of how to design an R package that can execute an analysis and generate aggregate results, which can then be shared securely with a study coordinator. All that said, I don’t know who would be willing and able to develop such a course, and of course, whoever that would be would get to drive the direction.
- Intermediate course: students already understand tables and are learning to write SQL codes. I’m not sure exactly what the thought was here this time around, but I remember last year, Kristin and I proposed a new tutorial that would have been ‘OHDSI 101 for analysts’, where we would have focused mainly on programmers who had access to an OMOP CDM who wanted to understand how to perform various basic analytic use cases via SQL, such as cohort definition and basic clinical characterization. I thought it was a good idea at the time, but was fine to be out-voted for other options. This year, I would favor the ‘OHDSI tool overview’ tutorial over the idea of teaching SQL, mainly because I think the OHDSI toolstack course is more accessible to a wider audience (including non-programmers), and it’s more aligned with the community efforts toward developing and applying shared solutions for standardized analytics than instructing on custom code for one-off analyses.
Feedback from @Andrew:
Since I suggested it [What specifically about data quality?], here are my thoughts on that.
1. A presentation of pedagogical material to frame the interpretation and guide the methods of assessing and reporting on DQ
a. The harmonized data quality framework proposed by Kahn et al. for the warehouse level.
b. The consensus recommendations for reporting on data quality for analytic datasets derived from data warehouses.
2. Tools that run against the CDM that map DQ results to those frameworks/recommendations
a. Hossein Estiri/Kari Stephens - DQe-c, DQe-v, and DQe-p
b. Rhitu Khare/PEDSnet - Data Quality Assessment Toolkit
c. Vojtech_Huser - Achilles Heel
I think the frameworks and associated tools are mature and useful and not as widely appreciated or used as they should be.
This would teach people a coherent and practical way to deal with a set of highly consequential and complex issues.
Feedback from @ana_szarfman:
I think that we need to bite the bullet and also explain to our audience the current widespread problem of the lack of true interoperability and traceability from and to the data as originally generated.
• When working with non-interoperable data, every step downstream to the end users introduces data inconsistencies and data transformation errors.
• The currently proposed solutions for implementing interoperability address only an unrealistically limited number of very simplistic use cases.
• For example, the ONC proposed solutions to address interoperability do not address the problems of the most vulnerable patients:
• No ICU cases
• No patients with multiple risk factors
• Too frequently, data are stored as static paper-like reports and transmitted by fax-like approaches.
• This problem spans Sponsors, Central Labs, Local Labs, Laboratory Information Systems (LISs), and the diverse Electronic Health Records (EHRs) and hospital systems across the nation.
• All this processing occurs without any concept of traceability to the terms originally generated by the lab instruments and other sources of information.
Feedback from @razzaghih:
Thanks for the great discussion, everybody. I’m enjoying reading all the comments and thoughts. Here are my thoughts and top picks, if it’s helpful:
• I very much agree with Andrew’s framework for discussing data quality. I really like how that’s laid out. I was recently at the AMIA Joint Summits and I think the interest around data quality is increasing. There was a presentation on one of the PCORnet studies that looked at approaches for evaluating data quality as specifically applied to a distributed data research network. It was all done within the context of the PCORnet abx study, but it generated a lot of discussion and I think people are curious about what that might look like. The abx paper will be published soon. The Kahn framework is very theoretical; I think it would be valuable to think about practical approaches as well (which Andrew also pointed out in #2). The data quality topic, I think, attracts a technical group as well as traditional epidemiologists, so we’d have to be thoughtful about how to target, if we want to target. There are so many ways to evaluate data quality using OMOP, both for warehousing and for study-specific data quality checks and methods.
• Cohort definition is another really great topic. I think that ATLAS provides a powerful tool for cohort definition and it would be a good addition to the tutorial. I also think it would be helpful for people to understand different approaches for phenotyping and what works with what type of data. A mix of a theoretical approach and the practical, “How can OMOP facilitate phenotyping?” may be the right approach to take.
• Overview of the OHDSI tool stack: I love this topic. We’ve been an OHDSI shop here for years and I still don’t feel like we leverage all the tools correctly. I think the trick for this one will be to figure out whether we focus on the technical implementation/maintenance or whether it’s an overview of what the tools are, their function, and their usability.
Those are my top 3. I agree that asking the community may be helpful.