Request for Input: OHDSI Tutorial Topics

krfeeney · April 18, 2018, 2:20pm

On last Thursday’s US Symposium WG call, the team assembled a list of topics we should cover in this year’s tutorial sessions. After a flurry of emails, @Christian_Reich reminded us that this kind of dialogue could be useful for the broader community to engage in.

Of note, we already agreed to offer CDM & Vocabulary twice, so you do not need to vote on that; it will definitely be covered.

Below are the additional topics that we discussed for the tutorials:

Intermediate course: students already understand tables and are learning to write SQL codes
Data Quality
Phenotyping/Cohort Definitions
Patient-level prediction
Population-level estimation
The OHDSI process - how to lead/run an OHDSI study from start to finish
OHDSI tools - an overview of the OHDSI tool stack

If you have a moment, please reply and rank these in order of preference. From there, @ekatzman and @MauraBeaton will compare them against all other responses to see which ones are of the most interest to incorporate into the symposium.
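The thread doesn't say how the rankings will be combined. One simple option, shown here only as an assumption, is a Borda count: on a ballot of n topics, the first choice earns n points, the second n - 1, and so on. The function name `borda_tally` and the ballots are hypothetical.

```python
from collections import Counter

def borda_tally(ballots):
    """Combine ranked ballots with a Borda count.

    Each ballot lists topics from most to least preferred. A topic at
    0-based position i on a ballot of length n earns n - i points.
    Topics missing from a ballot earn nothing from that ballot.
    """
    scores = Counter()
    for ballot in ballots:
        n = len(ballot)
        for i, topic in enumerate(ballot):
            scores[topic] += n - i
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical ballots, for illustration only.
ballots = [
    ["Data Quality", "Phenotyping", "Tool overview"],
    ["Tool overview", "Phenotyping", "Data Quality"],
    ["Phenotyping", "Tool overview", "Data Quality"],
]
print(borda_tally(ballots))
```

Replies in this thread rank different numbers of topics (some all seven, some only four), so a longer ballot hands out more points; normalizing each ballot's points would keep that from skewing the result.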

krfeeney · April 18, 2018, 2:37pm

For those who want to see the email dialogue…

Feedback from @Christophe_Lambert:

Hi All,

I wasn’t clear on the differences between some of these courses, so I’ve sent a ranking with annotations asking for clarification:

Here are my rankings from most to least, skewed by my lack of clarity on the proposed course content:
· Population-level estimation
· The OHDSI process - how to lead/run an OHDSI study from start to finish [How is this different from Population-level estimation?]
· Phenotyping/Cohort Definitions [I think a simplified version of this could be folded into a full-day population level estimation course]
· OHDSI tools - an overview of the OHDSI tool stack
· Patient-level prediction
· Data Quality [What specifically about data quality?]
· Intermediate course: students already understand tables and are learning to write SQL codes [I really don’t know what this is supposed to be]

Feedback from @Patrick_Ryan:

Hi Christophe and team:

I agree that there’s some overlap that would be helpful to clarify. I’ll provide my two cents on how I could imagine us delineating various tutorial options and how I would prioritize them.

Introduction to OMOP Common Data Model and Standardized Vocabularies: @Christian_Reich / @mvanzandt / @ericaVoss / others have done a good job with this tutorial in the past, which is aimed at folks who are new to OHDSI and interested in learning more about the OMOP Common Data Model, particularly for the purposes of ETLing some patient-level dataset that they have into our community standards. I definitely agree that this course is an essential one to offer as one of our beginner courses, and I’m open to the idea previously proposed that this course may be offered twice if we get sufficient interest from the community. The only additional caveat that I will provide here is that the course, as delivered at last year’s symposium and this year’s Europe symposium, is geared primarily to a technical audience, those who actively touch data, are comfortable with installing/implementing technology, and like the idea of writing/executing SQL. While others have taken (and benefitted from) the course, it is probably not the same class we would deliver if the intended audience was purely non-technical, such as folks who lead departments who just want high-level awareness or researchers who want to design studies using OHDSI tools but aren’t going to do the implementation themselves.

An overview of the OHDSI tool stack: This was a new tutorial that we offered for the first time at the OHDSI Europe Symposium; Rijnbeek / jennareps / schuemie / I delivered the course, and I might de-prioritize this tutorial if we get negative feedback from the participants on the survey form, but at least based on the feedback directly after the course, it seemed very well received and the students were very engaged throughout. The tutorial video has been posted here for those who are interested: http://ohdsi-europe.org/index.php/symposium. Our description of the tutorial was: “This tutorial will in detail discuss the tool ecosystem of OHDSI with a focus on the impressive functionality in ATLAS, such as vocabulary browsing, cohort definitions, risk effect estimations, patient-level prediction, and more. Target Audience: Data holders, researchers, and regulators who want to learn more about the exciting tools developed by the OHDSI community.” Basically, we provided a high-level overview of various topics, doing 30-minute sessions covering principles and ATLAS implementation for: vocabulary, ACHILLES data characterization, cohort definition, cohort characterization, incidence rate summary, population-level estimation, and patient-level prediction. Each of these topics warrants its own full-day course and this class definitely wasn’t sufficient in any topic, but for those who were looking to get the breadth of OHDSI, without the full depth, I think this class was effective. I could imagine this being the other ‘beginner’ course offered, and I think it works for technical and non-technical audiences alike.

Patient-level prediction: The course that Rijnbeek / jennareps / jswerdel gave last year was REALLY good, and it’s definitely an area that I would like to see our community really grow. I think what Peter and Jenna have built with the PLP package is a game-changer for healthcare analytics, and what is needed now are more analysts making use of these tools to answer real questions in healthcare and applying machine learning solutions to real problems in clinical practice. If we could get Peter and Jenna to again offer this course, I expect there would be a lot of interest from the community, and I’m 100% confident that those who took the course would get a huge amount of value. From last year’s course: “Target Audience: Researchers who want to design prediction studies for precision medicine and disease interception using the OHDSI tools and programmers who want to implement and execute prediction studies using the OHDSI methods library.”

Cohort definition/phenotyping: Nigam / jon_duke / Chris_Knoll / I offered a course at the OHDSI Symposium two years ago, and I think there’s been consensus that we need to continue to develop better approaches for designing and implementing phenotypes, both rule-based heuristics and also increasing our use of probabilistic phenotypes. There’s also a HUGE need to develop solutions for evaluating phenotypes, including strategies for estimating misclassification (sensitivity, specificity, positive predictive value). In addition to the necessary methodological research in this space, there’s also the practical reality that we need more people to be able to properly specify and execute basic cohort definitions, and there have been a lot of requests for introductory instruction in how to use ATLAS for cohort definitions and concept set construction so that basic phenotypes (like ‘new users of a drug’ or ‘persons newly diagnosed with a condition’) can be much more efficiently generated against OMOP CDMs. From the 2016 course, “Learning objectives and technical competencies:
• Learn principles for cohort definition and evaluation
• Develop rule-based heuristics in ATLAS
• Apply cohort definitions to analytical use cases of: disease phenotyping, exposure definition, and clinical trial feasibility
• Design predictive model-based phenotype evaluation using APHRODITE”
Last year, I tried to give a ‘cohort definition’ class at Erasmus for EMIF and it was awful: I did a horrible job and completely underestimated the time requirements and the baseline knowledge required for the students. Based on that negative experience, I completely redid my course materials, and Jon and I gave a course at the FDA toward the end of last year which went better, but still not great; we didn’t get through all the content we prepared, we rushed through other parts, and there were still a lot of outstanding questions that we didn’t fully address, even with the interactive exercises we prepared. It’s hard to cover both principles and implementation for this topic, because it really requires prerequisite knowledge about the CDM and vocabulary AND you also need to know the contents of the patient-level data you are working with. I’m sure others could do a better job than me in trying to make this course work, but I’d say that whatever team we get to take this on needs to recognize that there’s a challenge in front of them, based on our past experiences. But even though it’s hard, it’s also very important and foundational to our ability to conduct all the types of analyses we want, including clinical characterization, population-level estimation, and patient-level prediction. So I think it’s important to continue to try to make this work in our community.
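The misclassification measures named in the cohort definition section (sensitivity, specificity, positive predictive value) reduce to simple ratios over a 2x2 table comparing a phenotype's output with a reference standard. A minimal sketch, assuming the counts come from something like chart review (the thread doesn't specify the reference) and using made-up numbers; `phenotype_metrics` is a hypothetical name:

```python
def phenotype_metrics(tp, fp, fn, tn):
    """Misclassification metrics for a phenotype against a reference standard.

    tp: flagged by the phenotype and a true case
    fp: flagged by the phenotype but not a true case
    fn: a true case the phenotype missed
    tn: neither flagged nor a true case
    """
    sensitivity = tp / (tp + fn)  # share of true cases the phenotype captures
    specificity = tn / (tn + fp)  # share of non-cases it correctly leaves out
    ppv = tp / (tp + fp)          # share of flagged persons who are true cases
    return {"sensitivity": sensitivity, "specificity": specificity, "ppv": ppv}

# Hypothetical chart-review counts, for illustration only.
print(phenotype_metrics(tp=80, fp=20, fn=40, tn=860))
```

Unlike sensitivity and specificity, PPV shifts with how common the condition is in the database, which is one reason the same phenotype can perform differently across data sources.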

Population-level effect estimation: I think the course that schuemie / msuchard / Christophe_Lambert / I did last year is still a useful course, to cover the principles and OHDSI tools to support the design and implementation of comparative cohort studies for safety surveillance and comparative effectiveness. I put it lower on the list only because it’s the course we’ve delivered the most, though Jamie Weaver has offered to lead it this year, so if it makes our top list, we know we will have support for it. From last year’s Symposium: “Target Audience: Researchers who want to design estimation studies for safety surveillance and comparative effectiveness using the OHDSI tools and programmers who want to implement and execute estimation studies using the OHDSI methods library.” It still seems, at least in the epidemiology domain, that the ‘propensity score-adjusted new user cohort design’ is a prevailing approach being advocated by various parties, from what I see in the literature and ISPE and FDA/CDER Sentinel studies, etc., and it still seems like there is a need to build capability within the OHDSI community, given the huge number of questions that need answering and the relatively small number of answers that are getting published each year. At some point, I would like to see our community begin to also expand its capability in designing and executing population-level effect estimation studies using other designs, including the self-controlled approaches that Marc and Martijn have developed. Perhaps at some point, Christophe would be interested in further promoting his Local Control method, once the methods can be shown to computationally scale to the large p, large n problems we’ve been wrestling with as a community.
In any case, I think the options on the table are a) offering the same course previously provided, b) offering a new course around different population-level estimation methods, or c) revising the course to try to cover more content (either expanding the methods represented, trying to cover the principles and implementation of empirical calibration, and/or, as Christophe suggested, trying to cover more content around cohort definition).
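Option c) mentions empirical calibration. The idea is to look at effect estimates for negative controls (exposure-outcome pairs believed to have no causal effect), treat their distribution as an empirical null that captures systematic error, and compute p-values against that null rather than one centered on zero. To my knowledge the OHDSI EmpiricalCalibration R package fits the null by maximum likelihood; the sketch below swaps in a simpler moment-based fit, so treat it as an illustration of the concept, not that package's method. Function names and negative-control values are hypothetical.

```python
import math
from statistics import mean, pvariance

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def fit_null(log_rrs, ses):
    """Moment-based fit of an empirical null N(mu, sigma^2) to negative controls.

    The spread of negative-control estimates mixes systematic error (sigma^2)
    with sampling error, so the mean sampling variance is subtracted and the
    result is floored at zero. (Simplification, not maximum likelihood.)
    """
    mu = mean(log_rrs)
    sigma2 = max(0.0, pvariance(log_rrs) - mean(se * se for se in ses))
    return mu, math.sqrt(sigma2)

def calibrated_p(log_rr, se, mu, sigma):
    """Two-sided p-value for one estimate under the fitted empirical null."""
    z = (log_rr - mu) / math.sqrt(sigma ** 2 + se ** 2)
    return 2.0 * min(normal_cdf(z), 1.0 - normal_cdf(z))

# Hypothetical negative-control log estimates and standard errors.
mu, sigma = fit_null([0.1, 0.3, 0.2, 0.4], [0.05, 0.05, 0.05, 0.05])
# Uncalibrated (mu = 0, sigma = 0) vs calibrated p for a hypothetical RR of 1.5.
print(calibrated_p(math.log(1.5), 0.1, 0.0, 0.0),
      calibrated_p(math.log(1.5), 0.1, mu, sigma))
```

With these made-up controls the null is shifted away from zero and widened, so the calibrated p-value comes out much larger than the naive one, which is exactly the kind of correction the technique is meant to surface.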

Data quality: I’m not sure what others had in mind here, but I think it would be useful to develop a course on clinical characterization, which would cover topics like how to implement ACHILLES and interpret database-level summary statistics for greater understanding of source data and evaluation of the data quality checks that are represented in ACHILLES HEEL. I could imagine a course extending beyond database-level quality attributes to cover topics around ‘fitness-for-use’ and study feasibility, which could include how to review and interpret cohort characterizations to determine if a source has the appropriate elements with sufficient quality to allow for generating reliable evidence. If we are thinking about quality more generally, then other topics that could fit would include methodological evaluation (e.g. how do you know that your method is actually giving you correct answers?) and phenotype evaluation and even the topics of meta-data/annotation that Jon and Ajit have been trying to lead within the community. I do think there’s a LOT more work that can and should be done in the area of developing clinical characterization solutions, many of which could be used for data quality purposes but wouldn’t be limited to that, and I could imagine that one opportunity within a tutorial for data quality would be to identify the specific gaps seen across the community and develop a strategy to address them moving forward.
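To make the ACHILLES / ACHILLES HEEL pairing concrete: one side computes database-level summary statistics, the other flags results that look implausible. The sketch below is not ACHILLES code; it computes one summary (persons by year of birth) over records loosely modeled on the CDM PERSON table and applies two plausibility rules whose thresholds are assumptions chosen for illustration. `characterize` is a hypothetical name.

```python
from collections import Counter

def characterize(persons, current_year):
    """Database-level summary plus two illustrative plausibility warnings.

    persons: list of dicts with 'person_id' and 'year_of_birth', loosely
    modeled on the OMOP CDM PERSON table. Thresholds are hypothetical.
    """
    births = Counter(p["year_of_birth"] for p in persons)
    warnings = []
    # Rule 1 (hypothetical): nobody should be born after the current year.
    future = sum(n for year, n in births.items() if year > current_year)
    if future:
        warnings.append(f"{future} person(s) born after {current_year}")
    # Rule 2 (hypothetical): year-based age above 120 is implausible.
    too_old = sum(n for year, n in births.items() if year < current_year - 120)
    if too_old:
        warnings.append(f"{too_old} person(s) older than 120 years")
    return {"person_count": len(persons),
            "births_by_year": dict(births),
            "warnings": warnings}

# Made-up records, for illustration only.
persons = [
    {"person_id": 1, "year_of_birth": 1980},
    {"person_id": 2, "year_of_birth": 2030},
    {"person_id": 3, "year_of_birth": 1850},
]
print(characterize(persons, current_year=2018))
```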

OHDSI Process: how to lead/run an OHDSI study from start to finish. To Christophe’s question, I would imagine this is much broader than population-level effect estimation, because it could be any study of any type (including clinical characterization or patient-level prediction). I would anticipate what we would want to cover in such a course is how to design a study, write a protocol, elicit community review and participation, design and test analytical source code that implements the finalized protocol, and aggregate summary statistics as results from a network study. What’s a bit unclear is who the target audience for such a course would be: the non-technical researcher who has a good clinical question but wouldn’t implement it, the technical researcher who wants to implement good questions (theirs or others’), or both. Assuming we want some level of technical competence, then I would say that one specific technical skill to build would be how to use SqlRender, so that you can implement queries that can run across the OHDSI network independent of the source environment. I’d probably also like to see the basics of how to design an R package that can execute an analysis and generate aggregate results, which can then be shared securely with a study coordinator. All that said, I don’t know who would be willing and able to develop such a course, and of course, whoever that would be would get to drive the direction.
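SqlRender (an OHDSI package) lets one parameterized SQL template run at many sites: placeholders written as @name are filled with site-specific values, and the result can be translated into each site's SQL dialect. The toy function below imitates only the parameter-substitution step; it is not SqlRender's code, handles none of its conditional blocks, and does no dialect translation. The name `render` here is just for illustration.

```python
import re

def render(sql, **params):
    """Replace @name placeholders with supplied values.

    Toy imitation of parameter substitution only. Placeholders with no
    supplied value are left untouched so they are easy to spot.
    """
    def substitute(match):
        name = match.group(1)
        return str(params[name]) if name in params else match.group(0)
    return re.sub(r"@(\w+)", substitute, sql)

template = "SELECT COUNT(*) FROM @cdm_schema.person WHERE year_of_birth > @min_year;"
print(render(template, cdm_schema="cdm", min_year=1950))
```

Keeping the schema name as a parameter is the point: the study code never hard-codes where a given site stores its CDM.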

Intermediate course: students already understand tables and are learning to write SQL codes. I’m not sure exactly what the thought was here this time around, but I remember last year, Kristin and I proposed a new tutorial that would have been ‘OHDSI 101 for analysts’, where we would have focused mainly on programmers who had access to an OMOP CDM who wanted to understand how to perform various basic analytic use cases via SQL, such as cohort definition and basic clinical characterization. I thought it was a good idea at the time, but was fine to be out-voted for other options. This year, I would favor the ‘OHDSI tool overview’ tutorial over the idea of teaching SQL, mainly because I think the OHDSI toolstack course is more accessible to a wider audience (including non-programmers), and it’s more aligned with the community efforts toward developing and applying shared solutions for standardized analytics than instructing on custom code for one-off analyses.
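The ‘OHDSI 101 for analysts’ idea centered on answering basic questions directly in SQL. As a minimal sketch of a basic clinical-characterization query, the example below counts persons by gender on a toy in-memory SQLite database whose PERSON and CONCEPT tables keep only a handful of the CDM's columns. Concept IDs 8507 (MALE) and 8532 (FEMALE) are the standard OMOP gender concepts; the rows and the function name `gender_breakdown` are made up for illustration.

```python
import sqlite3

def gender_breakdown(conn):
    """Count persons per gender by joining PERSON to CONCEPT."""
    query = """
        SELECT c.concept_name, COUNT(*) AS person_count
        FROM person AS p
        JOIN concept AS c ON p.gender_concept_id = c.concept_id
        GROUP BY c.concept_name
        ORDER BY c.concept_name
    """
    return conn.execute(query).fetchall()

# Toy tables with only a few CDM columns; data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT);
    CREATE TABLE person (person_id INTEGER PRIMARY KEY,
                         gender_concept_id INTEGER, year_of_birth INTEGER);
    INSERT INTO concept VALUES (8507, 'MALE'), (8532, 'FEMALE');
    INSERT INTO person VALUES (1, 8507, 1960), (2, 8532, 1975), (3, 8532, 1990);
""")
print(gender_breakdown(conn))
```

Joining to CONCEPT rather than hard-coding labels is what makes the same query portable across CDM instances, since every site shares the standardized vocabulary.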
Cheers,
Patrick

Feedback from @Andrew:

Since I suggested it [What specifically about data quality?], here are my thoughts on that.

1. A presentation of pedagogical material to frame the interpretation and guide the methods of assessing and reporting on DQ
a. The harmonized data quality framework proposed by Kahn et al. for warehouse-level data.
b. The consensus recommendations for reporting on data quality for analytic datasets derived from data warehouses.

2. Tools that run against the CDM that map DQ results to those frameworks/recommendations
a. Hossein Estiri/Kari Stephens - DQe-c, DQe-v, and DQe-p
b. Rhitu Khare/PEDSnet - Data Quality Assessment Toolkit
c. Vojtech_Huser - Achilles Heel

I think the frameworks and associated tools are mature and useful and not as widely appreciated or used as they should be.

This would teach people a coherent and practical way to deal with a set of highly consequential and complex issues.
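Andrew's second point is about tools that map check results onto a framework like Kahn et al.'s, whose top-level categories are conformance, completeness, and plausibility. The sketch below is not code from DQe-c, the PEDSnet toolkit, or Achilles Heel; it only shows one way such a mapping could be rolled up, with each check tagged by category and pass/fail counts summarized per category. The check names and results are hypothetical.

```python
# The three top-level categories of Kahn et al.'s harmonized framework.
KAHN_CATEGORIES = ("Conformance", "Completeness", "Plausibility")

def summarize_by_category(results):
    """Roll up (check_name, category, passed) tuples into per-category counts."""
    summary = {category: {"passed": 0, "failed": 0} for category in KAHN_CATEGORIES}
    for name, category, passed in results:
        if category not in summary:
            raise ValueError(f"{name}: unknown category {category!r}")
        summary[category]["passed" if passed else "failed"] += 1
    return summary

# Hypothetical check results, for illustration only.
results = [
    ("gender_concept_id uses a known concept", "Conformance", True),
    ("year_of_birth is populated", "Completeness", True),
    ("no visits recorded after death", "Plausibility", False),
    ("birth precedes first observation", "Plausibility", True),
]
print(summarize_by_category(results))
```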

Andrew

Feedback from @ana_szarfman:

Hi all,

I think that we need to bite the bullet and also explain to our audience the current widespread problem of the lack of true interoperability and traceability to and from the data as originally generated.

• When working with non-interoperable data, every step downstream to the end users introduces data inconsistencies and data transformation errors.

• The currently proposed solutions for implementing interoperability only address an unrealistically limited number of very simplistic use cases.

• For example, the ONC proposed solutions to address interoperability do not address the problems of the most vulnerable patients:

• No ICU cases
• No patients with multiple risk factors

• Too frequently, data are stored as static paper-like reports and transmitted by fax-like approaches.

• This problem spans Sponsors, Central Labs, Local Labs, Laboratory Information Systems (LISs), and the diversely different Electronic Health Records (EHRs) and hospital systems across the Nation.

• All this processing occurs without any concept of traceability to the original terms generated by the lab instruments and other sources of information.

Feedback from @razzaghih:

Thanks for the great discussion, everybody. I’m enjoying reading all the comments and thoughts. Here are my thoughts and top picks, if it’s helpful:

• I very much agree with Andrew’s framework for discussing data quality. I really like how that’s laid out. I was recently at the AMIA Joint Summits and I think interest in data quality is increasing. There was a presentation on one of the PCORnet studies that looked at approaches for evaluating data quality as specifically applied to a distributed data research network. It was all done within the context of the PCORnet abx study, but it generated a lot of discussion and I think people are curious about what that might look like. The abx paper will be published soon. The Kahn framework is very theoretical; I think it would be valuable to think about practical approaches as well (which Andrew also pointed out in #2). Data quality, I think, attracts a technical group as well as traditional epidemiologists, so we’d have to be thoughtful about how to target, if we want to target. There are so many ways to evaluate data quality using OMOP, both for warehousing and for study-specific data quality checks and methods.
• Cohort definitions is another really great topic. I think that Atlas provides a powerful tool for cohort definition and it would be a good addition to the tutorial. I also think it would be helpful for people to understand different approaches for phenotyping and what works and with what type of data. A mix of a theoretical approach and the practical, “How can OMOP facilitate phenotyping?” may be the right approach to take.
• Overview of the OHDSI tool stack: I love this topic. We’ve been an OHDSI shop here for years and I still don’t feel like we leverage all the tools correctly. I think the trick for this one will be to figure out whether we focus on the technical implementation/maintenance or whether it’s an overview of what the tools are, their function, and their usability.

Those are my top 3. I agree that asking the community may be helpful.

Thanks,
Hanieh

TengLiaw · April 19, 2018, 6:07am

My ranked list:

Data Quality Assessment and Management
Phenotyping/Cohort Definitions
Patient-level prediction
Population-level estimation
OHDSI tools - an overview of the OHDSI tool stack
The OHDSI process - how to lead/run an OHDSI study from start to finish
Intermediate course: students already understand tables and are learning to write SQL codes
Teng

Patrick_Ryan · April 19, 2018, 1:45pm

I agree that there’s some overlap that would be helpful to clarify. I’ll provide my two cents on how I could imagine us delineating various tutorial options and how I would prioritize them.

Introduction to OMOP Common Data Model and Standardized Vocabularies: Christian/Mui/Erica/others have done a good job with this tutorial in the past, which is aimed at folks who are new to OHDSI and interested in learning more about the OMOP Common Data Model, particularly for the purposes of ETLing some patient-level dataset that they have into our community standards. I definitely agree that this course is an essential one to offer as one of our beginner courses, and I’m open to the idea previously proposed that this course may be offered twice if we get sufficient interest from the community. The only additional caveat that I will provide here is that the course, as delivered at last year’s symposium and this year’s Europe symposium, is geared primarily to a technical audience, those who actively touch data, are comfortable with installing/implementing technology, and like the idea of writing/executing SQL. While others have taken (and benefitted from) the course, it is probably not the same class we would deliver if the intended audience was purely non-technical, such as folks who lead departments who just want high-level awareness or researchers who want to design studies using OHDSI tools but aren’t going to do the implementation themselves.

An overview of the OHDSI tool stack: This was a new tutorial that we offered for the first time at the OHDSI Europe Symposium; Peter/Jenna/Martijn/I delivered the course, and I might de-prioritize this tutorial if we get negative feedback from the participants on the survey form, but at least based on the feedback directly after the course, it seemed very well received and the students were very engaged throughout. The tutorial video has been posted here for those who are interested: http://ohdsi-europe.org/index.php/symposium. Our description of the tutorial was: “This tutorial will in detail discuss the tool ecosystem of OHDSI with a focus on the impressive functionality in ATLAS, such as vocabulary browsing, cohort definitions, risk effect estimations, patient-level prediction, and more. Target Audience: Data holders, researchers, and regulators who want to learn more about the exciting tools developed by the OHDSI community.” Basically, we provided a high-level overview of various topics, doing 30-minute sessions covering principles and ATLAS implementation for: vocabulary, ACHILLES data characterization, cohort definition, cohort characterization, incidence rate summary, population-level estimation, and patient-level prediction. Each of these topics warrants its own full-day course and this class definitely wasn’t sufficient in any topic, but for those who were looking to get the breadth of OHDSI, without the full depth, I think this class was effective. I could imagine this being the other ‘beginner’ course offered, and I think it works for technical and non-technical audiences alike.

Patient-level prediction: The course that Peter, Jenna, and Joel gave last year was REALLY good, and it’s definitely an area that I would like to see our community really grow. I think what Peter and Jenna have built with the PLP package is a game-changer for healthcare analytics, and what is needed now are more analysts making use of these tools to answer real questions in healthcare and applying machine learning solutions to real problems in clinical practice. If we could get Peter and Jenna to again offer this course, I expect there would be a lot of interest from the community, and I’m 100% confident that those who took the course would get a huge amount of value. From last year’s course: “Target Audience: Researchers who want to design prediction studies for precision medicine and disease interception using the OHDSI tools and programmers who want to implement and execute prediction studies using the OHDSI methods library.”

Cohort definition/phenotyping: Nigam/Jon/Chris offered a course at the OHDSI Symposium two years ago, and I think there’s been consensus that we need to continue to develop better approaches for designing and implementing phenotypes, both rule-based heuristics and also increasing our use of probabilistic phenotypes. There’s also a HUGE need to develop solutions for evaluating phenotypes, including strategies for estimating misclassification (sensitivity, specificity, positive predictive value). In addition to the necessary methodological research in this space, there’s also the practical reality that we need more people to be able to properly specify and execute basic cohort definitions, and there have been a lot of requests for introductory instruction in how to use ATLAS for cohort definitions and concept set construction so that basic phenotypes (like ‘new users of a drug’ or ‘persons newly diagnosed with a condition’) can be much more efficiently generated against OMOP CDMs. From the 2016 course, “Learning objectives and technical competencies:
• Learn principles for cohort definition and evaluation
• Develop rule-based heuristics in ATLAS
• Apply cohort definitions to analytical use cases of: disease phenotyping, exposure definition, and clinical trial feasibility
• Design predictive model-based phenotype evaluation using APHRODITE”

Last year, I tried to give a ‘cohort definition’ class at Erasmus for EMIF and it was awful: I did a horrible job and completely underestimated the time requirements and the baseline knowledge required for the students. Based on that negative experience, I completely redid my course materials, and Jon and I gave a course at the FDA toward the end of last year which went better, but still not great; we didn’t get through all the content we prepared, we rushed through other parts, and there were still a lot of outstanding questions that we didn’t fully address, even with the interactive exercises we prepared. It’s hard to cover both principles and implementation for this topic, because it really requires prerequisite knowledge about the CDM and vocabulary AND you also need to know the contents of the patient-level data you are working with. I’m sure others could do a better job than me in trying to make this course work, but I’d say that whatever team we get to take this on needs to recognize that there’s a challenge in front of them, based on our past experiences. But even though it’s hard, it’s also very important and foundational to our ability to conduct all the types of analyses we want, including clinical characterization, population-level estimation, and patient-level prediction. So I think it’s important to continue to try to make this work in our community.

Population-level effect estimation: I think the course that Martijn/Marc/Christophe/I did last year is still a useful course, to cover the principles and OHDSI tools to support the design and implementation of comparative cohort studies for safety surveillance and comparative effectiveness. I put it lower on the list only because it’s the course we’ve delivered the most, though Jamie Weaver has offered to lead it this year, so if it makes our top list, we know we will have support for it. From last year’s Symposium: “Target Audience: Researchers who want to design estimation studies for safety surveillance and comparative effectiveness using the OHDSI tools and programmers who want to implement and execute estimation studies using the OHDSI methods library.” It still seems, at least in the epidemiology domain, that the ‘propensity score-adjusted new user cohort design’ is a prevailing approach being advocated by various parties, from what I see in the literature and ISPE and FDA/CDER Sentinel studies, etc., and it still seems like there is a need to build capability within the OHDSI community, given the huge number of questions that need answering and the relatively small number of answers that are getting published each year. At some point, I would like to see our community begin to also expand its capability in designing and executing population-level effect estimation studies using other designs, including the self-controlled approaches that Marc and Martijn have developed. Perhaps at some point, Christophe would be interested in further promoting his Local Control method, once the methods can be shown to computationally scale to the large p, large n problems we’ve been wrestling with as a community. In any case, I think the options on the table are a) offering the same course previously provided, b) offering a new course around different population-level estimation methods, or c) revising the course to try to cover more content (either expanding the methods represented, trying to cover the principles and implementation of empirical calibration, and/or, as Christophe suggested, trying to cover more content around cohort definition).

Data quality: I’m not sure what others had in mind here, but I think it would be useful to develop a course on clinical characterization, which would cover topics like how to implement ACHILLES and interpret database-level summary statistics for greater understanding of source data and evaluation of the data quality checks that are represented in ACHILLES HEEL. I could imagine a course extending beyond database-level quality attributes to cover topics around ‘fitness-for-use’ and study feasibility, which could include how to review and interpret cohort characterizations to determine if a source has the appropriate elements with sufficient quality to allow for generating reliable evidence. If we are thinking about quality more generally, then other topics that could fit would include methodological evaluation (e.g. how do you know that your method is actually giving you correct answers?) and phenotype evaluation and even the topics of meta-data/annotation that Jon and Ajit have been trying to lead within the community. I do think there’s a LOT more work that can and should be done in the area of developing clinical characterization solutions, many of which could be used for data quality purposes but wouldn’t be limited to that, and I could imagine that one opportunity within a tutorial for data quality would be to identify the specific gaps seen across the community and develop a strategy to address them moving forward.

OHDSI Process: how to lead/run an OHDSI study from start to finish. To Christophe’s question, I would imagine this is much broader than population-level effect estimation, because it could be any study of any type (including clinical characterization or patient-level prediction). I would anticipate what we would want to cover in such a course is how to design a study, write a protocol, elicit community review and participation, design and test analytical source code that implements the finalized protocol, and aggregate summary statistics as results from a network study. What’s a bit unclear is who the target audience for such a course would be: the non-technical researcher who has a good clinical question but wouldn’t implement it, the technical researcher who wants to implement good questions (theirs or others’), or both. Assuming we want some level of technical competence, then I would say that one specific technical skill to build would be how to use SqlRender, so that you can implement queries that can run across the OHDSI network independent of the source environment. I’d probably also like to see the basics of how to design an R package that can execute an analysis and generate aggregate results, which can then be shared securely with a study coordinator. All that said, I don’t know who would be willing and able to develop such a course, and of course, whoever that would be would get to drive the direction.
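The last step in that list, aggregating summary statistics from a network study, can be done several ways. One common choice, assumed here for illustration, is fixed-effect inverse-variance pooling: each site shares only its log effect estimate and standard error, and the coordinator combines them without ever seeing patient-level data. The function name `pool_fixed_effect` and the site values are hypothetical.

```python
import math

def pool_fixed_effect(estimates):
    """Fixed-effect inverse-variance pooling.

    estimates: list of (log_estimate, standard_error) pairs, one per site.
    Returns the pooled log estimate and its standard error.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]  # precise sites count more
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Hypothetical site-level results: (log hazard ratio, standard error).
sites = [(0.2, 0.1), (0.4, 0.1)]
print(pool_fixed_effect(sites))
```

A fixed-effect model assumes every site estimates the same underlying effect; when sites disagree more than sampling error allows, a random-effects model is the usual alternative.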
Intermediate course: students already understand tables and are learning to write SQL codes. I’m not sure exactly what the thought was here this time around, but I remember last year, Kristin and I proposed a new tutorial that would have been ‘OHDSI 101 for analysts’, where we would have focused mainly on programmers who had access to an OMOP CDM who wanted to understand how to perform various basic analytic use cases via SQL, such as cohort definition and basic clinical characterization. I thought it was a good idea at the time, but was fine to be out-voted for other options. This year, I would favor the ‘OHDSI tool overview’ tutorial over the idea of teaching SQL, mainly because I think the OHDSI toolstack course is more accessible to a wider audience (including non-programmers), and it’s more aligned with the community efforts toward developing and applying shared solutions for standardized analytics than instructing on custom code for bespoke studies.

clairblacketer · April 19, 2018, 3:11pm

To add on here:

I agree that the OMOP Common Data Model and Standardized Vocabulary course could potentially be offered twice, perhaps one focused more on technical folks wanting to touch the data using SQL and the other focused more on how to use the OHDSI tool stack in conjunction with the learning objectives of the course.
I would also advocate for some sort of ETL course or help desk. While @mvanzandt and @ericaVoss have taught the vocab course more frequently and may be able to speak more to this, last year at each break I was getting many questions about specific challenges folks were having transforming their data. Understandably, an 8-hour course may not be the best approach to solve this problem since everyone’s data is slightly different, but I still believe it could at least be helpful to go through the process we typically use to approach a new ETL as well as the rules we follow as a community (THEMIS). What about a workshop where we teach the steps but participants apply them to their own data as faculty wait in the wings to answer questions?
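One recurring step in approaching a new ETL is mapping source codes to standard concepts and reviewing what didn't map. A minimal sketch, assuming a simple in-memory lookup; in practice the mapping would come from the OMOP vocabulary tables or a mapping tool such as Usagi. The source codes and concept IDs below are placeholders, not real vocabulary IDs, and `map_source_codes` is a hypothetical name. Unmapped codes get concept_id 0, the CDM convention for "no matching concept".

```python
def map_source_codes(records, source_to_concept):
    """Attach a standard concept_id to each source record.

    Unmapped codes get concept_id 0 (the CDM's 'no matching concept' value)
    and are also returned so they can be reviewed and added to the mapping.
    """
    mapped, unmapped = [], set()
    for record in records:
        concept_id = source_to_concept.get(record["source_code"], 0)
        if concept_id == 0:
            unmapped.add(record["source_code"])
        mapped.append({**record, "concept_id": concept_id})
    return mapped, sorted(unmapped)

# Hypothetical source codes and placeholder concept IDs.
lookup = {"DX001": 1001, "DX002": 1002}
source = [
    {"person_id": 1, "source_code": "DX001"},
    {"person_id": 2, "source_code": "DX002"},
    {"person_id": 3, "source_code": "DX999"},
]
rows, unmapped = map_source_codes(source, lookup)
print(unmapped)  # codes needing attention before the next ETL run
```

In a help-desk setting, the unmapped list is often the most useful output: it turns "my data is different" into a concrete set of codes to work through with faculty.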

krfeeney · April 20, 2018, 12:18pm

@clairblacketer I really like the idea of an ETL Help Desk! Maybe instead of a formal tutorial time, we could have some “office hours” set up around the US Symposium with some of our go-to experts for ETL where people can schedule a quick time to “consult the experts” on challenges they’re facing.

cukarthik · April 20, 2018, 12:30pm

I really like that idea as well. This is a good way to disseminate guidelines that the Themis group are developing. It might be good to include data quality (Achilles) here if you think of it as part of the data pipeline.

ifilgood · April 25, 2018, 3:50pm

These are all great topics and I am having a hard time selecting 4 (given that 2 slots are already reserved for CDM and vocabulary)! My ranked vote is as follows:

Overview of the OHDSI tool stack
Patient-level prediction
Cohort definition/phenotyping
Data quality