
Research questions that the OHDSI community can potentially answer to support the COVID-19 response

(Patrick Ryan) #1


To support our planning for the OHDSI virtual study-a-thon on COVID-19, we need to identify potential questions that we can answer using our OHDSI data network and analytic skills, so that we can prioritize these opportunities based on their feasibility and the potential public health impact they represent. I’m opening this discussion thread to capture those questions. As you propose ideas, please answer the four questions below.

  • What is the decision we are trying to inform?
  • Who is the decision-maker?
  • What type of real-world data is needed to generate reliable evidence?
  • How will reliable real-world evidence inform the decision?
  • (we’ll presume we know the answer to ‘When is the evidence needed?’ = as soon as possible!)

Thanks in advance for your collaboration!

[OHDSI COVID-19 response] Community Update 20 March 2020
(Patrick Ryan) #2

(Daniel Prieto-Alhambra) #3

Even if only as an ice breaker, these are research questions I have compiled from discussions with clinicians from all over Europe:
- Characterising who gets viral pneumonia
    - Who are they?
    - How do we treat them? steroids, antivirals, hcq, other therapies
    - What’s the associated healthcare resource use (ICU, hosp admission, etc) and morbi-mortality?
    - Stratified by age, gender, country, comorbidity, and virus (where known)

- Predicting who will get pneumonia out of seasonal flu sufferers; and predicting morbi-mortality amongst viral pneumonia sufferers
    - Does this transport to other viruses?

- PLEs
    - Does HCQ confer a lower risk of viral infection/s or of viral pneumonia?
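    - Do JAK inhibitors confer a lower risk of viral infection/s or of viral pneumonia?
    - Do antiretrovirals confer a lower risk of viral infection/s or of viral pneumonia?
    - Do the drugs above reduce morbi-mortality amongst viral pneumonia sufferers?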
    - Do systemic steroids reduce morbi-mortality amongst viral pneumonia sufferers?

(Seng Chan You) #4

I’d like to add several questions of mine, too.

Questions and required type of real-world data to generate reliable evidence

  • The spread pattern of this disease

    • Number of daily identified covid-19 cases with their location information from diverse countries.
  • Age/gender standardized case fatality rate

    • Number of fatal cases with their age/gender/date of diagnosis/date of death in each country
    • the number of total cases with their age/gender/date of diagnosis

I think the two datasets above are currently available. It would be possible to populate a simple CDM dataset with the currently available data for covid-19 (patient / age / gender / location / country / condition start date / date of death).

Surprisingly, we still do not know the exact age/gender-standardized mortality rate, even though it can be estimated with currently available data from around the world.
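As a rough illustration of what an age/gender-standardized case fatality rate involves, here is a minimal sketch of direct standardization in Python. It assumes we already have case and death counts per age/gender stratum and a reference distribution to weight them by. The function name, the strata, and all the numbers are made up for illustration; they are not taken from any of the datasets mentioned in this thread.

```python
from typing import Dict, Tuple

Stratum = Tuple[str, str]  # (age band, gender), e.g. ("60+", "F")


def standardized_cfr(
    cases: Dict[Stratum, int],
    deaths: Dict[Stratum, int],
    reference_weights: Dict[Stratum, float],
) -> float:
    """Directly standardized case fatality rate.

    Each stratum-specific rate (deaths / cases) is weighted by that
    stratum's share of a reference population. Weights are normalised
    here, so they do not need to sum to 1 on input.
    """
    total_weight = sum(reference_weights.values())
    if total_weight <= 0:
        raise ValueError("reference weights must sum to a positive number")

    rate = 0.0
    for stratum, weight in reference_weights.items():
        n_cases = cases.get(stratum, 0)
        if n_cases == 0:
            # A stratum-specific rate cannot be computed without cases.
            raise ValueError(f"no cases observed in stratum {stratum}")
        stratum_rate = deaths.get(stratum, 0) / n_cases
        rate += (weight / total_weight) * stratum_rate
    return rate


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    cases = {("0-59", "F"): 100, ("0-59", "M"): 100,
             ("60+", "F"): 50, ("60+", "M"): 50}
    deaths = {("0-59", "F"): 1, ("0-59", "M"): 2,
              ("60+", "F"): 5, ("60+", "M"): 10}
    # Reference population with equal shares in every stratum.
    weights = {stratum: 1.0 for stratum in cases}

    crude = sum(deaths.values()) / sum(cases.values())
    print("crude CFR:", crude)
    print("standardized CFR:", standardized_cfr(cases, deaths, weights))
```

With these made-up counts the crude rate is 6% while the standardized rate is roughly 8.25%, because the reference population has a larger share of older people than the observed cases do. Using the same reference weights for every country is what makes the resulting rates comparable.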

Another potential data source would be questionnaires, which health authorities of each country might have acquired. If we can standardize this information inside each country, we can perform distributed research using these data to figure out detailed characteristics of covid-19.

If you know the enemy and know yourself, you need not fear the result of a hundred battles. If you know yourself but not the enemy, for every victory gained you will also suffer a defeat. If you know neither the enemy nor yourself, you will succumb in every battle. – Sun Tzu, The Art of War

(Seng Chan You) #5

You can see the datasets summarizing released information for Korean covid-19 (parksw3, jihoo-kim).
Can we aggregate more data from other countries, and can we standardize these data?

(Ray Chen) #6

Wow, very impressive data collection in South Korea. Here in the US, or specifically NY, I’m not aware of any datasets at that level of detail. All I’ve seen are tables for positive, negative, and pending tests (NYS and NYC), but that is probably because we’re still relatively behind on both the number of cases and our ability to test widely. Perhaps on the west coast, where there are more cases and they’ve been facing this for longer, there might be more or better data.

While working this past week, some questions have come up around the ever-changing guidelines and algorithms for testing, which may be hard to answer because it is more COVID specific than for viral illnesses in general.
- Given some of the limitations in our ability to test patients at scale (hopefully resolved by the study-a-thon but perhaps not), who should we test? Who is most likely to test positive? Of patients who come in with URI symptoms or pneumonia on CXR, which of these is caused by COVID?

- And related but more generalizable to other viral illnesses, of those who test positive, who is most likely to require hospitalization, supplemental oxygen, ICU stay, or intubation, or to die?
Basically, for patients with a viral illness (I might consider individual viruses such as coronavirus, RSV, adeno, paraflu, flu, etc separately and then together), who is most likely to require those interventions or higher levels of care? Who is most likely to get a bacterial superinfection and/or pneumonia? [For resource utilization, level of care triage, admit or not. Existing pneumonia severity scores are also very old; can we build better risk prediction models?]

(Jenny Lane) #7

Within this report there are some basic epidemiological characteristics of the Chinese infections. We could compare this to other countries, and to other viral pneumonias? It may also shed light on how translatable any prediction models in a different viral pneumonia disease model may be?

(Selva) #8


I am not from a clinical background, so I’m just thinking from a data and ML perspective.

If we have access to covid data, can a question like “How many cases will develop in the next n days?” be answered?


(Daniel Prieto-Alhambra) #9

Thanks @SELVA_MUTHU_KUMARAN . Learning on the ‘epidemiological curve’ of covid19 is indeed important but only doable once we have such specific cohort data. In the meantime we could focus on modelling of other viral outbreaks (seasonal flu, sars, etc) and then see if this ‘transports’ to the covid19 dataset once we have it in the coming months. This is to me a characterisation problem that we should add to our list of priorities. Would you agree @SCYou ?
Now, there might be some previous public health knowledge out there we can leverage. Maybe @Albert_Prats has something to bring to this table.

(Daniel Prieto-Alhambra) #10

Thanks @rchen . The testing problem is important but tricky for now, as the denominator of who is tested changes a lot over time and geography. Worth bearing in mind, and maybe preparing for when suitable data are available.
Your second question is very much in line with mine above, “Predicting who will get pneumonia…”. I think this is one that keeps coming up as an important and relevant question to prioritise. Let’s add it to our list if not already there @SCYou !

(David Vizcaya) #11

I wonder if there are specific risk factors for mortality and disease progression among the infected “non-frail” population (aged below 60, not immunosuppressed, etc). Also related to this population, one could think of assessing burden of disease in terms of productivity loss associated with COVID19 once data is available (should this question consider exceptional preventive measures taken?)

(Daniel Prieto-Alhambra) #12

Thanks David,
So if I understand correctly, your first question is a prediction model specific to ‘non frail’ people. This is I think interesting and should not be too complex to do once we have the code ready for the overall prediction. Am I right @Rijnbeek ?
As for the second, this is I think already part of our characterisation as set out above. See we say ‘stratified by age, gender, comorbidity, …’


(Daniel Prieto-Alhambra) #13

Thanks @SCYou
Somewhat related to this, I’d also be interested to learn about the secular trends in non-covid (eg cardiovascular or cancer-related) morbi-mortality over time. This outbreak will compete for resources (hospital beds, ICU, ventilators, etc) and could potentially ‘displace’ or ‘de-prioritise’ other serious conditions.
I am thinking of some sort of interrupted time series (segmented linear regression or the like) to test the impact of the outbreak on unrelated morbi-mortality in countries with (intervention) vs without (control) a high number of cases, and before vs after the outbreak.
I am going to volunteer @Albert_Prats to think of this and how we could do it!
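
To make the interrupted time series idea a bit more concrete, below is a minimal sketch of a single-series segmented linear regression in Python, using only the standard library. It assumes equally spaced observations (say, monthly deaths from an unrelated cause) and a known outbreak start index. It fits y_t = b0 + b1*t + b2*post_t + b3*(t - change_point)*post_t, where post_t is 1 from the change point onwards, so b2 is the level change and b3 the slope change. The function names are hypothetical, and a real analysis would also need a control series, seasonality and autocorrelation handling, none of which is shown here.

```python
from typing import List, Sequence


def solve_linear_system(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_segmented_regression(y: Sequence[float], change_point: int) -> List[float]:
    """Ordinary least squares fit of the segmented model.

    Returns [b0, b1, b2, b3]: baseline level, baseline slope,
    level change at the change point, and slope change after it.
    """
    if not 2 <= change_point <= len(y) - 2:
        raise ValueError("need at least two points before and after the change point")
    rows = []
    for t in range(len(y)):
        post = 1.0 if t >= change_point else 0.0
        rows.append([1.0, float(t), post, (t - change_point) * post])
    p = 4
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


if __name__ == "__main__":
    # Synthetic, noise-free series: level jumps by 10 and the slope
    # steepens by 1.5 per period from t = 12 onwards.
    series = [100 + 0.5 * t + ((10 + 1.5 * (t - 12)) if t >= 12 else 0.0)
              for t in range(24)]
    print(fit_segmented_regression(series, change_point=12))
```

Fitting the same model to a country with few cases gives the control comparison: if the level and slope changes show up only in the high-burden country, that supports the displacement hypothesis.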

(Hainiwen) #14

Hi all!
The dataset of Korean covid-19 is very impressive!! I’m a Chinese hospital pharmacist (and a fan of OHDSI) who happens to contribute to the github project of Wuhan2020 https://github.com/wuhan2020/wuhan2020 for data collection. Volunteers in China have been formatting all officially released information containing the required info mentioned by Dr Seng Chan You. It is a bit messy (and huge), but I think I can contribute by formatting it further so that the community can use it?

(Seng Chan You) #15

The github project of Wuhan 2020 looks really great. Can we estimate age/gender-stratified mortality rate by using these data? @hainiwen

(Gregory Klebanov) #16

there are tons of publications and literature available on PubMed that maybe OHDSI can take advantage of

btw, reading through those, found this one

We have built a centralised repository of individual-level information on patients with laboratory-confirmed COVID-19 (in China, confirmed by detection of virus nucleic acid at the City and Provincial Centers for Disease Control and Prevention), including their travel history, location (highest resolution available and corresponding latitude and longitude), symptoms, and reported onset dates, as well as confirmation dates and basic demographics. Information is collated from a variety of sources, including official reports from WHO, Ministries of Health, and Chinese local, provincial, and national health authorities. If additional data are available from reliable online reports, they are included. Data are available openly and are updated on a regular basis (around twice a day).

maybe this can be useful?

(Christian Reich) #17

Not sure that makes a ton of sense, because the difference between frail and non-frail is not sudden. I would let the data tell us what matters, not pre-specify the result.

Also, we cannot see one important factor: the viral load. I am sure if you get infected with a ton of viruses at once your immune response will be more massive, which is what creates these fulminant pneumonias.

(Christophe Lambert) #18

Is there any patient level data source with lab tests over time (including viral load), treatments, and outcomes for a decent sized n? Thus far, I’ve only seen a few case reports with this info and only summary statistics for larger cohorts.


(Vojtech Huser) #19

Ad lab tests - I was looking at the new LOINC codes. Relevant discussion on the LOINC forum - link is here: https://loinc.org/forums/topic/sars-cov-2-codes-discussion/

The goal is to somehow convey qualitative and quantitative tests for SARS-CoV-2 or SARS-CoV-2 RNA.

(and maybe later at some point antibody test for SARS-CoV-2)

(daniel morales) #20

Like @Daniel_Prieto I think there is value in measuring the incidence of, and risk factors for, viral pneumonia. I would consider extending the scope to include secondary bacterial pneumonia complications of flu also (e.g. Staphylococcal pneumonia that is flu-related, severe and leads to rapid clinical deterioration).

Given the available tools a cohort analysis assessing the risk of several COVID-19 related outcomes (death, viral pneumonia, bacterial pneumonia, ARDS) compared to non-covid19 flu cases could be performed (i.e. what is the excess risk from COVID-19 if any).
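
For a sense of what "excess risk" means numerically, here is a minimal, unadjusted sketch in Python. It assumes we have outcome counts in a COVID-19 cohort and in a flu comparator cohort, and it reports the risk difference and the risk ratio with a log-scale (Katz) confidence interval. The function and field names are hypothetical and the counts are made up. A proper cohort analysis would adjust for confounding (for example with propensity scores), which this sketch does not attempt.

```python
import math
from typing import NamedTuple


class RiskComparison(NamedTuple):
    risk_exposed: float
    risk_comparator: float
    risk_difference: float  # excess risk in the exposed cohort
    risk_ratio: float
    ratio_ci_low: float
    ratio_ci_high: float


def compare_risks(
    events_exposed: int,
    n_exposed: int,
    events_comparator: int,
    n_comparator: int,
    z: float = 1.96,  # about 95% two-sided coverage
) -> RiskComparison:
    """Crude risk difference and risk ratio between two cohorts."""
    if events_exposed <= 0 or events_comparator <= 0:
        raise ValueError("the log-scale interval needs at least one event per cohort")
    if events_exposed > n_exposed or events_comparator > n_comparator:
        raise ValueError("events cannot exceed cohort size")

    risk_exposed = events_exposed / n_exposed
    risk_comparator = events_comparator / n_comparator
    ratio = risk_exposed / risk_comparator

    # Standard error of log(risk ratio), Katz method.
    se_log_ratio = math.sqrt(
        1 / events_exposed - 1 / n_exposed
        + 1 / events_comparator - 1 / n_comparator
    )
    return RiskComparison(
        risk_exposed=risk_exposed,
        risk_comparator=risk_comparator,
        risk_difference=risk_exposed - risk_comparator,
        risk_ratio=ratio,
        ratio_ci_low=ratio * math.exp(-z * se_log_ratio),
        ratio_ci_high=ratio * math.exp(z * se_log_ratio),
    )


if __name__ == "__main__":
    # Hypothetical counts: 30 deaths among 1000 COVID-19 cases
    # versus 10 deaths among 1000 flu cases.
    print(compare_risks(30, 1000, 10, 1000))
```

The same function could be run once per outcome (death, viral pneumonia, bacterial pneumonia, ARDS) to get a first, crude picture before any adjusted analysis.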

Happy to link in with @Albert_Prats from the secular trend approach if helpful.