
Data Scope of OHDSI

Happy new year to you all!

I’m curious to know whether there is tabulated information on all of the medical conditions and HOIs (health outcomes of interest) currently covered by the Data Partners comprising the OMOP data set. I’m trying to compare the condition coverage of OHDSI/OMOP against that of other federal, public, and commercial data sources.

Thank you,
Lili

Happy New Year @lilipeng. Just to clarify, there is no ‘OMOP data set’. OHDSI develops and maintains an open community data standard (the OMOP Common Data Model) and a host of open-source analysis tools (like WhiteRabbit, ACHILLES, ATLAS, ARACHNE, and the R methods library for population-level effect estimation and patient-level prediction) to support researchers in using their observational data to generate evidence, and it acts as an open science community network whereby researchers can collaborate on clinical questions of shared interest. Most, if not all, of the federal, public, and commercial data sources that are available can be transformed into the OMOP Common Data Model and use the OHDSI tools, and in fact, many organizations have already done this work (see the data network here: http://www.ohdsi.org/web/wiki/doku.php?id=resources:data_network). The list of medical concepts covered in any OHDSI partner’s data is tabulated in the ACHILLES tool, but this information has not been aggregated across the community.
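To make the aggregation gap concrete, here is a minimal Python sketch of how per-site results could be combined, assuming each partner shared a simple mapping from condition concept ID to the number of persons with that condition (roughly what the ACHILLES condition reports show). The site names, concept IDs, and counts are hypothetical, and summing person counts across sites assumes no person appears in more than one site's data.

```python
from collections import defaultdict

# Hypothetical per-site exports: condition_concept_id -> number of persons.
# Site names, concept IDs, and counts are illustrative placeholders only.
site_counts = {
    "site_a": {101: 1200, 102: 45},
    "site_b": {101: 800, 103: 30},
    "site_c": {102: 15},
}


def aggregate_coverage(site_counts):
    """Combine per-site counts into one row per concept.

    Returns {concept_id: (number_of_sites_reporting, total_persons)}.
    Summing persons assumes the sites' populations do not overlap.
    """
    sites = defaultdict(int)
    persons = defaultdict(int)
    for counts in site_counts.values():
        for concept_id, n in counts.items():
            sites[concept_id] += 1
            persons[concept_id] += n
    return {c: (sites[c], persons[c]) for c in sites}


coverage = aggregate_coverage(site_counts)
for concept_id, (n_sites, n_persons) in sorted(coverage.items()):
    print(concept_id, n_sites, n_persons)
```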

Thanks @Patrick_Ryan. I’d typed a bit prematurely about the “OMOP data set”; I am aware that federal, private, and academic institutions have transformed their respective datasets using the OMOP CDM v4 (or v5).

To give you the context of my question: suppose I’m a scientist who wants to use real-world evidence (RWE) from OHDSI partners’ datasets (which, from the list you directed me to, are sourced primarily from EHR and claims data) along with other real-world data (RWD) sources, such as patient surveys (Patients Like Me), mobile device data, and social networking data (Twitter). I would first take a step back and assess whether these heterogeneous data sources contain information on the disease area I’m studying. Accordingly, my question would be, “Out of all the RWD sources publicly available, which ones provide data on the incidence and prevalence of ulcerative colitis?” From a regulatory perspective, a scientist from the FDA could ask, “Out of all the RWD sources publicly available, which ones provide data on opioid use across the US?” As an RWE scientist, before conducting any analysis, I would cast a broad search for all data sources possibly relevant to the clinical or regulatory question at hand.

[edit 1]: In September 2017, the Duke-Margolis Center for Health Policy held a conference that described a ‘fit-for-purpose’ approach for regulatory use of RWE.

Currently, Patients Like Me has tabulated a list of all medical conditions for which they have data (https://www.patientslikeme.com/conditions). I wonder if OHDSI can provide the same.

Lili

Hello, @lilipeng,
If I were to imagine an ideal world, our OHDSI data partners would each provide a publicly available ACHILLES dataset, so that questions like yours could be answered. For example, there is a sample dataset available on the OHDSI website here: http://www.ohdsi.org/web/achilles/#/OHDSI_Sample_Database/dashboard
or you can review conditions that exist in the data using the condition reports:
http://www.ohdsi.org/web/achilles/#/OHDSI_Sample_Database/conditions

However, there isn’t a central repository where the different data partners can provide their Achilles data, so another option is to propose a ‘descriptive study’ to the community that defines the specific characteristics you’d like to learn about a given data source. I call this ‘descriptive’ because the intent of the study isn’t to determine comparative effectiveness or understand a relative risk; the intent is just to get descriptive statistics on the population as a whole. Examples of statistics that you might want to know (these are provided by Achilles):

  • Age at First Observation
  • Number of people with a diagnosis (for all diagnoses)
  • Number of people exposed to a drug (for all drugs)
  • Number of people with an observation period for a given year (for all years)
  • Distributions of observation period durations, lengths of exposure, and age at first observation
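As a rough illustration that these statistics can be computed straight from CDM tables, here is a minimal Python sketch against a tiny in-memory SQLite database. It assumes a simplified subset of OMOP CDM v5 tables and columns (person, observation_period, condition_occurrence, drug_exposure); the rows and concept IDs are made up, and this is not the actual Achilles SQL.

```python
import sqlite3

# Tiny in-memory stand-in for a CDM database. Only the columns used below are
# created; all rows and concept IDs (101, 102, 201) are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER, year_of_birth INTEGER);
CREATE TABLE observation_period (person_id INTEGER,
    observation_period_start_date TEXT, observation_period_end_date TEXT);
CREATE TABLE condition_occurrence (person_id INTEGER,
    condition_concept_id INTEGER, condition_start_date TEXT);
CREATE TABLE drug_exposure (person_id INTEGER,
    drug_concept_id INTEGER, drug_exposure_start_date TEXT);
INSERT INTO person VALUES (1, 1950), (2, 1980), (3, 1990);
INSERT INTO observation_period VALUES
    (1, '2010-01-01', '2012-12-31'),
    (2, '2011-06-01', '2011-12-31'),
    (3, '2012-03-01', '2013-02-28');
INSERT INTO condition_occurrence VALUES
    (1, 101, '2010-05-01'), (1, 101, '2011-05-01'),
    (2, 101, '2011-07-01'), (3, 102, '2012-04-01');
INSERT INTO drug_exposure VALUES (1, 201, '2010-06-01'), (3, 201, '2012-05-01');
""")

# Number of people with a diagnosis, for every condition concept.
persons_by_condition = dict(conn.execute("""
    SELECT condition_concept_id, COUNT(DISTINCT person_id)
    FROM condition_occurrence GROUP BY condition_concept_id"""))

# Number of people exposed to a drug, for every drug concept.
persons_by_drug = dict(conn.execute("""
    SELECT drug_concept_id, COUNT(DISTINCT person_id)
    FROM drug_exposure GROUP BY drug_concept_id"""))

# Age at first observation, approximated as (year of first observation) - (year of birth).
ages_at_first_obs = sorted(age for (age,) in conn.execute("""
    SELECT MIN(CAST(strftime('%Y', op.observation_period_start_date) AS INTEGER))
           - p.year_of_birth
    FROM person p JOIN observation_period op ON op.person_id = p.person_id
    GROUP BY p.person_id, p.year_of_birth"""))

# Number of people whose observation period overlaps each calendar year.
persons_by_year = {}
for year in range(2010, 2014):
    (n,) = conn.execute("""
        SELECT COUNT(DISTINCT person_id) FROM observation_period
        WHERE observation_period_start_date <= ?
          AND observation_period_end_date >= ?""",
        (f"{year}-12-31", f"{year}-01-01")).fetchone()
    persons_by_year[year] = n

print(persons_by_condition, persons_by_drug, ages_at_first_obs, persons_by_year)
```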

The list can go on. By defining these statistics in a descriptive study, you can ask people to run them against their own data and provide you with the results. Then you can understand the ‘fitness for use’ of each data source for a particular study and possibly decide whom you want to collaborate with to answer a specific medical question.


Hi @Chris_Knoll, thanks for the descriptive (no pun intended) response. The idea of conducting a descriptive study to obtain the aforementioned statistics using Achilles is a good one. If I were to do so, should I reach out to the OHDSI community by posting another message, or by proposing the idea on an OHDSI community call? I presume such results could then be published online?

Thanks again,
Lili

Hi, @lilipeng,

You could start a new forum post in the researchers area to solicit interest in the topic. Ultimately, what is needed is a protocol document that describes in detail how the statistics should be calculated and how they are intended to be used. @msuchard, @schuemie, and @Patrick_Ryan could guide you on how to develop this protocol and how to introduce it to the community, but I’m sure there are others who have firsthand experience bringing study proposals forward to the community. Certainly, discussing it on the OHDSI community call is an excellent way to get visibility.

-Chris


Hi @Chris_Knoll,

Thanks for the recommendations on next steps. I would definitely be interested in authoring a protocol.

Before I move forward, I am curious what the time commitment (per week) would be for such a descriptive study. I would certainly like to work on it, but my current job may pick up pace in the next month or so (I’m in consulting). Do you think the statistical analysis will be time-intensive?

Still, I would like to help out in any way, even on the periphery. For example, I can put together a list of disease areas of high interest to the OHDSI community, including those with few therapies or low recruitment rates for clinical trials, where there is greater reliance on evidence from data sources other than RCTs.

It depends on how complicated the statistics are. If it’s simple counts or the top 100 conditions (by prevalence), it would be straightforward to define and simple to code. There may be existing queries (@Vojtech_Huser may have some examples) that suit your needs and that you could ask data holders to execute for you (there may also be existing language describing those queries that you could use in a protocol). If it’s more complicated and requires more refined rules, it might take more time to specify the logic for calculating the statistics and to code the queries. I can’t really give you a number of hours per week, but I’d recommend budgeting about 5-10 hours just to get an understanding of what is out there, review some sample protocols, and gauge community interest. After that, you should have a better idea of how much time you will need to commit.
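For the ‘top 100 conditions by prevalence’ case, a minimal sketch might look like the following, in Python with an in-memory SQLite database. The two tables are stripped-down stand-ins for the CDM person and condition_occurrence tables, the data and concept IDs are hypothetical, and prevalence is defined loosely as persons ever recorded with the concept divided by all persons. Calling top_conditions(conn, 100) would give the top 100.

```python
import sqlite3


def top_conditions(conn, n):
    """Return (concept_id, persons, prevalence) for the n most prevalent conditions.

    Prevalence here is persons ever recorded with the concept divided by all
    persons in the person table (a whole-database, any-time definition).
    """
    (total,) = conn.execute("SELECT COUNT(*) FROM person").fetchone()
    rows = conn.execute("""
        SELECT condition_concept_id, COUNT(DISTINCT person_id) AS persons
        FROM condition_occurrence
        GROUP BY condition_concept_id
        ORDER BY persons DESC, condition_concept_id
        LIMIT ?""", (n,)).fetchall()
    return [(concept_id, persons, persons / total) for concept_id, persons in rows]


# Hypothetical toy data: 4 persons, placeholder concept IDs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER);
CREATE TABLE condition_occurrence (person_id INTEGER, condition_concept_id INTEGER);
INSERT INTO person VALUES (1), (2), (3), (4);
INSERT INTO condition_occurrence VALUES
    (1, 101), (2, 101), (3, 101), (1, 102), (2, 102), (4, 103), (4, 101);
""")

for row in top_conditions(conn, 2):
    print(row)
```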

This sounds like you are describing a set of baseline characteristics you’d like to know about a given data set, a.k.a. ‘Table 1’. This is exactly what you’d put into the protocol, along with the CDM queries that produce the results. I’ve produced these sorts of reports for sub-populations in a dataset (such as people suffering from a disease or people taking a treatment), where the baseline is defined at a moment in time. But this can be done at the database level as well (which is what Achilles does).
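To illustrate what such a ‘Table 1’ query might compute, here is a minimal Python/SQLite sketch for new users of one drug, where the index date is each person's first exposure. The schema is a simplified subset of CDM v5 columns, concept IDs 101 and 201 and all rows are hypothetical, the 365-day lookback is an arbitrary choice, and age is approximated from year of birth only. The gender concept IDs 8507 (male) and 8532 (female) are the standard OMOP ones.

```python
import sqlite3
from statistics import mean

FEMALE = 8532  # standard OMOP gender concept for FEMALE (8507 is MALE)

# Hypothetical toy data using a simplified subset of CDM columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER, gender_concept_id INTEGER, year_of_birth INTEGER);
CREATE TABLE drug_exposure (person_id INTEGER, drug_concept_id INTEGER,
    drug_exposure_start_date TEXT);
CREATE TABLE condition_occurrence (person_id INTEGER, condition_concept_id INTEGER,
    condition_start_date TEXT);
INSERT INTO person VALUES (1, 8532, 1960), (2, 8507, 1970), (3, 8532, 1980);
INSERT INTO drug_exposure VALUES
    (1, 201, '2015-03-01'), (1, 201, '2016-01-01'), (2, 201, '2015-06-01');
INSERT INTO condition_occurrence VALUES
    (1, 101, '2014-12-01'), (2, 101, '2013-01-01'), (3, 101, '2015-01-01');
""")


def table_one(conn, drug_concept_id, prior_condition_id, lookback_days=365):
    """Baseline characteristics of new users of one drug, at their first exposure."""
    rows = conn.execute("""
        WITH cohort AS (
            SELECT person_id, MIN(drug_exposure_start_date) AS index_date
            FROM drug_exposure WHERE drug_concept_id = ?
            GROUP BY person_id)
        SELECT c.person_id,
               CAST(strftime('%Y', c.index_date) AS INTEGER) - p.year_of_birth,
               p.gender_concept_id,
               EXISTS (SELECT 1 FROM condition_occurrence co
                       WHERE co.person_id = c.person_id
                         AND co.condition_concept_id = ?
                         AND co.condition_start_date < c.index_date
                         AND co.condition_start_date >= date(c.index_date, ?))
        FROM cohort c JOIN person p ON p.person_id = c.person_id""",
        (drug_concept_id, prior_condition_id, f"-{lookback_days} days")).fetchall()
    n = len(rows)
    return {
        "n": n,
        "mean_age_at_index": mean(r[1] for r in rows),
        "pct_female": 100 * sum(r[2] == FEMALE for r in rows) / n,
        # Share of the cohort with the condition recorded in the lookback window.
        "pct_prior_condition": 100 * sum(r[3] for r in rows) / n,
    }


print(table_one(conn, drug_concept_id=201, prior_condition_id=101))
```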

Thanks @Chris_Knoll for the explanation. Right now it’s hard for me to predict what my workload will be like in the coming months, so I can’t make a definitive commitment. However, I can think about the disease areas for which I’d want to define baseline characteristics and move forward from there. Depending on community interest, other people could get involved and help out. (If work gets too busy for me, I can ask someone else to take the lead.) I’ll think about which disease areas and drugs would be of high interest and then post a separate topic on the OHDSI forum.

Thanks again for your help!

Dear @Vojtech_Huser @Chris_Knoll @Patrick_Ryan @msuchard @schuemie,
I’ve attached a rough draft of the protocol (OHDSI Network Study.docx, 534.1 KB). I have the ‘big picture’ idea in place but would appreciate further advice on the technical methods. Please note that this is just a draft; once it’s finalized, I can post it along with an announcement of the study in a new thread. For now, I welcome your guidance on developing this protocol.

Thank you very much,
Lili

@lilipeng,
You can also look at our Achilles results (hospital EMR, and sample data from the Korean national claims database) for this purpose.


Actually, I just recalled that LAERTES houses adverse event data on drug-outcome relationships. In the protocol, I had specified ACHILLES.

Thanks @SCYou! I’ll take a look.

Hi @lilipeng! You may find this discussion we had in the past on a similar topic interesting to read.

I read your draft protocol. I definitely think you’re heading in the right direction. I’m not entirely sure what statistics exactly you want to capture. You focus on drug-outcome pairs (“for each unique condition…number of people exposed to a drug”). That will create a matrix with a lot of cells with count 1, and we usually don’t share patient-level information. Would it be ok to just capture information on the number of people with a condition (for all condition concepts), and the number of people with a drug (for all drug concepts) separately?
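The difference between the two designs shows up even in a small Python sketch on made-up patient-level data: a condition-by-drug cross-tabulation can have up to (number of condition concepts) × (number of drug concepts) cells, and persons get spread thinly across them, whereas separate tallies need one cell per concept and keep the counts larger. All IDs and records are hypothetical.

```python
from collections import Counter

# Hypothetical patient-level records: (person_id, condition concept IDs, drug concept IDs).
patients = [
    (1, {101, 102}, {201, 202}),
    (2, {101, 102}, {201, 203}),
    (3, {101}, {202, 203}),
    (4, {102}, {201}),
]

# Option A: a condition-by-drug matrix (persons with both concepts, per cell).
pair_counts = Counter(
    (c, d) for _pid, conditions, drugs in patients for c in conditions for d in drugs
)

# Option B: separate marginal counts, one cell per concept.
condition_counts = Counter(c for _pid, conditions, _drugs in patients for c in conditions)
drug_counts = Counter(d for _pid, _conds, drugs in patients for d in drugs)

# Cells holding a single person are the ones that raise disclosure concerns.
singleton_cells = sum(1 for n in pair_counts.values() if n == 1)
print(len(pair_counts), "non-empty pair cells,", singleton_cells, "with a count of 1")
print("smallest marginal count:", min(list(condition_counts.values()) + list(drug_counts.values())))
```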

Personally, I wouldn’t use the ACHILLES summary statistics as the input for a study as your protocol currently suggests. I don’t like the extra dependency, and the numbers are just as easily computed directly from the CDM. But I’m sure others will disagree :smile:

The methods would be SQL code that gathers the relevant data and creates an export.zip extract (the usual OHDSI study approach).
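The packaging step of that pattern can be sketched in Python: result tables (hard-coded placeholders here, standing in for what the study SQL would return) are written as CSV files into a single export.zip that the data holder can review before sharing. The file names and column layout are assumptions for illustration, not a prescribed OHDSI format.

```python
import csv
import io
import os
import tempfile
import zipfile


def write_export(path, results):
    """Write each result table as a CSV file inside a single zip archive.

    `results` maps a file name to (header, rows). The file names and column
    layout used below are illustrative, not a prescribed OHDSI format.
    """
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, (header, rows) in results.items():
            buffer = io.StringIO()
            writer = csv.writer(buffer)
            writer.writerow(header)
            writer.writerows(rows)
            zf.writestr(name, buffer.getvalue())


# Placeholder results standing in for what the study's SQL would return.
results = {
    "condition_counts.csv": (["condition_concept_id", "person_count"], [(101, 1200)]),
    "drug_counts.csv": (["drug_concept_id", "person_count"], [(201, 800)]),
}
export_path = os.path.join(tempfile.mkdtemp(), "export.zip")
write_export(export_path, results)

with zipfile.ZipFile(export_path) as zf:
    print(zf.namelist())
```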

I agree that using a minCellCount threshold (e.g., >=11) would be good.
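Applying such a threshold is a small post-processing step before export. A minimal Python sketch, assuming the counts arrive as a dict from concept ID to person count (hypothetical data):

```python
MIN_CELL_COUNT = 11  # threshold from the discussion: only counts >= 11 are shared


def censor_counts(counts, min_cell_count=MIN_CELL_COUNT):
    """Replace any count below the threshold with None before export.

    None marks a suppressed cell; it must not be read as zero.
    """
    return {key: (n if n >= min_cell_count else None) for key, n in counts.items()}


person_counts = {101: 1200, 102: 7, 103: 11}  # hypothetical concept -> persons
print(censor_counts(person_counts))
```

Whether suppressed cells are exported as blanks, as a marker such as ‘<11’, or dropped entirely is a protocol decision; the sketch simply uses None.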

(Illustrative example for a single drug.)
For goal 6, e.g., conditions I see after first exposure to bevacizumab, the SQL code would differ from Achilles’ current analyses.

Considering all eras (not just the first) of bevacizumab for a patient would be more complex. Would you consider only a time window of x days/years after the index event? Would this window be the same for all possible drugs?
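For the single-drug case, a minimal Python/SQLite sketch of the ‘first exposure plus a fixed window’ design could look like this. The window length is a parameter because whether it should be the same for every drug is still an open question. The tables are a simplified subset of CDM v5 columns, drug concept 201 merely stands in for a drug such as bevacizumab, all data are hypothetical, and using drug_exposure rather than drug_era is a simplification.

```python
import sqlite3

# Hypothetical toy data; person 1's second exposure is ignored (first exposure only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE drug_exposure (person_id INTEGER, drug_concept_id INTEGER,
    drug_exposure_start_date TEXT);
CREATE TABLE condition_occurrence (person_id INTEGER, condition_concept_id INTEGER,
    condition_start_date TEXT);
INSERT INTO drug_exposure VALUES
    (1, 201, '2016-01-10'), (1, 201, '2016-06-01'), (2, 201, '2016-03-01');
INSERT INTO condition_occurrence VALUES
    (1, 101, '2016-01-20'), (1, 102, '2016-05-01'), (1, 103, '2015-12-01'),
    (2, 101, '2016-03-15'), (2, 102, '2017-06-01');
""")


def conditions_after_first_exposure(conn, drug_concept_id, window_days):
    """Count persons per condition concept whose condition starts on or after
    their first exposure to drug_concept_id and within window_days of it."""
    return dict(conn.execute("""
        WITH first_exposure AS (
            SELECT person_id, MIN(drug_exposure_start_date) AS index_date
            FROM drug_exposure WHERE drug_concept_id = ?
            GROUP BY person_id)
        SELECT co.condition_concept_id, COUNT(DISTINCT co.person_id)
        FROM first_exposure fe
        JOIN condition_occurrence co ON co.person_id = fe.person_id
        WHERE co.condition_start_date >= fe.index_date
          AND co.condition_start_date <= date(fe.index_date, ?)
        GROUP BY co.condition_concept_id""",
        (drug_concept_id, f"+{window_days} days")))


print(conditions_after_first_exposure(conn, 201, 30))
print(conditions_after_first_exposure(conn, 201, 365))
```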

Currently, Achilles does not do this, and I agree that it is very interesting.
I think J&J’s HOMER tool (based on their software demo in fall 2017) already implements much of this. @Patrick_Ryan?

@schuemie @Vojtech_Huser I’d initially included those statistics based on Chris Knoll’s recommendations, but it sounds like there is a simpler way to approach this. I’ve revised the methods as shown below (revised draft attached: OHDSI Network Study.docx, 534.0 KB).

Research Methods:
• For the clinical context:

  1. Acquire a list of all unique condition concepts
  2. For each unique condition concept, run statistics on the number of people (prevalence and incidence?) with the condition concept
  3. Rank the condition concepts from the most to the least OHDSI data
  4. Take note of gaps in OHDSI datasets for condition concepts with high clinical trial failure rates (e.g., various oncological diseases) and rare diseases with few or no cures. (These are gaps that OHDSI could fill by recruiting a potential Data Partner that provides the relevant data for OHDSI to convert via ETL.)
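Steps 2 and 3 could be prototyped roughly as follows: a minimal Python sketch that computes, for one calendar year, the period prevalence and a simple incidence proportion for each condition concept, then ranks concepts by prevalence. The definitions used (denominator = persons observed during the year; ‘incident’ = first recorded occurrence falls in that year) are only one possible choice that the protocol would need to pin down, and all data are hypothetical.

```python
from collections import defaultdict

# Hypothetical inputs for calendar year 2015: persons observed during 2015, and
# condition occurrences as (person_id, condition_concept_id, year_of_occurrence).
observed_2015 = {1, 2, 3, 4, 5}
occurrences = [
    (1, 101, 2014), (1, 101, 2015),  # already recorded in 2014: prevalent, not incident
    (2, 101, 2015), (3, 102, 2015), (4, 101, 2015), (5, 103, 2013),
]


def prevalence_and_incidence(occurrences, observed, year):
    """Rank condition concepts by period prevalence in `year`.

    prevalence = persons with the concept recorded in `year` / persons observed
    incidence  = persons whose first recorded occurrence is in `year` / persons observed
    This ignores how much prior observation each person has, so some
    'incident' cases may really be pre-existing ones.
    """
    first_year = {}               # (person, concept) -> first year recorded
    prevalent = defaultdict(set)  # concept -> persons with it recorded in `year`
    for person, concept, occ_year in occurrences:
        key = (person, concept)
        first_year[key] = min(occ_year, first_year.get(key, occ_year))
        if occ_year == year and person in observed:
            prevalent[concept].add(person)
    incident = defaultdict(set)   # concept -> persons first recorded in `year`
    for (person, concept), first in first_year.items():
        if first == year and person in observed:
            incident[concept].add(person)
    n = len(observed)
    concepts = set(prevalent) | set(incident)
    stats = {c: (len(prevalent[c]) / n, len(incident[c]) / n) for c in concepts}
    # Most to least prevalent; ties broken by concept ID.
    return sorted(stats.items(), key=lambda item: (-item[1][0], item[0]))


for concept, (prevalence, incidence) in prevalence_and_incidence(occurrences, observed_2015, 2015):
    print(concept, prevalence, incidence)
```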

• For the regulatory context:
  5. Acquire a list of all unique drug concepts
  6. For each unique drug concept, run statistics on the number of people (prevalence and incidence?) with the drug concept (test case: H1N1)
  7. Rank the drug concepts from the most to the least OHDSI data
  8. Take note of gaps in OHDSI datasets for drugs that have received critical societal attention. (These are gaps that OHDSI could fill by actively procuring data for ETL conversion.)
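Steps 7 and 8 come down to sorting and a threshold comparison once per-drug person counts exist. A minimal Python sketch, assuming a dict of network-wide person counts per drug concept and a hand-curated watch list of drug concepts of interest; the 1,000-person threshold and all IDs are arbitrary placeholders, and what counts as a ‘gap’ would be a protocol decision.

```python
def rank_by_persons(person_counts):
    """Rank concepts from most to least persons (ties broken by concept ID)."""
    return sorted(person_counts.items(), key=lambda item: (-item[1], item[0]))


def find_gaps(person_counts, concepts_of_interest, min_persons):
    """Return concepts of interest with fewer than min_persons persons.

    Concepts absent from person_counts are treated as having zero persons.
    """
    return sorted(c for c in concepts_of_interest if person_counts.get(c, 0) < min_persons)


# Hypothetical network-wide person counts per drug concept, and a hand-curated
# watch list of drug concepts of societal interest (placeholder IDs).
drug_person_counts = {201: 50000, 202: 40, 203: 900}
watch_list = {201, 202, 204}

print(rank_by_persons(drug_person_counts))
print(find_gaps(drug_person_counts, watch_list, min_persons=1000))
```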
