
Patient-Level Data export from ATLAS Cohort Definitions

Dear all,

my team and I are interested in exporting patient-level data from ATLAS cohort definitions.
We have searched for something similar in the Forum, but as we could not find anything that met our needs, we started developing a Shiny app for this purpose.

Is there anyone else working on something similar?

If you are interested in the topic, feel free to contact me. I would be very happy to discuss this with all of you :grinning:



What is your purpose or use case? Reason I am asking is twofold:

  • ATLAS, like all analytical use cases, is explicitly not attempting to support individual patient care. Instead, it attempts to find insights in cohorts or populations, or compare cohorts against each other to find insights from that.
  • “Exporting” patient-level data into analytical datasets is traditionally done to feed them into statistical methods. In other words, the data come to the analytics. The OMOP CDM does the opposite: it makes it possible for the analytics to come to the data. It does so by providing a generic way of defining the cohorts/populations and deriving all statistics directly from the database. All output is strictly aggregate summaries.

Both of these are very powerful, because they allow building the OHDSI network of data assets behind the firewalls, largely avoiding impediments to research from patient protection issues. Analytics are sent in and results are returned.

Why do you want to go back to the old ways? :slight_smile:

I’m not sure I would ask OHDSI for this capability (yet), but I can describe the reasoning at my institution. The Informatics dept. is the honest broker for clinical data for research and we fulfill hundreds of research data requests per year. Our data requesters have a wide range of analytics skills, but for many of them it doesn’t extend much beyond Excel. Even if they are slightly more sophisticated they still want flat files of denormalized data. Only a minority of them have the skills to write SQL and R against the CDM. In the future, when the oncology model in OMOP is mature and especially when federated research is possible in cancer, we will have a carrot to entice them to acquire those skills and bring their analytics to the data but at the moment they do not have that incentive.

We have data navigators who help requesters understand the available data and construct cohorts. I imagine as we bring OMOP online those navigators will use ATLAS for that purpose (they currently use i2b2) but the requesters will still want a data extract that is as flat as possible containing only the “columns” of interest. I imagine it will take some time to entice many of them to the CDM.


Our organization is in a very similar situation. For many years, we have provided our researchers with denormalized data in flat files using a home-grown data model that started with HL7 and now uses EMR extracts (Epic Clarity) that we heavily post-process. Even as we encourage our customers to shift to OMOP CDM for the data model, that’s not going to change the way they do their work—certainly not overnight.

As an academic medical center, many of our users are clinicians, so they prefer a view of patient data similar to what they see in the EMR, and many of the data sets are small enough that manual review of every patient record to refine the inclusion criteria is entirely feasible. This is what our users are accustomed to and we plan to support it for the CDM.

In addition to HIPAA and institutional “minimum necessary” guidelines, our users want to work with subsets of the data that are filtered to include only the items of interest to their analysis, so we are planning to adapt our tools to work with the CDM while continuing to offer selective redaction of PHI, omission of unwanted tables, and row-level filtering based on user-selectable criteria.

While we do expect some of our users to define cohorts in ATLAS and then use our data export tools, we don’t expect ATLAS alone to meet the current needs of our users.

@jmethot @jpallas that’s exactly the point.
My organization works mainly with clinicians and often one of the first things we are asked when we explain how ATLAS and the other tools work is the possibility of extracting patient-level data after defining a cohort.
As you said, many of them don’t have the technical skills to write SQL and R against the CDM, so our purpose is to fulfill this request and meet their needs.

Colorado University, Anschutz is currently in the process of implementing Leaf, an open source data explorer, cohort builder & extraction tool. We are using the OMOP CDM, sourced from Epic Caboodle, as our backend database.

We are in a similar position to @vramella, @jmethot & @jpallas as an honest broker delivering hundreds of datasets to researchers for use with their own analytical methods/tools.

We had tried using TriNetX for this purpose, but it’s a black box for mapping and we could never retrieve the same number of Persons as the TriNetX output for a cohort. We do have an Atlas instance for our more OHDSI savvy researchers, but it is a much more advanced tool and doesn’t allow row level export. Leaf seems more user friendly for our local researchers. See this paper by Nicolas Dobbins, et al.

If people are interested, I think a presentation/discussion about cohort building, data export, HIPAA and the tools used to enable researchers to build cohorts & extract data might be a well received topic for the EHR WG* to discuss next year. It’s not OHDSI/OMOP specific, but it does support healthcare research.

*EHR WG really is a misnomer. I think Healthcare System Discussion Group is more appropriate. EHR isn’t broad enough and we don’t assign work.

While ATLAS itself does not directly support exporting patient-level data (by design), it is actually very easy to write an export script that uses the cohort data ATLAS generates in the results schema to export a slice of OMOP data for that cohort. Such a script can easily be parameterized by a cohort ID.
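To make that concrete, here is a minimal sketch of such a script, not a tested production tool. It assumes the generated cohort table has the usual `cohort_definition_id` and `subject_id` columns and that every exported table has a `person_id` column. The function names, the table list, and the in-memory SQLite database with synthetic rows are my own assumptions for illustration; on a real server the cohort table would live in your results schema (something like `results.cohort`) and you would use your own database driver.

```python
import csv
import io
import sqlite3

# Assumption: an illustrative list of OMOP tables that carry a person_id column.
PERSON_TABLES = ("person", "condition_occurrence")


def build_demo_db():
    """Create a tiny in-memory database with synthetic rows (not real data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cohort (cohort_definition_id INTEGER, subject_id INTEGER,
                             cohort_start_date TEXT, cohort_end_date TEXT);
        CREATE TABLE person (person_id INTEGER, year_of_birth INTEGER,
                             gender_concept_id INTEGER);
        CREATE TABLE condition_occurrence (condition_occurrence_id INTEGER,
                             person_id INTEGER, condition_concept_id INTEGER,
                             condition_start_date TEXT);
        INSERT INTO cohort VALUES (1, 101, '2020-01-01', '2020-12-31'),
                                  (1, 102, '2020-02-01', '2020-12-31'),
                                  (2, 103, '2020-03-01', '2020-12-31');
        INSERT INTO person VALUES (101, 1950, 8507), (102, 1962, 8532),
                                  (103, 1975, 8507);
        INSERT INTO condition_occurrence VALUES (1, 101, 201826, '2020-02-01'),
                                                (2, 103, 201826, '2020-03-01');
        """
    )
    return conn


def export_cohort(conn, cohort_id, tables=PERSON_TABLES, cohort_table="cohort"):
    """Return {table_name: CSV text} restricted to members of one cohort."""
    exports = {}
    for table in tables:
        # Table names cannot be bound as parameters, so they must come from a
        # trusted list; only the cohort ID is passed as a bound parameter.
        query = (
            f"SELECT t.* FROM {table} AS t "
            f"WHERE t.person_id IN ("
            f"SELECT subject_id FROM {cohort_table} "
            f"WHERE cohort_definition_id = ?) "
            f"ORDER BY t.person_id"
        )
        cursor = conn.execute(query, (cohort_id,))
        header = [col[0] for col in cursor.description]
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(cursor)
        exports[table] = buffer.getvalue()
    return exports


if __name__ == "__main__":
    for name, text in export_cohort(build_demo_db(), cohort_id=1).items():
        print(f"--- {name} ---")
        print(text)
```

Writing each value of the returned dictionary to its own file gives one flat file per table for the chosen cohort. Any redaction or column selection your institution requires would still have to be applied on top of this.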


Hi @MPhilofsky! I was wondering what the status of this is: has this topic indeed been discussed, and has anything concrete come out of the discussion? Or am I still in time to participate in the discussion?

We at IKNL (Dutch cancer registry) are currently seeing how to run a pre-existing package developed by one of our PhD students on our OMOPped data (and South Korean OMOPped data), and even if for future studies we will strive to create packages working directly against the OMOP structure, we figured that in this case it would make more sense to fill the gap by writing a script that extracts the needed info from the OMOP tables into a dataframe format (the input of the package).
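For anyone curious what such a gap-filling script might look like, here is a rough sketch under stated assumptions; it is not the actual IKNL script. It joins the cohort table to `person` and `condition_occurrence` and returns one flat record per cohort entry as a list of dicts, a shape that `pandas.DataFrame` accepts if a dataframe is what the downstream package needs. The function names, the derived `n_conditions` column, and the synthetic in-memory SQLite data are hypothetical.

```python
import sqlite3


def demo_connection():
    """Tiny in-memory database with synthetic rows (not real data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cohort (cohort_definition_id INTEGER, subject_id INTEGER,
                             cohort_start_date TEXT, cohort_end_date TEXT);
        CREATE TABLE person (person_id INTEGER, year_of_birth INTEGER,
                             gender_concept_id INTEGER);
        CREATE TABLE condition_occurrence (condition_occurrence_id INTEGER,
                             person_id INTEGER, condition_concept_id INTEGER,
                             condition_start_date TEXT);
        INSERT INTO cohort VALUES (1, 101, '2020-01-01', '2020-12-31'),
                                  (1, 102, '2020-02-01', '2020-12-31'),
                                  (2, 103, '2020-03-01', '2020-12-31');
        INSERT INTO person VALUES (101, 1950, 8507), (102, 1962, 8532),
                                  (103, 1975, 8507);
        INSERT INTO condition_occurrence VALUES (1, 101, 201826, '2020-02-01'),
                                                (2, 101, 201826, '2020-05-01'),
                                                (3, 103, 201826, '2020-03-01');
        """
    )
    return conn


def cohort_to_records(conn, cohort_id, cohort_table="cohort"):
    """Return one flat dict per cohort entry, with a count of condition rows."""
    # cohort_table must be a trusted name; only cohort_id is a bound parameter.
    query = f"""
        SELECT c.subject_id        AS person_id,
               p.year_of_birth     AS year_of_birth,
               p.gender_concept_id AS gender_concept_id,
               c.cohort_start_date AS cohort_start_date,
               COUNT(co.condition_occurrence_id) AS n_conditions
        FROM {cohort_table} AS c
        JOIN person AS p ON p.person_id = c.subject_id
        LEFT JOIN condition_occurrence AS co ON co.person_id = c.subject_id
        WHERE c.cohort_definition_id = ?
        GROUP BY c.subject_id, p.year_of_birth, p.gender_concept_id,
                 c.cohort_start_date
        ORDER BY c.subject_id
    """
    cursor = conn.execute(query, (cohort_id,))
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor]


if __name__ == "__main__":
    for record in cohort_to_records(demo_connection(), cohort_id=1):
        print(record)
```

The `LEFT JOIN` keeps cohort members who have no condition rows, so they appear with a count of zero instead of silently dropping out of the extract.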