
Patient-Level Data export from ATLAS Cohort Definitions

Dear all,

My team and I are interested in exporting patient-level data from ATLAS cohort definitions.
We have searched for something similar in the Forum, but as we could not find anything that met our needs, we started developing a Shiny app for this purpose.

Is there anyone else working on something similar?

If you are interested in the topic, feel free to contact me. I would be very happy to discuss this with all of you :grinning:



What is your purpose or use case? Reason I am asking is twofold:

  • ATLAS, like all analytical use cases, is explicitly not attempting to support individual patient care. Instead, it attempts to find insights in cohorts or populations, or compare cohorts against each other to find insights from that.
  • “Exporting” patient-level data into analytical datasets is traditionally done to feed them into statistical methods. In other words, the data come to the analytics. The OMOP CDM does the opposite: it makes it possible for the analytics to come to the data. It does so by providing a generic way of defining cohorts/populations and deriving all statistics directly from the database. All output is strictly aggregate summaries.

Both these are very powerful, because they allow building the OHDSI network of data assets behind the firewalls, largely avoiding impediments to research from patient protection issues. Analytics are sent in and results are returned.

Why do you want to go back to the old ways? :slight_smile:

I’m not sure I would ask OHDSI for this capability (yet), but I can describe the reasoning at my institution. The Informatics dept. is the honest broker for clinical data for research and we fulfill hundreds of research data requests per year. Our data requesters have a wide range of analytics skills, but for many of them it doesn’t extend much beyond Excel. Even if they are slightly more sophisticated they still want flat files of denormalized data. Only a minority of them have the skills to write SQL and R against the CDM. In the future, when the oncology model in OMOP is mature and especially when federated research is possible in cancer, we will have a carrot to entice them to acquire those skills and bring their analytics to the data but at the moment they do not have that incentive.

We have data navigators who help requesters understand the available data and construct cohorts. I imagine as we bring OMOP online those navigators will use ATLAS for that purpose (they currently use i2b2) but the requesters will still want a data extract that is as flat as possible containing only the “columns” of interest. I imagine it will take some time to entice many of them to the CDM.


Our organization is in a very similar situation. For many years, we have provided our researchers with denormalized data in flat files using a home-grown data model that started with HL7 and now uses EMR extracts (Epic Clarity) that we heavily post-process. Even as we encourage our customers to shift to OMOP CDM for the data model, that’s not going to change the way they do their work—certainly not overnight.

As an academic medical center, many of our users are clinicians, so they prefer a view of patient data similar to what they see in the EMR, and many of the data sets are small enough that manual review of every patient record to refine the inclusion criteria is entirely feasible. This is what our users are accustomed to and we plan to support it for the CDM.

In addition to HIPAA and institutional “minimum necessary” guidelines, our users want to work with subsets of the data that are filtered to include only the items of interest to their analysis, so we are planning to adapt our tools to work with the CDM while continuing to offer selective redaction of PHI, omission of unwanted tables, and row-level filtering based on user-selectable criteria.

While we do expect some of our users to define cohorts in ATLAS and then use our data export tools, we don’t expect ATLAS alone to meet the current needs of our users.

@jmethot @jpallas that’s exactly the point.
My organization works mainly with clinicians and often one of the first things we are asked when we explain how ATLAS and the other tools work is the possibility of extracting patient-level data after defining a cohort.
As you said, many of them don’t have the technical skills to write SQL and R against CDM, so our purpose is to fulfill this request and meet their needs.

The University of Colorado Anschutz is currently in the process of implementing Leaf, an open source data explorer, cohort builder & extraction tool. We are using the OMOP CDM, sourced from Epic Caboodle, as our backend database.

We are in a similar position to @vramella, @jmethot & @jpallas as an honest broker delivering hundreds of datasets to researchers for use with their own analytical methods/tools.

We had tried using TriNetX for this purpose, but it’s a black box for mapping and we could never retrieve the same number of Persons as the TriNetX output for a cohort. We do have an Atlas instance for our more OHDSI savvy researchers, but it is a much more advanced tool and doesn’t allow row level export. Leaf seems more user friendly for our local researchers. See this paper by Nicolas Dobbins, et al.

If people are interested, I think a presentation/discussion about cohort building, data export, HIPAA, and the tools used to enable researchers to build cohorts & extract data might be a well-received topic for the EHR WG* to discuss next year. It’s not OHDSI/OMOP specific, but it does support healthcare research.

*EHR WG really is a misnomer. I think Healthcare System Discussion Group is more appropriate. EHR isn’t broad enough and we don’t assign work.

While ATLAS itself does not directly support exporting patient-level data (by design), it is actually very easy to write an export script that uses the cohort data ATLAS generates in the results schema to export a slice of OMOP data for that cohort. The script can be easily parameterized by feeding it a cohort ID.
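To make the idea concrete, here is a minimal sketch of such a script. It uses an in-memory SQLite database with mock tables; the table and column names (`cohort`, `condition_occurrence`, `subject_id`, etc.) follow OMOP CDM / ATLAS results-schema conventions, but the schemas are trimmed to just the columns needed for the join, and the data are made up for illustration.

```python
import sqlite3

def export_cohort_slice(conn, cohort_definition_id):
    """Return condition records for persons in the given ATLAS cohort,
    restricted to each person's cohort window."""
    query = """
        SELECT c.subject_id, co.condition_concept_id, co.condition_start_date
        FROM cohort c
        JOIN condition_occurrence co
          ON co.person_id = c.subject_id
         AND co.condition_start_date BETWEEN c.cohort_start_date
                                         AND c.cohort_end_date
        WHERE c.cohort_definition_id = ?
    """
    return conn.execute(query, (cohort_definition_id,)).fetchall()

# Mock setup: one person in cohort 42, with one in-window and one
# out-of-window condition record.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cohort (
        cohort_definition_id INTEGER, subject_id INTEGER,
        cohort_start_date TEXT, cohort_end_date TEXT);
    CREATE TABLE condition_occurrence (
        person_id INTEGER, condition_concept_id INTEGER,
        condition_start_date TEXT);
    INSERT INTO cohort VALUES (42, 1, '2020-01-01', '2020-12-31');
    INSERT INTO condition_occurrence VALUES (1, 201826, '2020-06-15');
    INSERT INTO condition_occurrence VALUES (1, 201826, '2021-02-01');
""")

rows = export_cohort_slice(conn, 42)
print(rows)  # only the record inside the cohort window
```

Against a real CDM you would point the same query at your database driver of choice and repeat the join for each clinical table of interest; only the cohort ID changes per request.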


Hi @MPhilofsky! I was wondering what the status of this is; has this topic indeed been discussed, and has anything concrete come out of the discussion? Or am I still in time to participate in the discussion?

We at IKNL (the Dutch cancer registry) are currently exploring how to run a pre-existing package, developed by one of our PhD students, on our OMOPped data (and on South Korean OMOPped data). Even though for future studies we will strive to create packages that work directly against the OMOP structure, we figured that in this case it would make more sense to fill the gap by writing a script that extracts the needed information from the OMOP tables into a dataframe (the input format of the package).


Hello @Chiara,

We haven’t discussed this topic. I never put it on the Healthcare Systems Interest Group (formerly the EHR WG) agenda since no one showed interest. It would be beneficial if we could find a few people/institutions exporting row level data from the CDM. Then we could share our different processes. Do you know others who would like to participate?


Hello @MPhilofsky ,
That would be great; I would love to participate.
As I said in my initial post, we are developing a Shiny app to export row-level data from the CDM. The app is not fully ready yet, but we have implemented the first features in a demo.

It would be great to discuss these different processes together.



I would like to participate. Dana-Farber doesn’t yet have an OMOP database but we are working toward it and I can provide use cases for row-level patient data export.


Hi @MPhilofsky, I do not know others who would like to participate but it looks to me there are already some organizations eager to get the discussion started :slight_smile: It would be great if we could indeed find a time to share our different use cases and processes!


Hi @MPhilofsky I would like to participate in building such a solution.
We at the Israel Ministry of Health are facing the same request from our physicians and researchers. We have already started to develop a solution for extracting a cohort dataset and transferring it to a patient-level flat file, and we will be happy to share our solution and learn from @vramella and others.



It looks like we have enough folks interested to hold a productive meeting! I will reach out individually to get something scheduled.



We at Charité would be interested in this as well!


Allow me to issue my standard @Christian_Reich nag here:

Extracting analytical datasets from an OMOP database is - (thinking of a non-offensive word here) - counterproductive:

  • You kill federated network research
  • You are not using standardized methods, but some local spaghetti
  • You are creating patient data protection issues
  • You are duplicating data unnecessarily

Just because something has been done for a long time it doesn’t mean it is a good idea. It is a bad idea.

End of rant.

If you have to do it because you can’t say no to your internal customers:

  • Create a cohort in ATLAS
  • Join the COHORT table to the clinical OMOP tables you are interested in on person_id and time stamp
  • Do a gigantic pivot so that each concept_id becomes a column (maybe filtered through some shortlist) and each person_id a row
  • Don’t tell anybody you are doing it! :slight_smile:

You can use this function to get what you want.

But, like @Christian_Reich says, don’t do it.

Dear all,

We are interested in this as well. Does any of you have a working solution for this? We would like to export patient-level data from the cohort, but I don’t think that everyone should have to develop their own solution. @vramella: does your Shiny app work?

No one is interested in this topic?

I understand the philosophy behind federated research, but there are use cases where we need to extract the patient-level data and use it in other software. I’ll give you one example: we have a large national eHealth database containing financial data about patients (prescriptions, diagnoses), but no lab test results or other clinical data. We cannot download this financial database for data protection reasons, but if we have our own database, we can connect the two and analyse them together in a secure “environment” with approved software. So, if I have a cohort and I want to analyse LDL levels from my HIS together with drug usage, I need to send the patient-level data to them, they connect it with the prescription database, and then I can analyse it. That’s why we sometimes need patient-level data. What do you think about this?

Taking a small random sample of persons who are in a cohort is a legitimate use case. Here, we look at a random sample of, say, 20 persons to see whether their signs, symptoms, treatments, and investigations are in line with what is expected for persons with that phenotype.
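For what it’s worth, that review sample is a one-liner once you have the cohort’s subject IDs. A minimal sketch, with a made-up helper name and mock IDs; seeding the generator is my addition so the sample is reproducible for audit purposes.

```python
import random

def sample_cohort(subject_ids, n=20, seed=0):
    """Return up to n randomly chosen subject_ids for manual chart review
    (reproducible via the seed)."""
    rng = random.Random(seed)
    return rng.sample(subject_ids, min(n, len(subject_ids)))

cohort = list(range(1000, 1100))  # mock subject_ids from the COHORT table
reviewed = sample_cohort(cohort, n=20)
print(len(reviewed))  # 20 distinct persons to review
```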

If, however, your use case is simply to extract person-level data from one data source to reuse as part of another repository, that’s frowned upon. You may have a legitimate reason to do this, like what you described (it sounds like data enrichment). If so, you should probably use ETL tools for that, not analysis tools like ATLAS or the OHDSI R packages.