
ICU data, anyone?

Hi Ben,

There are two outstanding sources of ICU data publicly available:


Which has public code to transform to OMOP-CDM

And a newly released Chinese database

Which follows the MIMIC schema, so the ETL should in theory work


Hi Jose,

Thanks for the link. I’m aware of MIMIC, and it is indeed a great resource. I was curious, however, to understand if there are OMOP’ed ICU data sets which

  1. cannot be shared publicly (like ours) and, thus, would be well-suited for the local-analysis-central-synthesis (privacy-by-design) philosophy of OHDSI; and/or
  2. come from other hospitals, regions and countries than MIMIC (e.g., European or Asian), for richer external validation.

I’m not super familiar with the nature of claims data, but I reckon ICU data are a different creature compared to the “typical OMOP data”. I’d imagine, and hope, that e.g. the EHDEN project might help OMOP’ed ICU data come about.

If others stumble upon this post and have OMOP’ed ICU data, I’d be keen to get in touch.


How do you define “ICU data sets”? Many hospital EHR data sources contain ICU data. What exactly do you seek from the ICU dataset?

Right. Good question, @MPhilofsky. I suppose I’d define ICU data as comprising high-frequency data from machines (e.g. ventilators, infusion pumps and continuous monitoring apparatus) and more “low-frequency” data (e.g. manual observations and biochemistry) than other hospital wards.

Thus, in our context “ICU data” are considered somewhat distinct from EHR data, and since we’re trying to leverage the additional information of the data types mentioned above, we were keen to understand if others in the OHDSI community are as well.

You should ask about this in the Device WG.

Colorado doesn’t record data directly from many devices. Some are interfaced to the EHR, but most data are input or approved for input into the UI by the bedside Provider. For devices that continuously monitor (EKG, HR, arterial BP), I wonder how often this data is recorded into an EHR. Every second, every 10 seconds, every minute, every 5 minutes, etc…

Hi - this is a good question for the Device Working Group.

At Tufts we are currently capturing 12-lead ECG data (10 seconds at 500 sps) and mapping it to an ECG_OMOP database instance. We are starting to explore and develop ways to manage continuous data from patient monitors in collaboration with folks from MIMIC and commercial vendors. We have developed OMOP mappings for ECG measurements and computerized interpretation statements for the more common 12-lead ECG machines, and I expect support for continuous patient monitoring data to be derived from the current work/mapping dictionary.

Regarding frequency of data capture: it depends on your device integration solution/method and ultimately on what the clinical or informatics needs are. In general I prefer to capture diagnostic-resolution data and take the storage hit, maybe not directly in the EHR but as part of our research data warehouse. You would have to run a literature search to see what seems to be a useful data resolution and whether continuous or episodic sampling will be sufficient.
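To get a feel for the "storage hit" at different capture frequencies, a rough back-of-the-envelope sketch in Python may help. The 12-lead, 10 seconds at 500 sps figures come from the post above; the 2 bytes per sample and the single-channel, once-per-minute comparison are assumptions chosen purely for illustration, not properties of any particular device.

```python
def sample_count(duration_s: float, rate_hz: float, channels: int = 1) -> int:
    """Number of samples produced by `channels` signals over `duration_s` seconds."""
    return int(round(duration_s * rate_hz * channels))


def storage_bytes(duration_s: float, rate_hz: float, channels: int = 1,
                  bytes_per_sample: int = 2) -> int:
    """Uncompressed size; bytes_per_sample=2 (16-bit samples) is an assumption."""
    return sample_count(duration_s, rate_hz, channels) * bytes_per_sample


# One 12-lead ECG: 10 seconds at 500 samples per second (figures from the thread).
ecg_samples = sample_count(10, 500, channels=12)

# One hypothetical monitor channel over 24 hours, continuous vs. once per minute.
DAY_S = 24 * 3600
continuous_day = storage_bytes(DAY_S, 500)
minutely_day = storage_bytes(DAY_S, 1 / 60)

print(ecg_samples, continuous_day, minutely_day)
```

Under these assumptions the gap between diagnostic-resolution and once-per-minute capture is several orders of magnitude per channel per day, which is the trade-off to weigh against the clinical or informatics need.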

Hi @mkwong,

Did you develop an extension of OMOP for ECG data? If yes, do you have any resources you could point me to?

Thanks in advance!

Hi - no extensions to the CDM design at all. I used LOINC and SNOMED standard concepts and mapped all 12-lead ECG measurements (LOINC) and computerized interpretation statements (SNOMED and LOINC) to OMOP. For patient monitors, I anticipate doing the same thing for their data. For practical implementations, I am leaning toward a) using the OMOP.note table to reference file sets that represent and contain the full continuous data stream; or b) embedding blocks of patient monitor data in a series of OMOP.note records.
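Option b) could look roughly like the Python sketch below. This is only an illustration under stated assumptions, not the Tufts implementation: the function name, the JSON layout of note_text, the block size, and the use of 0 as a placeholder for every concept ID are all hypothetical. The column names follow the OMOP NOTE table.

```python
import json
from datetime import datetime, timedelta


def monitor_blocks_to_notes(person_id, start, rate_hz, samples, block_size, signal_label):
    """Split one signal into fixed-size blocks, one NOTE-like row per block."""
    notes = []
    for i in range(0, len(samples), block_size):
        block = samples[i:i + block_size]
        block_start = start + timedelta(seconds=i / rate_hz)
        notes.append({
            "person_id": person_id,
            "note_date": block_start.date().isoformat(),
            "note_datetime": block_start.isoformat(),
            "note_type_concept_id": 0,    # placeholder, to be mapped
            "note_class_concept_id": 0,   # placeholder, to be mapped
            "note_title": f"{signal_label} block {i // block_size}",
            # Hypothetical payload layout: sample rate plus raw samples as JSON.
            "note_text": json.dumps({"rate_hz": rate_hz, "samples": block}),
            "encoding_concept_id": 0,     # placeholder, to be mapped
            "language_concept_id": 0,     # placeholder, to be mapped
            "note_source_value": signal_label,
        })
    return notes


# Ten samples of a made-up "HR" signal at 2 Hz, stored four samples per note.
notes = monitor_blocks_to_notes(1, datetime(2020, 1, 1, 12, 0, 0), 2.0,
                                list(range(10)), 4, "HR")
for n in notes:
    print(n["note_datetime"], n["note_title"], n["note_text"])
```

Option a) would keep the same rows but put a file reference in note_text instead of the samples themselves.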

Staging uses a 1-N lead XML schema to accommodate the different commercial native data sources, formats, sample rates, and signal resolutions.

If you are using GE12SL, Philips, or GLASGOW 12-lead ECG measurements/interpretation statements I can share these mappings with you.

Hi Manlik,

We are in early phases with waveform data so anything you can share will be more than welcome. My email is jdposada at stanford.edu. Thanks!

Hi guys,

Thanks for your responses. I agree with your notion, @mkwong, that it’s better to capture all data and afterwards find ways to operationalise it in a way useful for the analytic problem at hand.

It does seem from the response rate (and the nature of your responses) that essentially no sites have plugged “ICU data” into their OMOP CDMs, but I’ll ask in the Device WG as well.

A lot of the data coming out of ICU machines has exact low-frequency equivalents: blood pressure, drug exposure (with continuous infusion), saturation. Mapping that should be quite straightforward, I’d argue. Other data types coming from e.g. ventilators might be more tricky because you also get settings data as well as observations.
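As a minimal sketch of the "straightforward" case, a single device reading with a low-frequency equivalent could be shaped into a MEASUREMENT-like row as below (Python). The function name, the source label, and the use of 0 as a placeholder for every concept ID are assumptions for illustration; the real concept mapping would come from the vocabulary work discussed in this thread.

```python
from datetime import datetime


def to_measurement_row(person_id, when, source_label, value, unit):
    """Shape one device reading as a dict using OMOP MEASUREMENT column names."""
    return {
        "person_id": person_id,
        "measurement_concept_id": 0,        # placeholder, to be mapped
        "measurement_date": when.date().isoformat(),
        "measurement_datetime": when.isoformat(),
        "measurement_type_concept_id": 0,   # placeholder, to be mapped
        "value_as_number": float(value),
        "unit_concept_id": 0,               # placeholder, to be mapped
        "measurement_source_value": source_label,
        "unit_source_value": unit,
    }


# Hypothetical arterial line reading; label and value are made up.
row = to_measurement_row(1, datetime(2020, 1, 1, 12, 0, 0),
                         "ART systolic", 118, "mmHg")
print(row)
```

Ventilator settings would need an extra decision that this sketch does not make: whether a setting is stored alongside observations or kept apart from them.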

Anyway, thanks a bunch! And if other people with ICU data stumble across this post and are interested in collaborating, please give me a shout.


Benjamin is right! Collaboration is exactly what we need. Let’s join efforts to finish the OMOP ETL of ICU data from MIMIC.
That effort will reach the broadest community doing similar work in this space and is the furthest along toward completion. It will naturally add value to the PhysioNet community’s work by enabling linkage to broader data outside the ICU, and it will begin to make analytics done on OMOPed data by PhysioNet members available as tools and methods the OHDSI community can use.

The PhysioNet project for this ETL is at https://physionet.org/projects/1wW5mQkK2Cm10WuKav3z/overview/

Vojtech has revived an effort to finish the great work started by Alistair Johnson, Tom Pollard, Nicolas Paris, and Adrian Parrot. Manlik’s great work on mapping ECG waveforms to LOINC and SNOMED is an important example. The terrific Juan Banda is already on board.

Vojtech, do you want to organize this into an OHDSI WG in addition to a PhysioNet project? Properly leveraging the network effects for this work could make a big difference in how quickly and how well it gets done.



Just to clarify. I see that the ETL for MIMIC is here

Or is it not done, or are we talking about something else in addition to MIMIC here? Could someone please summarize it here?

Hi there,
At Odysseus we are currently investigating the existing GitHub project as well and are trying our hand at the MIMIC demo dataset. We will be happy to share our findings.


Mik, that’s great. The more hands the better.

Greg, here is my summary: the ETL is not done. A very rough estimate of the remaining work, according to Alistair Johnson, is ~6 months at 100% FTE. Having more people pitch in will have a big impact. My understanding is that the “numerics” data generated by monitors and other devices in the ICU are the biggest challenge for completing it. Numerics refers not to the raw output of these ICU devices but to a chunked representation that summarizes the raw output over short intervals, in the range of multiple seconds to minutes.
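To make the idea of a chunked representation concrete, here is a small Python sketch that summarizes a raw sample stream over fixed intervals. It is only an illustration: the function name and the choice of mean/min/max as summary statistics are assumptions, and the numerics in MIMIC may be derived differently.

```python
from statistics import mean


def to_numerics(samples, rate_hz, interval_s):
    """Summarize raw samples into one record per `interval_s` seconds."""
    per_chunk = int(rate_hz * interval_s)
    if per_chunk < 1:
        raise ValueError("interval is shorter than one sample period")
    numerics = []
    for i in range(0, len(samples), per_chunk):
        chunk = samples[i:i + per_chunk]
        numerics.append({
            "offset_s": i / rate_hz,   # start of the chunk, seconds from stream start
            "mean": mean(chunk),
            "min": min(chunk),
            "max": max(chunk),
            "n": len(chunk),           # the last chunk may be shorter
        })
    return numerics


# Seven made-up samples at 2 Hz, summarized over 2-second intervals.
numerics = to_numerics([1, 2, 3, 4, 5, 6, 7], rate_hz=2, interval_s=2)
for rec in numerics:
    print(rec)
```

Whatever statistics are chosen, the interval length and the sample count per chunk are worth keeping with each record so that downstream users can judge how much of the raw signal a summary represents.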

Alistair, Tom Pollard, Leo Celi and others in Roger Mark’s wonderful Laboratory for Computational Physiology have deep knowledge of the requirements that representations of these “high resolution” data must meet so they can be used for signal processing and data science that has an impact on ICU care delivery and outcomes. Meeting those requirements will be a critical component of completing this work correctly.

They are great folks and in PhysioNet have created a vibrant and productive community much like OHDSI. Plus, their example in freely providing high-quality real datasets for both data science and education has really blazed a very important trail that too few are following. Partnering with them will be great. I think this product, OMOPed MIMIC data, will be an important catalyst to link the two communities and expand algorithm development and outcomes research in ways neither community could do separately.


There is a project in draft mode led by me on PhysioNet. See this GitHub issue: https://github.com/MIT-LCP/mimic-omop/issues/52

@Vojtech_Huser, I just put myself down for the measurement table, but perhaps you could guide me a bit as to how I could help out. If not, I’ll figure it out myself :)


Yes, I am interested and would like to participate as well. I have left my name in the Google doc but would like some guidance on what has to be done. I understand we have to transform MIMIC III to OMOP form, but I have a few questions about timelines, what exactly has to be done, and how we as a team plan to do it, and I would like to get them answered before I start.


Use this thread for further discussion: Argos project: 2020 OMOPed MIMIC project

Hello Manlik. You mentioned having LOINC mappings for GE and Philips ECG interpretation phrases. I would be very interested in seeing them. As you did, I found that all the standard numeric measurements in the ECG are already covered, but I was wondering how to handle the impressions. Do you end up putting them in “Conditions” or “Observations” depending on the domain of the CDM Standard Concept?

Thanks, Alan

Most of the mapping I did ended up in conditions, as I wanted to stick with SNOMED codes. The condition_occurrence table does not include qualifiers/modifiers, so I put this information in the condition_source_value and also use the status_concept_id. The status_concept_id may not be the best place for this information, but that is where I’m storing it in a 5.2 CDM at the moment.

As you know native ECG interpretation statements commonly have the elements:

  • (Left hand side) Main finding statement
  • (Right hand side) Supporting reasons for the Main findings statement
  • 1 or more modifiers - acute, incomplete, abnormal, possible, probable, …
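The three elements above, together with the idea of carrying modifiers in condition_source_value, can be sketched in Python as follows. Everything here is hypothetical: the class and function names, the "modifiers finding; reasons" layout, and the example statement are made up for illustration and are not the actual vendor format or the mapping described in this thread. The 50-character cap is an assumption about the column width in your own CDM DDL and should be checked there.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EcgStatement:
    main_finding: str                                     # left-hand side
    reasons: List[str] = field(default_factory=list)      # right-hand side
    modifiers: List[str] = field(default_factory=list)    # acute, possible, ...


def to_condition_source_value(stmt: EcgStatement, max_len: int = 50) -> str:
    """Flatten a statement into one string, truncated to `max_len` characters."""
    text = " ".join(stmt.modifiers + [stmt.main_finding])
    if stmt.reasons:
        text += "; " + ", ".join(stmt.reasons)
    return text[:max_len]


# Made-up example statement with one modifier and one supporting reason.
stmt = EcgStatement("sinus bradycardia", ["rate below 50"], ["probable"])
print(to_condition_source_value(stmt))
```

Because truncation silently drops the supporting reasons first, a real implementation would probably want to detect overflow rather than cut the string.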

Which vendor are you interested in? You can send me an E-mail at mkwong@tuftsmedicalcenter.org and I can share with you the specific mapping.

MK