ICU data, anyone?

benskov · February 28, 2020, 10:38am

Hi everybody,

We’re working with i.a. ICU data in our group. We haven’t OMOP’ed our data yet (although it’s on the drawing board for later), but wanted to reach out to the community to understand how many OMOP’ed ICU data sets exist, and whether anyone might be interested in running a validation study eventually.

Cheers,
Ben

jposada · February 28, 2020, 9:29pm

Hi Ben,

There are two outstanding sources of ICU data publicly available:

Which has public code to transform to OMOP-CDM

And a newly released Chinese database

Which follows the MIMIC schema thus the ETL should in theory work

benskov · March 2, 2020, 1:09pm

Hi Jose,

Thanks for the link. I’m aware of MIMIC, and it is indeed a great resource. I was curious, however, to understand if there are OMOP’ed ICU data sets which

- cannot be shared publicly (like ours) and, thus, would be well-suited for the local-analysis-central-synthesis (privacy-by-design) philosophy of OHDSI; and/or
- come from other hospitals, regions and countries than MIMIC (e.g., European or Asian), for richer external validation.

I’m not super familiar with the nature of claims data, but I reckon ICU data are a different creature compared to the “typical OMOP data”. I’d imagine (and hope) that e.g. the EHDEN project might help OMOP’ed ICU data sets come about.

If others stumble into this post and have OMOP’ed ICU data, I’d be keen to get in touch.

Cheers,
Ben

MPhilofsky · March 3, 2020, 5:08pm

How do you define “ICU data sets”? Many hospital EHR data sources contain ICU data. What exactly do you seek from the ICU dataset?

benskov · March 4, 2020, 7:21am

Right. Good question, @MPhilofsky. I suppose I’d define ICU data as comprising high-frequency data from machines (e.g. ventilators, infusion pumps and continuous monitoring apparatus) and more “low-frequency” data (e.g. manual observations and biochemistry) than other hospital wards.

Thus, in our context “ICU data” are considered somewhat distinct from EHR data, and since we’re trying to leverage the additional information of the data types mentioned above, we were keen to understand if others in the OHDSI community are as well.

MPhilofsky · March 4, 2020, 7:33pm

You should ask about this in the Device WG.

Colorado doesn’t record data directly from many devices. Some are interfaced to the EHR, but most data are input or approved for input into the UI by the bedside Provider. For devices that continuously monitor (EKG, HR, arterial BP), I wonder how often this data is recorded into an EHR. Every second, every 10 seconds, every minute, every 5 minutes, etc…

mkwong · March 4, 2020, 11:53pm

Hi - this is a good question for the Device Working Group.

At Tufts we are currently capturing 12-lead ECG data (10 seconds at 500 sps) and mapping it to an ECG_OMOP database instance. We are starting to explore and develop ways to manage continuous data from patient monitors in collaboration with folks at MIMIC and commercial vendors. We have developed OMOP mappings for ECG measurements and computerized interpretation statements for the more common 12-lead ECG machines, and I expect support for continuous patient monitoring data to be derived from the current work and mapping dictionary.

Regarding frequency of data capture - it depends on your device integration solution/method and ultimately on what the clinical or informatics needs are. In general I prefer to capture diagnostic resolution data and take the storage hit, maybe not directly in the EHR but as part of our research data warehouse. You would have to run a literature search to see what seems to be a useful data resolution and whether continuous or episodic sampling will be sufficient.
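To put rough numbers on the "storage hit", here is a minimal Python sketch of how many values per patient, per signal, per day each of the recording intervals mentioned above would produce, with the 10-second, 500 sps, 12-lead ECG strip for comparison. It only does the arithmetic; the function and variable names are illustrative and not taken from any site's pipeline.

```python
# Stored values per patient, per signal, per day at a given recording interval.
SECONDS_PER_DAY = 24 * 60 * 60


def rows_per_day(interval_seconds: int) -> int:
    """Number of values kept per day when one value is stored per interval."""
    return SECONDS_PER_DAY // interval_seconds


# Recording intervals raised in the thread (labels are illustrative).
intervals = {"1 s": 1, "10 s": 10, "1 min": 60, "5 min": 300}
daily_rows = {label: rows_per_day(sec) for label, sec in intervals.items()}

# One 12-lead ECG strip: 10 seconds at 500 samples per second on 12 leads.
ecg_samples = 10 * 500 * 12

for label, n in daily_rows.items():
    print(f"{label:>5}: {n} values/day")
print(f"12-lead ECG strip: {ecg_samples} samples")
```

Going from one value per second to one value per five minutes is a 300-fold reduction, which is why the choice of resolution matters more for storage than almost any other design decision here.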

jposada · March 5, 2020, 1:06am

Hi @mkwong,

Did you develop an extension of OMOP for ECG data? If yes, do you have any resource you could point me to?

Thanks in advance!

mkwong · March 5, 2020, 7:26am

Hi - no extensions to the CDM design at all - used LOINC and SNOMED standard concepts and mapped all 12-lead ECG measurements (LOINC) and computerized interpretation statements (SNOMED and LOINC) to OMOP. For patient monitors - I anticipate doing the same thing for its data. For practical implementations - I am leaning toward a) Using OMOP.note table to reference file sets that represent and contain the full continuous data stream; or b) Embed blocks of patient monitor data in a series of OMOP.note records.
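As a rough illustration of the approach described above (standard MEASUREMENT rows for the numeric ECG values, plus option (a), a NOTE record that points at the full waveform file), here is a minimal Python sketch. It is not the Tufts mapping: the column names are a subset of the OMOP CDM v5 MEASUREMENT and NOTE tables, but every concept ID is left as the placeholder 0 and must be looked up in the OMOP vocabulary, and the source codes, file path, and helper name `ecg_to_rows` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Dict, List, Tuple

# Placeholder only: 0 is OMOP's "no matching concept". Real LOINC/SNOMED
# concept IDs must come from the OMOP vocabulary tables.
PLACEHOLDER_CONCEPT_ID = 0


@dataclass
class MeasurementRow:
    """Subset of OMOP CDM v5 MEASUREMENT columns."""
    measurement_id: int
    person_id: int
    measurement_concept_id: int
    measurement_date: date
    measurement_datetime: datetime
    measurement_type_concept_id: int
    value_as_number: float
    unit_concept_id: int
    measurement_source_value: str
    unit_source_value: str


@dataclass
class NoteRow:
    """Subset of OMOP CDM v5 NOTE columns, used here to reference a file."""
    note_id: int
    person_id: int
    note_date: date
    note_datetime: datetime
    note_type_concept_id: int
    note_class_concept_id: int
    note_title: str
    note_text: str


def ecg_to_rows(
    person_id: int,
    recorded_at: datetime,
    values: Dict[str, Tuple[float, str]],
    waveform_path: str,
) -> Tuple[List[MeasurementRow], NoteRow]:
    """Turn one ECG's numeric values into MEASUREMENT rows plus one NOTE row."""
    measurements = [
        MeasurementRow(
            measurement_id=i,
            person_id=person_id,
            measurement_concept_id=PLACEHOLDER_CONCEPT_ID,
            measurement_date=recorded_at.date(),
            measurement_datetime=recorded_at,
            measurement_type_concept_id=PLACEHOLDER_CONCEPT_ID,
            value_as_number=value,
            unit_concept_id=PLACEHOLDER_CONCEPT_ID,
            measurement_source_value=source_code,
            unit_source_value=unit,
        )
        for i, (source_code, (value, unit)) in enumerate(values.items(), start=1)
    ]
    note = NoteRow(
        note_id=1,
        person_id=person_id,
        note_date=recorded_at.date(),
        note_datetime=recorded_at,
        note_type_concept_id=PLACEHOLDER_CONCEPT_ID,
        note_class_concept_id=PLACEHOLDER_CONCEPT_ID,
        note_title="ECG waveform reference",
        note_text=waveform_path,  # option (a): store a pointer, not the samples
    )
    return measurements, note


# Hypothetical source codes, values, and file path, for illustration only.
rows, note = ecg_to_rows(
    person_id=42,
    recorded_at=datetime(2020, 3, 5, 7, 26),
    values={"heart_rate": (72.0, "bpm"), "qrs_duration": (96.0, "ms")},
    waveform_path="waveforms/ecg_0001.xml",
)
```

Option (b) would differ only in what goes into `note_text`: blocks of the monitor data themselves, spread over a series of NOTE records, instead of a path.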

Staging is handled by a 1-N lead XML schema to handle different commercial native data source/format/sample rates/signal resolutions.

If you are using GE12SL, Philips, or GLASGOW 12-lead ECG measurements/interpretation statements I can share these mappings with you.

jposada · March 5, 2020, 5:00pm

Hi Manlik,

We are in early phases with waveform data so anything you can share will be more than welcome. My email is jdposada at stanford.edu. Thanks!

benskov · March 16, 2020, 10:44am

Hi guys,

Thanks for your responses. I agree with your notion, @mkwong, that it’s better to capture all data and afterwards find ways to operationalise it in a way useful for the analytic problem at hand.

It does seem from the response rate (and the nature of your responses) that essentially no sites have plugged “ICU data” into their OMOP CDMs, but I’ll ask in the Device WG as well.

A lot of the data coming out of ICU machines has exact low-frequency equivalents: blood pressure, drug exposure (with continuous infusion), saturation. Mapping that should be quite straightforward, I’d argue. Other data types coming from e.g. ventilators might be more tricky because you also get settings data as well as observations.

Anyway, thanks a bunch! And if other people with ICU data stumble across this post and are interested in collaborating, please give me a shout.

Cheers,
Ben

Andrew · May 5, 2020, 3:29pm

Benjamin is right! Collaboration is exactly what we need. Let’s join efforts to finish the OMOP ETL of ICU data from MIMIC.
That will reach the broadest community doing similar work in this space and is furthest along toward completion. It will naturally add value to the PhysioNet community’s work by enabling linkage to broader data outside the ICU, and will begin to make analytics done on OMOPed data by PhysioNet members available as tools and methods the OHDSI community can use.

The physionet project for this ETL is at https://physionet.org/projects/1wW5mQkK2Cm10WuKav3z/overview/

Vojtech has revived an effort to finish the great work started by Alistair Johnson, Tom Pollard, Nicolas Paris, and Adrian Parrot. Manlik’s great work on mapping ECG waveforms to LOINC and SNOMED is an important example. The terrific Juan Banda is already on board.

Vojtech, do you want to organize this into an OHDSI WG in addition to a physionet project? Properly leveraging the network effects for this work could make a big difference in how quickly and how well it gets done.

gregk · May 7, 2020, 11:00pm

hi,

Just to clarify. I see that the ETL for MIMIC is here

Or is it not done, or are we talking about something else in addition to MIMIC here? Could someone please summarize it here?

mik · May 7, 2020, 11:09pm

Hi there,
At Odysseus we are currently investigating the existing GitHub project as well and are trying our hand at the MIMIC demo dataset. We will be happy to share our findings.
Cheers
Mik

Andrew · May 8, 2020, 2:06am

Mik, that’s great. The more hands the better.

Greg, here is my summary: The ETL is not done. A very rough estimate of the remaining work, according to Alistair Johnson, is ~6 months at 100% FTE. Having more people pitch in will have a big impact. My understanding is that the “numerics” data generated by monitors and other devices in the ICU are the biggest challenge for completing it. Numerics refers not to the raw output of these ICU devices, but to a chunked representation that summarizes the raw output over short intervals, in the range of multiple seconds to minutes.
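To make the idea of "numerics" concrete, here is a minimal Python sketch that chunks raw (time, value) samples into fixed-width intervals and summarizes each one. The choice of count, mean, min, and max as the summary, and the function name `to_numerics`, are assumptions for illustration; the thread does not say how any monitor or the MIMIC pipeline actually computes its numerics.

```python
from statistics import mean
from typing import Dict, List, Tuple


def to_numerics(
    samples: List[Tuple[float, float]], interval_s: float
) -> List[Dict[str, float]]:
    """Summarize (time_in_seconds, value) samples over fixed-width intervals."""
    buckets: Dict[int, List[float]] = {}
    for t, value in samples:
        # Samples with t in [k * interval_s, (k + 1) * interval_s) share bucket k.
        buckets.setdefault(int(t // interval_s), []).append(value)
    return [
        {
            "start_s": index * interval_s,
            "n": len(values),
            "mean": mean(values),
            "min": min(values),
            "max": max(values),
        }
        for index, values in sorted(buckets.items())
    ]


# Made-up heart-rate samples, summarized over 10-second intervals.
raw = [(0.0, 80.0), (1.0, 82.0), (2.0, 84.0), (10.0, 90.0), (11.0, 92.0)]
numerics = to_numerics(raw, interval_s=10.0)
for row in numerics:
    print(row)
```

Each summary row is the kind of low-frequency value that fits a MEASUREMENT-style record, which is part of why the interval and the summary statistic need to be agreed on before the ETL can be finished.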

Alistair, Tom Pollard, Leo Celi and others in Roger Mark’s wonderful Laboratory for Computational Physiology have deep knowledge of the requirements that representations of these “high resolution” data must meet so they can be used for signal processing and data science that has an impact on ICU care delivery and outcomes. Meeting those requirements will be a critical component of completing this work correctly.

They are great folks and in PhysioNet have created a vibrant and productive community much like OHDSI. Plus their example in freely providing high quality real datasets for both data science and education has really blazed a very important trail that too few are following. Partnering with them will be great. I think this product, OMOPed MIMIC data, will be an important catalyst to link the two communities and expand algorithm development and outcomes research in ways neither community could do separately.

Vojtech_Huser · May 15, 2020, 1:02pm

There is a project in draft mode led by me on Physionet. See this github issue here https://github.com/MIT-LCP/mimic-omop/issues/52

benskov · May 15, 2020, 7:26pm

@Vojtech_Huser, I just put myself on the measurement table, but perhaps you could guide me a bit as to how I could help out. If not, I’ll figure it out myself.

SELVA_MUTHU_KUMARAN · May 16, 2020, 12:52am

Yes, I am interested and would like to participate as well. I have left my name in the Google doc but would like some guidance on what has to be done. I understand we have to transform MIMIC III to OMOP form, but I have a few questions about timelines, what exactly has to be done, and how we as a team are planning to do this, and I would like to get them answered before I start.

Vojtech_Huser · May 20, 2020, 2:50pm

Use this thread for further discussion: Argos project: 2020 OMOPed MIMIC project

acoltri · July 22, 2020, 5:39pm

Hello Manlik. You mentioned having LOINC mappings for GE and Philips ECG interpretation phrases. I would be very interested in seeing them. As you did, I found that all the standard numeric measurements in the ECG are already covered, but was wondering how to handle the impressions. Do you end up putting them in “Conditions” or “Observations” depending on the domain of the CDM Standard Concept?

Thanks, Alan