
2020 OMOPed MIMIC project

This thread is to be used to document work on converting MIMIC demo (and full) data to OMOP.

The prior ETL (https://github.com/MIT-LCP/mimic-omop) has its last commits from two years ago and will be refreshed. The Physionet v1.0 release will be based on that code. If we improve the mapping, Physionet release 2 will contain those additions. For now, we will still use that GitHub repo as the one and only repo for the ETL.


OMOPed MIMIC (Argos) Google drive folder link is: https://drive.google.com/open?id=1j-x-rwuYJr2nIs5zxCW6ST_Q-vPc1tfN

Github issue for the conversion: https://github.com/MIT-LCP/mimic-omop/issues/52

This forum thread will be the primary way to communicate updates.

Argos is the dog of Odysseus. It is not an acronym; we just needed a name for the project.

May 20 update: two additional researchers from Stanford joined the team (see central notes in the Google Drive folder). Nicolas Paris plans to run the extract for the demo data and help that way with release 1.0 on Physionet. Thank you to the folks who volunteered to be the primary person for an OMOP table. The MIT Physionet team provided guidance. If you want to volunteer as technical lead, let me know; I will assume that role until we have a volunteer.

We will focus on MIMIC demo data first. We will author release notes on Google Drive (central notes).

The team has 10+ members and growing. N3C even wants to help. (update pending)


We started the work on the procedure_occurrence table. The code is here

This is a first attempt and it is open for comments or suggestions. Let’s keep the ball rolling


June 3 update:

  • BQ updates
  • some steps done towards a funding proposal (Andrew W. can provide best update)
  • At the folder link, there is a spreadsheet of tasks that we are trying to prioritize. Please add your vote to the tasks you see as important, or add new tasks (file: central spreadsheet) in the Google folder here: https://drive.google.com/open?id=1j-x-rwuYJr2nIs5zxCW6ST_Q-vPc1tfN
  • I did some work on loading the demo OMOPed data into a SQLite file and running some OHDSI tools on the resulting file
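The SQLite loading step mentioned above could be sketched roughly as follows. This is only an illustration, not the actual script used: the `load_csv_into_sqlite` helper, table name, and toy `person.csv` layout are all assumptions for the sake of a runnable example.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

def load_csv_into_sqlite(conn, csv_path, table):
    """Load one per-table OMOP CSV into a SQLite table named after it."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        cols = ", ".join(header)
        placeholders = ", ".join("?" for _ in header)
        # Untyped columns keep the sketch generic; real DDL would type them
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")
        conn.executemany(
            f"INSERT INTO {table} ({cols}) VALUES ({placeholders})", reader
        )
    conn.commit()

# Toy person.csv standing in for the real demo export (layout is assumed)
tmp = Path(tempfile.mkdtemp())
(tmp / "person.csv").write_text(
    "person_id,gender_concept_id,year_of_birth\n1,8507,1980\n2,8532,1975\n"
)

conn = sqlite3.connect(tmp / "omop_demo.sqlite")
load_csv_into_sqlite(conn, tmp / "person.csv", "person")
n_person = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```

With one CSV per OMOP table, the same helper can be called in a loop over the export directory to build a single SQLite file.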

Tasks look like this:

Mapping coverage from prior presentation:


I still see this effort as a bit disparate and all over the place. Should we all do the same tasks and compare notes? Should we focus on the tables we signed up for? Are there any deadlines for deliverables? Do we need to use BQ or not (I see that Jose did, but not you)? While I am very willing to contribute and to have the full dataset mapped to OMOP, I don’t see structure as to when and how things should be done and what deliverables are expected.

I think we should have a call (and I strongly dislike calls :slight_smile: ) to get everyone on the same page and structure this effort a bit more, unless I completely missed the point or the parts where this was already discussed.


I’m sending an email out to all those involved shortly to explain the funding possibility. If successful, it would help address some of Juan’s points.


The Odysseus team has done some analysis using the existing project on the demo dataset. I added the results to the Central Notes document.


Should we all do the same tasks and compare notes?

The task overview presented above is one way to collectively decide the most important tasks. Please vote on those and add tasks you see missing.

Should we focus on the tables we signed up to do?

Dividing up the tables was one way to split the work (besides tasks). My goal with the table “stewardship/babysitting” was to have folks think about the quality of the v1 mapping (two years old) and flag gaps. The post by Michael Kallfelz presented a good, if brief, overview of all tables.

Are there any deadlines for deliverables?

No - because some folks are waiting for funds to cover their effort, while others are contributing some time now (and not waiting). One deadline is to release v1 of the OMOPed demo data by July 30th or sooner, in a shape that MIT will approve, with release notes and their mandatory fields (see central notes for a draft). Nicolas promised to create the CSV files for that. I may have to look for another source since he may be busy.

Do we need to use BQ or not (I see that Jose did, but not you)?

We may use BQ, but only for v2. I think a good plan is to make the mapping platform-independent so implementations in many flavors are possible. BQ has some advantages over the current Postgres. What do you think, Juan?

Here is the script to load the data into SQLite and later run OHDSI tools.

Thank you to all who responded publicly on this forum! This helps the momentum. We need more posts with questions like yours. Regarding a meeting, I am fine with meeting. @parisni, we probably need a time that accommodates the EU time zone. To prepare for the meeting, can everyone vote on and add the tasks they see as important (spreadsheet file on Google Drive, same link as always).


Thank you! This clears up a lot of things in one single place. I agree that it should be platform-independent, but it will be harder to normalize across SQLite, Postgres (what I am using), BQ, and others if we leave this a loose requirement at first. However, I see how this would be the quickest way to get something out by the end of July.

Will go and vote for the tasks now.

The funding for the project was approved (thanks to Andrew Williams). The kickoff happened last week, and this week we had our first regular meeting. MIMIC-IV will be the input source we convert (not MIMIC-III! yay). MIMIC-IV will probably be released this week by the Physionet team.

MIMIC-IV was released on Aug 14, 2020. See https://mimic-iv.mit.edu/docs/datasets/ (a demo version will also be released soon, per this week’s OHDSI MIMIC meeting).


There is an OHDSI MIMIC WG? :slight_smile: Can I be added to the meeting mailing list? I don’t see it on the main page (projects:overview [Observational Health Data Sciences and Informatics]); pardon my ignorance on this :slight_smile:

This is a public reply to a U of Florida researcher. The latest info can be obtained here: https://github.com/OHDSI/MIMIC/wiki/Meeting-minutes#monday-august-31-2020---weekly-requirements-meeting

In December 2020, the ETL was released on GitHub at https://github.com/OHDSI/MIMIC

We encourage all OHDSI researchers to test it. Please file any issues or questions in that GitHub repo.

Demo MIMIC-IV data now exist, as CSV files (one per OMOP table) or as a single SQLite file with all tables. If interested, like this message and email me which format you would prefer to be posted on Physionet (it needs MIT approval - we are working on that…).
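Once the SQLite file is available, typical OMOP CDM queries can be run directly against it. A minimal sketch, with an assumed stand-in schema so it runs anywhere (with the real download you would connect to the published file instead; the concept ids 320128 for essential hypertension and 201826 for type 2 diabetes are standard OMOP concepts used here for illustration):

```python
import sqlite3

# Tiny stand-in database; replace ":memory:" with the path to the
# downloaded demo SQLite file to run this against the real data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER, gender_concept_id INTEGER, year_of_birth INTEGER);
CREATE TABLE condition_occurrence (
    person_id INTEGER, condition_concept_id INTEGER);
INSERT INTO person VALUES (1, 8507, 1980), (2, 8532, 1975);
-- 320128 = essential hypertension, 201826 = type 2 diabetes
INSERT INTO condition_occurrence VALUES (1, 320128), (2, 320128), (2, 201826);
""")

# Count distinct patients with a given condition concept -- the
# bread-and-butter shape of OMOP CDM queries
n_hypertension = conn.execute("""
    SELECT COUNT(DISTINCT p.person_id)
    FROM person p
    JOIN condition_occurrence co ON co.person_id = p.person_id
    WHERE co.condition_concept_id = 320128
""").fetchone()[0]
```

The same query text works unchanged against the CSV-derived tables once they are loaded into any SQL engine, which is part of the appeal of a platform-independent mapping.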


@Vojtech_Huser Thank you for this! CSV would be our preference.


The MIMIC WG worked hard and reached a milestone. Thank you to all who contributed. I am pleased to announce that demo data in OMOP format (for MIMIC-IV; demo = 100 patients) was published yesterday.

Link: https://doi.org/10.13026/p1f5-7x35

We seek community feedback on v0.9 of the ETL and data.

The full (not demo) data can be converted using the same ETL (this requires a DUA and a training certificate). We are still working on a similar release for the full data.

The download looks like this:


Congratulations for the work done !! :clap: :clap: :clap:

As part of a master’s degree in data science for health (sorry, it is in French), I plan to use the MIMIC-OMOP database (version III implemented by Nicolas and Adrien, or your version IV if possible) for database courses and a datathon on data visualisation.

For the first, I will ask intensive care specialists to suggest which queries they would like to use (for activity, research, quality of care) and will distribute the queries among the students.
For the second, I’ll ask intensive care specialists to suggest what they would like to visualise, and will ask students to develop related dashboards and visualisations.

Would you be interested in participating or would this be useful to you?


Hi @AntoineLamer ,
that sounds like a great match. We are currently throwing more UAT at the full MIMIC conversion by reproducing questions from existing publications. I think it makes a lot of sense to extend that to real-life questions from your people and to refine it even more based on those.
Please get in touch!

Hi !

Of course! Here is a tentative start for the list of queries; feel free to complete it: MIMIC-OMOP - HedgeDoc

Is the final OMOP DDL for OMOPed here? Why is each table name prefixed with cdm?
It seems implemented for Oracle or SQL Server? Would you be interested in an implementation in PostgreSQL?

Hi @Vojtech_Huser and @Andrew ,

Has anyone run the ETL in an environment outside the developers’? Does anyone have a BQ dataset with the fully converted MIMIC-IV-to-OMOP data outside the developers? How much did it cost on BQ to run everything end to end?

I am very interested in using this resource as part of one of my classes this semester

Thank you for all the help here