Adding raw observation data in OMOP

Jens · March 9, 2020, 2:47pm

Hi Everybody,

I am part of a digital health research group at a German university that intends to consolidate large amounts of physical activity data from separate Excel sheets into a single database. We’ve decided to give our database an OMOP structure but ran into a problem.

Some of the data we would like to store is in the form of physical workout sessions. Even though it is possible to store overarching variables of such sessions (e.g. date, datetime, etc.), we didn’t find a way to store the raw data that a session consists of. The issue is that a session can easily take 30 minutes, during which data is collected at frequencies of up to 16 data points per second, so a single session can contain more than 25,000 data points.
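As a quick sanity check of that figure, here is a minimal sketch of the arithmetic; the function name is only for illustration:

```python
def samples_per_session(duration_minutes: float, rate_hz: float) -> int:
    """Number of data points recorded at a constant sampling rate."""
    return int(duration_minutes * 60 * rate_hz)


# A 30-minute session sampled 16 times per second.
session_points = samples_per_session(30, 16)
print(session_points)  # 28800
```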

If we understood everything correctly, a workout session would fit quite well into the OMOP “Observation” table. The problem for us is that we couldn’t find an abstraction layer below an observation that supports raw data.

To solve this problem we came up with two possible solutions that we would like to suggest. Our first idea is to add a “source_data” column to the “Observation” table. If the column can hold a large enough number of characters, we could store our raw data in it in JSON format. This would be a very efficient solution, but it somewhat undermines the idea of a relational database.
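To make the first idea concrete, here is a rough sketch using Python's built-in sqlite3 and json modules. The table is a heavily simplified stand-in for the real OMOP Observation table, the “source_data” column is our hypothetical extension, and the sample values are made up:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE observation (
        observation_id   INTEGER PRIMARY KEY,
        person_id        INTEGER NOT NULL,
        observation_date TEXT NOT NULL,
        source_data      TEXT  -- hypothetical extension column, not part of the OMOP CDM
    )
    """
)

# One 30-minute session at 16 samples per second (placeholder values).
raw_samples = [{"t": i / 16, "value": 0.0} for i in range(30 * 60 * 16)]

# The whole session is serialized into a single JSON string.
conn.execute(
    "INSERT INTO observation VALUES (?, ?, ?, ?)",
    (1, 42, "2020-03-09", json.dumps(raw_samples)),
)

# Reading it back means parsing the full JSON document again.
(stored,) = conn.execute(
    "SELECT source_data FROM observation WHERE observation_id = 1"
).fetchone()
restored = json.loads(stored)
```

One row per session keeps the table small, but the individual samples are opaque to plain SQL unless the database offers JSON functions.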

Our other idea was to add an additional layer below an “Observation”. That layer would most likely take the form of a “SourceData” table that would have to offer a variety of columns to store values in. The risk we see with this approach is that it becomes quite inefficient once the database grows large enough. It is also hard to provide the right number and type of columns for every type of source data.
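A minimal sketch of this second idea, again with sqlite3. The “source_data” table and all of its column names are hypothetical, the Observation table is simplified, and the sample values are placeholders:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE observation (
        observation_id   INTEGER PRIMARY KEY,
        person_id        INTEGER NOT NULL,
        observation_date TEXT NOT NULL
    );
    -- Hypothetical child table: one row per raw sample of a session.
    CREATE TABLE source_data (
        observation_id  INTEGER NOT NULL REFERENCES observation(observation_id),
        sample_index    INTEGER NOT NULL,
        offset_seconds  REAL NOT NULL,
        value_as_number REAL,
        PRIMARY KEY (observation_id, sample_index)
    );
    """
)

conn.execute("INSERT INTO observation VALUES (?, ?, ?)", (1, 42, "2020-03-09"))

# One 30-minute session at 16 samples per second (placeholder values).
conn.executemany(
    "INSERT INTO source_data VALUES (?, ?, ?, ?)",
    ((1, i, i / 16, 0.0) for i in range(30 * 60 * 16)),
)

# Individual samples are now queryable with plain SQL.
row_count, max_offset = conn.execute(
    "SELECT COUNT(*), MAX(offset_seconds) FROM source_data WHERE observation_id = 1"
).fetchone()
```

Here every session adds tens of thousands of rows, which is where the efficiency concern comes from, and a single numeric value column only fits one kind of measurement.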

Does the OHDSI team, or the OHDSI community, have a clever idea for how we might solve this issue consistently?

For the moment we will most likely use some kind of hack on our side, but we are really interested in a long-term solution from the OHDSI side to make sure that we keep a consistent version of OMOP …

DTorok · March 9, 2020, 3:11pm

OHDSI allows adding columns or tables to a local schema for site-specific data capture or analyses. The only prohibition is that you must not override an existing OHDSI attribute with your own specialized meaning or data, which you are not doing. Some ETLs of EHR data may face a similar situation if they capture patient monitoring data. Maybe someone will have more to say. It sounds like you prefer the JSON representation, so I think the choice between adding a row to the Observation table and adding a new table below Observation, while still storing the data in JSON format, will be determined by how your database system handles ‘blob’ data.