CDM for waveform biosignal data

Hi, I am Jong-Hwan (John) at Ajou University in Korea. I am a Ph.D. student in medical informatics. In particular, I study waveform biosignals such as ECG and blood pressure using machine learning methods.

Nowadays, unstructured medical data (genomic data, CT images, biosignals, etc.) plays an important role in research.

Ajou University Hospital has collected waveform biosignal data for 4 years, and other hospitals have also started to collect raw biosignals.

I think demand for converting waveform biosignal data into the CDM will arise, and that it would be valuable to start a working group to discuss a CDM for waveform biosignals. Please leave a comment if you would like to join or have any opinions.

We have described the first version of a biosignal CDM format.
At the URL below there are two files. One is a description of the data format; it contains the table format and the file format for storing biosignal data. The table format is a schema for saving metadata in an RDBMS, and the file format is a suggested format for storing raw waveform biosignal data. The other file is a list of concept IDs that can be used for biosignals; it is not complete and should be filled in gradually. Thank you.

URL : https://drive.google.com/drive/folders/1Nnc-AGVPEWqJr8jXWsY1qwRLIGwmFqyS?usp=sharing

Hi - thanks for drafting this first version. A few suggestions to consider:

  1. Following other CDM table conventions, consider “biosignal_id” for the primary key
  2. Again following CDM conventions - add biosignal_source_value VARCHAR. I often use this field to capture the device model and serial number or unique device GUID so I can trace back to the source device used to generate the biosignal.
  3. If you provide a “directory” field, you might as well provide an optional “filename” field. This should be able to support both a single, completely self-contained biosignal procedure (ex 12-lead ECG XML document) and file sets that represent a “dataset” managed/organized under a single procedure folder. For example a patient monitor procedure that consists of different vitals - respiratory, ECG, O2, activity, etc - individual files within a common directory. Each biosignal (resp, ECG, etc) is an individual record within the CDM referencing its respective “filename”.
  4. In addition to individual sample entries - we tend to operate on measurements derived from a block of signal, or on the locations of significant features - ex R-wave peaks. Perhaps include an annotation table as well.
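
To make suggestions 1-3 concrete, here is a minimal sketch of what such a table could look like, using Python's built-in sqlite3 module. Only biosignal_id, biosignal_source_value, directory and filename come from this thread; person_id, biosignal_concept_id, the column sizes, the file paths and the device strings are assumptions for illustration, not part of the posted draft.

```python
import sqlite3

# Hypothetical DDL. Only biosignal_id, biosignal_source_value, directory and
# filename are named in the thread; everything else is an assumption.
BIOSIGNAL_DDL = """
CREATE TABLE biosignal (
    biosignal_id           INTEGER PRIMARY KEY,
    person_id              INTEGER NOT NULL,
    biosignal_concept_id   INTEGER NOT NULL,
    biosignal_source_value VARCHAR(100),
    directory              VARCHAR(255) NOT NULL,
    filename               VARCHAR(255)
)
"""


def build_demo_db():
    """Create an in-memory database with three made-up biosignal records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(BIOSIGNAL_DDL)
    rows = [
        # Two signals from one patient-monitor procedure share a directory.
        # Concept ID 0 is used here only as a placeholder.
        (1, 100, 0, "monitorX.SN0001", "/data/p100/proc1", "ecg.dat"),
        (2, 100, 0, "monitorX.SN0001", "/data/p100/proc1", "resp.dat"),
        # A self-contained 12-lead ECG document is a single record.
        (3, 101, 0, "ecgcart.SN0002", "/data/p101", "ecg12.xml"),
    ]
    conn.executemany("INSERT INTO biosignal VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def files_in_directory(conn, directory):
    """Return the filenames recorded under one procedure directory."""
    cur = conn.execute(
        "SELECT filename FROM biosignal WHERE directory = ? ORDER BY biosignal_id",
        (directory,),
    )
    return [row[0] for row in cur.fetchall()]
```

With this layout, files_in_directory(build_demo_db(), "/data/p100/proc1") lists every signal file recorded under that procedure folder, while each file remains its own CDM record.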

I’m interested in devices in general and UDI in particular. You say that you use a field to “capture the device model and serial number or unique device GUID”.

I have three questions.

When you have the device data, do you do any analysis of device performance, device issues, etc.?

When you enter the UDI, what do you include? The UDI is composed of the device identifier and often production identifiers. How much of it do you enter?

How often are you able to obtain the UDI? Is it easily accessible or does it require some effort to determine it?

I work with electrocardiograph type devices used in the emergency department and pre-hospital settings (Emergency Medical Services and ambulances) - defibrillators. In these devices I do get indications of device failure in different contexts:

  • Too much artifact to execute the analysis algorithms
  • Lead reversals or missing leads
  • Failure to charge for defibrillation (defibrillators)
  • Other failures during power-on test

If the device is healthy enough to capture the failure and generate a record, I do get these (not all) when capturing the electronic patient care record downstream. Since the data also include the device serial number, make, and the manufacturer’s generated unique ID, it is possible to run quality analysis on both the device and the clinicians (ex lead placement errors or failure to perform an electrocardiogram).

The device ID may be the same thing as the UDI - I never got into that level of detail other than being able to track devices for our clinical trials.

Thanks for your opinions. We have incorporated them and have some questions, as below.

  1. Following other CDM table conventions, consider “biosignal_id” for the primary key
    -> Thank you. We added an explanation that it is a primary key.

  2. Again following CDM conventions - add biosignal_source_value VARCHAR. I often use this field to capture the device model and serial number or unique device GUID so I can trace back to the source device used to generate the biosignal.
    -> I agree that some researchers need device information. But could you let me know why device information is needed to analyze raw biosignal data that has already been converted to millivolts?
    If many researchers require that information, I think it would be better to build a new table for device information, because, as far as I know, a ‘_source_value’ field should contain the value from the original database. If there is a reference or example for the convention of storing device information in ‘_source_value’, could you share the link or file with me? It would help me understand and improve the biosignal CDM.

  3. If you provide a “directory” field, you might as well provide an optional “filename” field. This should be able to support both a single, completely self-contained biosignal procedure (ex 12-lead ECG XML document) and file sets that represent a “dataset” managed/organized under a single procedure folder. For example a patient monitor procedure that consists of different vitals - respiratory, ECG, O2, activity, etc - individual files within a common directory. Each biosignal (resp, ECG, etc) is an individual record within the CDM referencing its respective “filename”.
    -> I fully understand why a ‘filename’ field is required. However, I intended the ‘directory’ column to include the filename as well. Creating a separate ‘filename’ column would help store various types of files, but I think it would be too complicated to use. That is why I defined a few fixed file formats. This is not final, though, and we can find a middle ground. I would just like to share the philosophy of reducing complexity and increasing research convenience.

  4. In addition to individual sample entries - we tend to operate on measurements derived from a block of signal, or on the locations of significant features - ex R-wave peaks. Perhaps include an annotation table as well.
    -> I’m with you. I will design an annotation table to reflect your opinion. Roughly, the annotation table could contain columns such as biosignal_id, event_time and annotation_id (the kind of annotation). It would be great if you could give me some advice.

Thanks again. I hope we can build a useful CDM model.

Thanks for moving this topic along.

Re #2 - the base convention is to use this field to indicate where the data came from, to provide a trace-back mechanism. I found it useful in my implementation to include within the _source_value more details for finding and fixing ETL problems. An example is capturing invoiced items for a medication - the drug_source_value would include the EHR table <invoiced_items> and transaction ID: drug_source_value = invoice_item.1234567abcde. If there were any questions about the ETL mapping, I can go directly back to the source database table and look up the row using the transaction ID.
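
As a rough illustration of that trace-back, the sketch below splits a source value of the form table.transaction_id and builds a parameterized lookup against the source table. The "table.key" layout follows the drug_source_value example above; the function names and the transaction_id column name are hypothetical and not part of any OMOP tooling.

```python
from typing import NamedTuple, Tuple


class SourceRef(NamedTuple):
    table: str  # source table the row came from
    key: str    # transaction ID (or other row key) within that table


def parse_source_value(source_value: str) -> SourceRef:
    """Split 'table.key' on the first dot; reject anything else."""
    table, sep, key = source_value.partition(".")
    if not sep or not table or not key:
        raise ValueError(f"not in 'table.key' form: {source_value!r}")
    return SourceRef(table, key)


def build_lookup(ref: SourceRef, key_column: str = "transaction_id") -> Tuple[str, tuple]:
    """Return (sql, params) to fetch the source row.

    key_column is an assumed name. Table and column names cannot be bound
    as parameters, so they are checked to be plain identifiers first.
    """
    if not (ref.table.isidentifier() and key_column.isidentifier()):
        raise ValueError("table and column names must be plain identifiers")
    sql = f"SELECT * FROM {ref.table} WHERE {key_column} = ?"
    return sql, (ref.key,)
```

For the example above, parse_source_value("invoice_item.1234567abcde") yields the table invoice_item and the key 1234567abcde, which build_lookup turns into a query that an ETL developer could run against the source database.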

Re #4 - for ECG data, the most common annotations I use are based on George Moody’s annotations for beat classification. These have been around since the 80s. Otherwise diagnostic annotations largely use SNOMED where possible - ex Left Bundle Branch Block, acute onset. LOINC for heart rate, BP, etc. Which means the annotation table should support the primary concept_id and one or more qualifiers/modifiers.
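
A minimal sketch of an annotation record along these lines, combining the columns proposed earlier in the thread (biosignal_id, event_time, a kind-of-annotation field) with a primary concept plus qualifiers. The field names annotation_concept_id and qualifier_concept_ids, the choice of seconds from the start of the recording for event_time, and the helper function are all assumptions; any concept IDs passed in would need to be looked up in the real vocabularies.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    annotation_id: int
    biosignal_id: int            # which waveform record is annotated
    event_time: float            # assumed: seconds from start of recording
    annotation_concept_id: int   # primary concept (e.g. a beat class or finding)
    qualifier_concept_ids: List[int] = field(default_factory=list)


def annotations_between(annotations, biosignal_id, start, end):
    """Annotations for one biosignal with start <= event_time < end, in time order."""
    hits = [
        a for a in annotations
        if a.biosignal_id == biosignal_id and start <= a.event_time < end
    ]
    return sorted(hits, key=lambda a: a.event_time)
```

In a relational schema the qualifier list would more likely be a child table keyed on annotation_id; the list here just keeps the sketch short.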

Thanks for your feedback.
I fully understand what you are saying and I will revise the biosignal CDM according to your advice.
One thing I still wonder is why the device ID is required. I guess it is for cases where researchers want more detailed information, such as resolution or null representation?
Whenever you have an idea, please share it on this page. Thank you.

Yes - we had to go back to the source to re-run the ETL due to users changing the data resolution and even the sampling rate. We run multi-year clinical trials and sometimes devices are replaced and not set up the same way when data is exported. So having a way back to the source saves a lot of time and effort.

One other thing that helps me think through these sorts of efforts is to gather some real use cases that might benefit from the data being in the CDM. It is a very efficient way of sorting out what is really necessary while also addressing short-term needs.

Do you have the resultant table specs after the suggestions made throughout the thread? We likewise are working on implementing biosignals for a targeted project. It would be fantastic if we could implement a harmonized data structure, even if it is in its infancy. Many thanks for any guidance.

We use the XML schema at http://medicalequipoise.org/xml-dtd/ccsiecg.dtd to convert native devices to a common staging format. The XML schema was originally designed for 12-lead electrocardiograph data, but can support any continuous signals (I think). Since our research warehouse is a standard OMOP CDM, the metadata and clinical findings are mapped and loaded by our ETL into OMOP, and the source_value provides a reference back to the XML document if you need to get the waveform data to render or conduct further analysis. So it is a workaround to what Jong-Hwan has proposed - creating a new OMOP table.

If we do create new bio-signal tables in the future, a new ETL can then import the XML data into them. In the meantime the bio-signal can still participate in a limited way in standard OHDSI tools and projects.

Can you by chance send the schema directly? The link does not work. http://medicalequipoise.org/ROOT/xml-dtd/ccsiecg.dtd

Sorry - remove the /ROOT from the URL: http://medicalequipoise.com/xml-dtd/ccsiecg.dtd

It should download the schema to your download folder.