Machine Learning using OMOP Simulated Data (OSIM2)

R_Madariaga · November 13, 2017, 3:44pm

I am trying to use the OMOP Simulated Data (OSIM2, folder OSIM2_1M_MSLR_SNOMED_0_CSV), downloaded from the ftp.ohdsi.org server, for Machine Learning experiments aimed at developing new ML algorithms and methodologies.

I can use the drug_era csv file to relate drugs (column drug_concept_id) to persons (column person_id).

I may also use the condition_era csv file to relate conditions to persons in a similar way.
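The pairing described above can be sketched as follows. This is a minimal Python sketch, not necessarily how the files should be loaded: it assumes the CSV has a header row naming the columns (the real OSIM2 files may not), and the inline sample rows and the helper name `concepts_by_person` are made up for illustration. Only `person_id`, `drug_concept_id` and `condition_concept_id` are column names taken from this post.

```python
import csv
import io
from collections import defaultdict

# Tiny inline stand-in for the drug_era CSV; the values are invented.
# Assumption: the file has a header row with these column names.
SAMPLE_DRUG_ERA = """person_id,drug_concept_id
1,1000
1,2000
2,1000
"""


def concepts_by_person(csv_file, concept_column):
    """Map each person_id to the set of concept ids found in an era file."""
    pairs = defaultdict(set)
    for row in csv.DictReader(csv_file):
        pairs[row["person_id"]].add(row[concept_column])
    return dict(pairs)


# For the real data, pass an open file handle instead of the StringIO object,
# and use "condition_concept_id" for the condition_era file.
drugs = concepts_by_person(io.StringIO(SAMPLE_DRUG_ERA), "drug_concept_id")
print(drugs)
```

The same helper would cover the condition_era file by passing `"condition_concept_id"` as the column name.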

However, my new ML methodology is a supervised learning algorithm, so for training I need to state several TRUE relationships between drugs or conditions and persons. For this purpose, I don't really understand the role of the signal_ref file. If I group together the drugs (drug_concept_id) or conditions (condition_concept_id) in that file that share the same signal_id, can the resulting relationships be considered TRUE, while the relationships obtained from the drug_era and condition_era files are treated as ‘common’ (not all necessarily TRUE) relationships?

In other words, I would like to know whether the signal_ref file can be considered a source of TRUE relationships, as if they were SIGNAL vs. NOISE in the underlying files of the database. I would also like to know whether any column in the signal_ref file distinguishes different roles among the several dozen signals, in particular whether any of them are ‘non-TRUE’ signals.
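To make the grouping by signal_id concrete, here is a minimal Python sketch. It rests on assumptions that are not confirmed here: that signal_ref has a header row, and that each row carries a `signal_id` together with a `drug_concept_id` and a `condition_concept_id`. The sample rows and the helper name `group_by_signal` are invented for illustration, and the sketch says nothing about whether the resulting groups are actually TRUE relationships.

```python
import csv
import io
from collections import defaultdict

# Invented stand-in for signal_ref; the real column layout is an assumption.
SAMPLE_SIGNAL_REF = """signal_id,drug_concept_id,condition_concept_id
10,1000,5000
10,2000,5000
11,1000,6000
"""


def group_by_signal(csv_file):
    """Collect (drug_concept_id, condition_concept_id) pairs per signal_id."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        pair = (row["drug_concept_id"], row["condition_concept_id"])
        groups[row["signal_id"]].append(pair)
    return dict(groups)


signals = group_by_signal(io.StringIO(SAMPLE_SIGNAL_REF))
for signal_id, pairs in signals.items():
    print(signal_id, pairs)
```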

Thank you very much