
Has anyone used the OSIM data sets?

I was trying to download the data set with FileZilla, but I keep getting connection timed out errors. I used the credentials provided on the OSIM 2 page:

http://omop.org/OSIM2

Server: 54.205.167.229
Login: anonymous
Password: blank
Our FTP server supports SFTP protocol (port 22)

Has anyone tried downloading this data set? How useful is it for identifying lab results before and after the index diagnosis? Is any prescription data available for finding treatment switches?

Any help would be appreciated!

Thanks
Jinx

@Ajinkya_Patale:

Try here: ftp://ftp.ohdsi.org/osim2/.

Let us know if you need other OSIM datasets.

@Christian_Reich Thanks very much! When I connected to the server I could see 4 folders:

  1. for_erica
  2. osim2
  3. synpuf
  4. tutorials

I assume the osim2 folder holds the patient data; I could see 1M and 50M data sets there, which can be downloaded for analysis. I could also see some files (OMOP standard) in the synpuf folder under 1.0.0 and 1.0.1. What are they? There are also entries named care_sites, condition_occurrence, etc., and they are all CSV. Could you please guide me to where I can get a sample patient data set?

Thanks again
Jinx

@Christian_Reich I was also wondering whether we can get a complete data set on which we could do the following analyses:

  1. Build Patient Cohort
  2. Understand characteristics of the cohort like
    a. Incidence rate of the conditions
    b. Treatment discontinuation pattern
    c. Lab results before and after the Index diagnosis to understand the prevalence of the lab values
    d. Comorbidities of the cohort along with the Index diagnosis
    e. Understand the Health outcome of Interest
  3. Track prescription pathways, like treatment switches, and maybe the cost associated with each prescription
  4. Safety and signal detection
  5. Maybe some details about rare diseases and their treatment
  6. Some commercial data: prescription costs, fills, refills, claims details, and provider details for understanding each provider's prescription patterns

This sure sounds like a lot, but I am trying to build a story and need a data set for it.

Please let me know what the possibilities are of getting a data set of maybe 50M patients that includes the information above.

Can OSIM simulate this kind of data? Where can I download OSIM? I want to explore the data simulation myself.

Thanks for your help… !!
Jinx

Yes, that’s what you are looking for.

They are another synthetic database in OMOP you can use. Use 1.0.1. You can use both as test patient data.

That’s exactly what they are used for.

They don’t have lab, cost or provider. But you could tweak the OSIM2 builder and use it to simulate those aspects of the data as well. And publish it back into the OHDSI Github site. :smile:

@Christian_Reich Thanks for the info. I tried downloading the OSIM2 Simulator, but the page gave me a “Page not found” error. Is there any other source where I can get OSIM2?

Thanks
Jinx

I’ve uploaded it.

I don’t know if either of the suggestions below is a viable alternative to updating OSIM2, but I am pursuing both:

  • At last week’s AMIA, I met Jason Walonoski from MITRE, one of the core developers of the Synthea patient-level simulator (https://github.com/synthetichealth/synthea). This is a Markov discrete event simulation engine that uses templates for demographics and disease state machines to simulate birth-to-death patient histories. At this time, about a dozen diseases have been modeled using demographics for the State of Massachusetts. The current engine outputs CCDA, CSV, FHIR and HTML. I hope to find a summer student to create OMOP CDM V5.1 compliant CSV files. I’m trying to convince folks in Epidemiology to have students create new disease models as a class assignment to expand the span of simulations.

  • MIT published the following machine learning-based approach to creating a simulated version of a real database. The simulated database retains the same distributions as the original. See http://news.mit.edu/2017/artificial-data-give-same-results-as-real-data-0303 for a summary and http://dai.lids.mit.edu/SDV.pdf for methodological details. I have a biostats post-doc looking at implementing this approach against our EHR database.

Hey Christian,

Thanks for the link. I need OSIM2 data with signals, but the data you shared at the link (the 1 million data set) only has signals (risk, benefit) for 60 different drug-condition pairs. Why 60?

Do you have any other OSIM2 data with many more signals (outcomes), say more than 100 or 1000? I will use the data in research.
Regards,

Oytun

@Oytun_Gunes:

No. The OSIM2 data are not actively supported. In other words, nobody is running the simulator and creating data. But if you want to do that, feel free to tweak the constructor yourself and share the resulting simulations.

Hello Christian,

I am planning to use the 1 million SNOMED data set you have shared. Where can I find the mapping of condition_id, drug_ids and so on?

Regards,

Oytun

@Oytun_Gunes:

Mapping to what? We usually call “mapping” the process of translating a coding scheme used in a source database to Standard Concepts during the ETL to the OMOP CDM. The simulated data have no source; they are created as is from scratch. All Concepts used in condition_concept_id and drug_concept_id are Standard Concepts. You can find them, like any other reference information, in the CONCEPT table. Makes sense?
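
For illustration, here is a minimal sketch of that lookup in Python. It assumes the tables are available as CSV with the standard CDM column names (concept_id, concept_name, condition_concept_id). The concept IDs and names below are made-up placeholders, not real vocabulary entries, and the inline strings stand in for files you would read from disk.

```python
import csv
import io

# Hypothetical excerpts; real files would be opened from disk instead.
CONCEPT_CSV = """concept_id,concept_name,domain_id
1001,Example condition A,Condition
1002,Example condition B,Condition
2001,Example drug X,Drug
"""

CONDITION_OCCURRENCE_CSV = """person_id,condition_concept_id,condition_start_date
1,1001,2009-01-15
1,1002,2009-03-02
2,1001,2010-07-21
"""


def load_concepts(text):
    """Return a dict mapping concept_id (int) to concept_name."""
    reader = csv.DictReader(io.StringIO(text))
    return {int(row["concept_id"]): row["concept_name"] for row in reader}


def label_conditions(occurrence_text, concepts):
    """Attach the concept name to each condition_occurrence row."""
    labelled = []
    for row in csv.DictReader(io.StringIO(occurrence_text)):
        concept_id = int(row["condition_concept_id"])
        # Fall back to a marker if an ID is missing from CONCEPT.
        name = concepts.get(concept_id, "<unknown concept>")
        labelled.append((int(row["person_id"]), concept_id, name))
    return labelled


concepts = load_concepts(CONCEPT_CSV)
labelled_conditions = label_conditions(CONDITION_OCCURRENCE_CSV, concepts)
for person_id, concept_id, name in labelled_conditions:
    print(person_id, concept_id, name)
```

The same join works for drug_concept_id against the same CONCEPT table, since both columns hold Standard Concept IDs.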

@Christian_Reich Hi. It seems that the FTP servers are down at the moment. Will they be back up at any time? Alternatively, is there anywhere else to find these simulated data sets?

@francois_meyer:

Working on it.

@Christian_Reich:

Hello Christian,

I downloaded the OSIM 2 data from the FTP server you kindly provided. My aim is to use the dataset in a multi-armed bandit setting, for which I could really use the true relationships between the conditions and drugs. As far as I can tell, these relationships were provided in the OMOP Cup to the competitors along with the data.

Is there any way for me to access this information? It would be great if I could somehow download the same package you provided for the competitors in the OMOP Cup. Thanks in advance!

@dogatekin:

Yeah. It exists. If I only knew where. There is a probability matrix for each condition and each drug, based on the true relationships found in the data. I can ask around. Which OSIM file are you using?

But if you really want to go that deep into this, it may be better to build one from scratch; the code is available. As explained in this Forum debate, nobody is really working on OSIM2 anymore, but there are other initiatives.

Thanks for your very quick response.

Right now, I am playing around with OSIM2_1M_MSLR_SNOMED_0, but I also downloaded the 50M files with and without the risk signals infused. I did find a CSV file called “signal_ref” that contains the relative risks connecting drugs and outcomes. Is that the file you were talking about, by any chance? Or is there a complete probability matrix elsewhere?

I was hoping I would not have to build one from scratch, but I will of course look into it if I have to. If I do end up building one, would I need a complete observational database to base the simulated persons on?

@dogatekin:

Then the files should be OSIM2_1M_MSLR_SNOMED_0 and OSIM2_1M_MSLR_SNOMED_1.

The file signal_ref contains the signals infused after simulation. Those signals are drugs to conditions (side effects), not the other way around.

The probabilities between drugs and drugs, and between drugs and conditions, are in the Transition Matrix. The file name should be “OSIM2 MSLR SNOMED Transition Matrix”, where “MSLR” is the name of the database.
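
To make the idea of a transition matrix concrete, here is a toy sketch in Python. It is an assumption-laden illustration, not the layout of the actual OSIM2 file: the nested dict, the condition and drug names, and the probabilities are all hypothetical. Each row is a probability distribution over what comes next, and the simulator draws from it.

```python
import random

# Hypothetical transition probabilities: P(next drug | condition).
# None stands for "no drug". The real OSIM2 file has its own layout.
TRANSITIONS = {
    "condition_A": {"drug_X": 0.7, "drug_Y": 0.2, None: 0.1},
    "condition_B": {"drug_Y": 0.5, None: 0.5},
}


def check_rows_sum_to_one(matrix, tol=1e-9):
    """Each row must be a probability distribution."""
    return all(abs(sum(row.values()) - 1.0) <= tol for row in matrix.values())


def sample_drug(condition, rng):
    """Draw one drug (or None) for the given condition."""
    row = TRANSITIONS[condition]
    outcomes = list(row.keys())
    weights = list(row.values())
    return rng.choices(outcomes, weights=weights, k=1)[0]


rng = random.Random(42)  # fixed seed so the run is repeatable
draws = [sample_drug("condition_A", rng) for _ in range(1000)]
share_x = draws.count("drug_X") / len(draws)
print(round(share_x, 2))
```

With enough draws, the share of drug_X should settle near its row probability of 0.7.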

@Christian_Reich:

I actually only have access to the one without the risks infused (0), since that is the only one on the FTP server: ftp://ftp.ohdsi.org/osim2/. Is there more data I can download somewhere else?

That is actually what I am looking for, sorry if I confused you with my previous message :smile: It was my understanding that the true relationships provided in the OMOP Cup were about the relationships from drugs to conditions.

Which brings us to my final question, where are these transition matrices? I finally decided to build a dataset from scratch using the OSIM 2 code, but I either need a real observational database or (more conveniently) the transition matrices. I unfortunately cannot find any in the ftp server, am I missing something?

Sorry to keep nagging you about an “unsupported” data simulation tool, but I could really use a little help!

@dogatekin:

Yeah, you are a little out of luck. Maybe @richm or @mkhayter still have old versions. The data files used to be stored on AWS and went where all living things eventually go - when that server was switched off due to lack of funding. What’s on the FTP site are a few remnants, and the code to make yourself a new one. If you want to. The question is, why would you want that.

The way it worked was this: The simulator would simulate conditions as a function of existing conditions (or lack thereof). And it would simulate drugs based on conditions. That is a world where no drug causes any condition, which means drugs have no side effects. These are the _0 files. The _1 files resulted from adding those side effects based on a separate risk factor provided. The OMOP Cup would then have various competing methods try to figure out what that risk was. Nice idea, except it forgets that there are all these relationships between everything called confounding. So the best method in the cup wasn’t necessarily the best method for finding the risks in the real world.
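
The difference between the _0 and _1 files can be illustrated with a toy sketch in Python. This is not the OSIM2 code. It assumes a single drug and a single outcome, with exposure and outcome drawn independently in the "_0" world, and a hypothetical injection rule that raises the outcome risk of exposed persons by a given relative risk. All function names and parameter values are invented for the example.

```python
import random


def simulate_world_0(n_persons, p_drug, p_outcome, rng):
    """Toy '_0' world: exposure and outcome are drawn independently,
    so the drug has no effect on the outcome."""
    persons = []
    for person_id in range(n_persons):
        persons.append({
            "person_id": person_id,
            "exposed": rng.random() < p_drug,
            "outcome": rng.random() < p_outcome,
        })
    return persons


def inject_signal(persons, relative_risk, p_outcome, rng):
    """Toy '_1' world: add outcomes to exposed persons so that their risk
    becomes roughly relative_risk * p_outcome. Unexposed are untouched."""
    # Chance q of adding an outcome to an exposed person who has none:
    # p + (1 - p) * q = RR * p  =>  q = (RR - 1) * p / (1 - p)
    q = (relative_risk - 1.0) * p_outcome / (1.0 - p_outcome)
    injected = []
    for person in persons:
        updated = dict(person)
        if person["exposed"] and not person["outcome"] and rng.random() < q:
            updated["outcome"] = True
        injected.append(updated)
    return injected


def risk(persons, exposed):
    """Share of persons with the outcome in the exposed or unexposed group."""
    group = [p for p in persons if p["exposed"] == exposed]
    return sum(p["outcome"] for p in group) / len(group)


rng = random.Random(0)  # fixed seed so the run is repeatable
world_0 = simulate_world_0(20000, p_drug=0.3, p_outcome=0.05, rng=rng)
world_1 = inject_signal(world_0, relative_risk=2.0, p_outcome=0.05, rng=rng)

rr_0 = risk(world_0, True) / risk(world_0, False)
rr_1 = risk(world_1, True) / risk(world_1, False)
print(round(rr_0, 2), round(rr_1, 2))
```

In this toy world a naive risk ratio recovers the injected signal because nothing else links exposure and outcome. That is exactly the simplification noted above: real data carry confounding, so a method that does well here is not guaranteed to do well there.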

Makes sense? What is your use case? What are you trying to do?

If you really want to recreate OSIM2, you will need to have somebody run the script that builds the transition matrix. So, you have to go to a data owner or subscriber. Happy to help you out if you can tell me more about what it is you want to do.
