
Is there an OMOP CDM sandbox?

I was just wondering if there is an OMOP CDM sandbox environment where people could test out queries against the CDM to get familiar with it. I more or less built my own using ETL-Synthea with SQL Server and the vocabulary CSVs downloaded from Athena. But I was wondering if there is a quicker way that doesn’t require installing a DB server, generating Synthea patients, or downloading vocabularies.

I’ve also thought about creating one myself since I think it would have been useful, but I’m not sure if there are any restrictions regarding, for example, the vocabularies. If you only use the ones that don’t require a license, is there any reason why there couldn’t be a public sandbox where anyone can initiate queries? Are there any sort of terms of use or something that this would violate?


There’s a public Atlas instance, but not a public database sandbox. I think the security risks might preclude us from having that.

But if you can use Docker, then Broadsea gives you everything to get up and running locally in 15 minutes. We are preparing a new release of Broadsea here:

Oh, neat. I’ve not heard of Broadsea. I will have to check that out.

Yes, I’m familiar with Atlas, but I was talking about a database sandbox where you could, for example, test out queries from https://data.ohdsi.org/QueryLibrary/ or just try out your own.

Yeah, I was also wondering about security. My idea was to use a read-only SQLite database with SQL.js, hosted on GitLab large file storage. But it would be a waste of time if it violated some terms of service and had to be taken down.
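To illustrate the read-only idea, here is a minimal sketch in Python rather than SQL.js. It assumes a local SQLite file and uses a hypothetical two-column `person` table as a stand-in for a real CDM; the file name and helper names are made up for this example. It shows that a connection opened with `mode=ro` can run SELECT queries but rejects writes.

```python
import os
import pathlib
import sqlite3
import tempfile


def build_demo_cdm(path):
    """Create a tiny stand-in database (hypothetical two-column person table)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER)"
    )
    conn.executemany(
        "INSERT INTO person VALUES (?, ?)", [(1, 1960), (2, 1985), (3, 1999)]
    )
    conn.commit()
    conn.close()


def open_read_only(path):
    """Open an existing SQLite file through a URI with mode=ro (no writes)."""
    uri = pathlib.Path(path).as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def try_write(conn):
    """Return True if a write succeeded, False if SQLite rejected it."""
    try:
        conn.execute("DELETE FROM person")
        return True
    except sqlite3.OperationalError:
        return False


db_path = os.path.join(tempfile.mkdtemp(), "demo_cdm.sqlite")
build_demo_cdm(db_path)

sandbox = open_read_only(db_path)
person_count = sandbox.execute("SELECT COUNT(*) FROM person").fetchone()[0]
write_allowed = try_write(sandbox)
print(person_count, write_allowed)
sandbox.close()
```

This only covers the database side; whether a public copy of the vocabularies is permitted is a licensing question the code cannot answer.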

Eunomia might give you what you’re looking for @mccullen_j. It’s an R package that sets up a small OMOP CDM (including a subset of the vocabularies) in SQLite.

Oh, I just came across that. I just have not got it to install correctly yet. Thanks.

It says it can not find “getEunomicaConnectionDetails” even after I install it.

Maybe a typo? It should be getEunomiaConnectionDetails() :)

I actually found out that the package did not install correctly because it “is not available for this version of R”. I spelled it right in my code, but the installation was not successful. I think I should be able to figure it out though. Maybe try using a different version of R.

Ahh, I may know the issue. The Eunomia docs haven’t been updated to reflect this, but the package is currently not on CRAN and so needs to be installed from GitHub: Installing Eunomia problem - #4 by schuemie

Oh, got it. Thanks!

How were you able to get ETL-Synthea to work?

Just make sure you use Synthea 3.0 or 2.7 (I used 3.0) to generate patients, and set the option to export to CSV so you get CSV files. After that, I just used the code in the README of the ETL-Synthea repository, updating the connection details for my SQL Server instance.

Are there specific issues you are running into?

Here is my output:

cdmSchema <- "synthea"
cdmVersion <- "5.4"
syntheaVersion <- "3.0.0"
syntheaSchema <- "native"
syntheaFileLoc <- "/home/acumenus/GitHub/synthea/output/csv"
vocabFileLoc <- "/home/acumenus/GitHub/synthea/vocabulary_v5_latest"

ETLSyntheaBuilder::CreateCDMTables(connectionDetails = cd, cdmSchema = cdmSchema, cdmVersion = cdmVersion)
Connecting using PostgreSQL driver
|============================================================================================================================================| 100%
Executing SQL took 0.13 secs

ETLSyntheaBuilder::CreateSyntheaTables(connectionDetails = cd, syntheaSchema = syntheaSchema, syntheaVersion = syntheaVersion)
Running synthea_version/v300/create_synthea_tables.sql
Connecting using PostgreSQL driver
|============================================================================================================================================| 100%
Executing SQL took 0.0439 secs

ETLSyntheaBuilder::LoadSyntheaTables(connectionDetails = cd, syntheaSchema = syntheaSchema, syntheaFileLoc = syntheaFileLoc)
Connecting using PostgreSQL driver
Loading: allergies.csv
| | 0%
Error in rJava::.jcall(batchedInsert, "Z", "executeBatch") :
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO native.allergies ("1993-03-08",V2,"4b9c1991-8733-d3f6-777d-6310b5dd7af2","c689f326-df74-aaa5-2aa0-a4bdb17a963a","419199007",Unknown,"Allergy to substance (finding)",allergy,environment,V10,V11,V12,V13,V14,V15) VALUES(8467,NULL,'4b9c1991-8733-d3f6-777d-6310b5dd7af2','c689f326-df74-aaa5-2aa0-a4bdb17a963a',29046,'Unknown','Lisinopril','intolerance','medication',NULL,NULL,NULL,NULL,NULL,NULL) was aborted: ERROR: column "1993-03-08" of relation "allergies" does not exist
Position: 31 Call getNextException to see other errors in the batch.

So I am not sure where the problem is.
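The error above names a data value ("1993-03-08") as a column, which suggests the loader treated the first row of the file as column names that the target table does not have. One quick way to check a file before loading is to compare its first row with the columns the table expects. The sketch below is only an illustration: `compare_header` is a made-up helper, and the expected column list is a hypothetical subset, not the actual ETL-Synthea table definition.

```python
import csv
import io


def compare_header(csv_file, expected_columns):
    """Compare the first row of a CSV with the column names a table expects.

    Returns (missing, unexpected), both as lists of lowercase names.
    """
    first_row = next(csv.reader(csv_file), [])
    found = [name.strip().lower() for name in first_row]
    expected = [name.lower() for name in expected_columns]
    missing = [name for name in expected if name not in found]
    unexpected = [name for name in found if name not in expected]
    return missing, unexpected


# Hypothetical expected columns, for illustration only; take the real list
# from the table definition that your ETL version creates.
expected = ["start", "stop", "patient", "encounter", "code"]

# A file whose first row is a proper header.
good = io.StringIO("START,STOP,PATIENT,ENCOUNTER,CODE\n1993-03-08,,p1,e1,419199007\n")
# A file whose first row is already data.
bad = io.StringIO("1993-03-08,,p1,e1,419199007\n")

good_result = compare_header(good, expected)
bad_result = compare_header(bad, expected)
print(good_result)
print(bad_result)
```

If `missing` is not empty, the file and the table disagree, and loading it will most likely fail in a similar way.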

Maybe the wrong synthea version? Are you using v3.0? Release v3.0.0 · synthetichealth/synthea · GitHub

I used the synthea-with-dependencies.jar and ran:

java -jar synthea-with-dependencies_30.jar -p 1000 -c ./config.properties

My config.properties looks like this:

exporter.csv.export=true
exporter.fhir.export=false
exporter.hospital.fhir.export=false

Thank you for this clarification! I didn’t realize that, and I went ahead and generated my data with the latest version of Synthea. This is likely why ETL-Synthea has been failing for me.

I will start over and report back with my progress.

:+1:

You’re welcome. I did the same thing the first time. I think the Synthea schema has changed, so you need to use Synthea 2.7 or 3.0.

I’m currently loading my synthea dataset thanks to @mccullen_j

One thing I noticed is that Synthea 3.0 generates much less rich data than the current version. I am hoping the developers/maintainers of ETL-Synthea can update the project/repo to catch up with the current version of Synthea.

Also, for larger Synthea populations, you need to either chunk the large .csv files or increase your machine’s swap size to something ridiculous like 64 GB+.
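As a rough illustration of the chunking option, the sketch below splits one CSV into smaller files and repeats the header row in each, so every chunk can be loaded on its own. It is not the script attached to this post; `split_csv`, the demo file name, and its columns are all made up for the example.

```python
import csv
import os
import tempfile


def split_csv(source_path, out_dir, rows_per_chunk):
    """Split a CSV into numbered chunk files, repeating the header in each.

    Returns the list of chunk paths in order.
    """
    chunk_paths = []
    base = os.path.splitext(os.path.basename(source_path))[0]
    with open(source_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:  # empty file, nothing to split
            return chunk_paths
        out = None
        writer = None
        for index, row in enumerate(reader):
            if index % rows_per_chunk == 0:
                # Start a new chunk file and write the header first.
                if out is not None:
                    out.close()
                chunk_path = os.path.join(
                    out_dir, f"{base}_{len(chunk_paths):04d}.csv"
                )
                out = open(chunk_path, "w", newline="")
                writer = csv.writer(out)
                writer.writerow(header)
                chunk_paths.append(chunk_path)
            writer.writerow(row)
        if out is not None:
            out.close()
    return chunk_paths


# Demo with a small made-up file: 5 data rows, 2 rows per chunk.
work_dir = tempfile.mkdtemp()
source = os.path.join(work_dir, "observations.csv")
with open(source, "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["DATE", "PATIENT", "CODE"])
    for i in range(5):
        w.writerow([f"2020-01-0{i + 1}", f"p{i}", f"c{i}"])

chunks = split_csv(source, work_dir, rows_per_chunk=2)
print([os.path.basename(p) for p in chunks])
```

Because the file is read row by row, memory use stays roughly constant regardless of the input size; the chunk size is a tuning choice for whatever loader consumes the pieces.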

Here is my current load script:
ETL-Synthea-dbLoad-R-Script-w-datatable.txt (2.1 KB)


It is taking 10+ hours to load 233K patients. YIKES!

Oh, nice. Glad to hear you got it working.

Yeah, I only did 1000 and my bottleneck was actually the concept tables. I would say updating ETL-Synthea merits its own topic. It sounds like a great idea though. You could fork it, make some updates yourself, and then initiate a pull request too.
