I was just wondering if there is an OMOP CDM sandbox environment where people could test out queries against the CDM to get familiar with it. I sort of created my own by running ETL-Synthea against SQL Server and downloading the vocabulary CSVs from Athena. But I was wondering if there was a quicker way where you don’t need to install a DB server, generate Synthea patients, or download vocabularies.
I’ve also thought about creating one myself since I think it would have been useful, but I’m not sure if there are any restrictions regarding, for example, the vocabularies. If you only use the ones that don’t require a license, is there any reason why there couldn’t be a public sandbox where anyone can initiate queries? Are there any sort of terms of use or something that this would violate?
There’s a public Atlas instance, but not a public database sandbox. I think the security risks might preclude us from having that.
But if you can use Docker, then Broadsea gives you everything to get up and running locally in 15 minutes. We are preparing a new release of Broadsea here:
Oh, neat. I’ve not heard of Broadsea. I will have to check that out.
Yes, I’m familiar with Atlas, but I was talking about a database sandbox where you could, for example, test out queries from https://data.ohdsi.org/QueryLibrary/ or just try out your own.
Yeah, I was also wondering about security. My idea was to use a read-only SQLite database with SQL.js, hosted on GitLab large file storage. But it would be a waste of time if it violated some terms of service and had to be taken down.
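To make the read-only idea concrete, here is a minimal sketch in Python rather than SQL.js, so it only illustrates the principle and not the browser setup. It assumes a local SQLite file; the file name `cdm_demo.sqlite`, the helper names, and the three demo rows are made up for illustration, and the `person` table is a cut-down stand-in for the real CDM table.

```python
import os
import sqlite3
import tempfile


def build_demo_cdm(path):
    """Create a tiny stand-in database with one OMOP-style table (demo data only)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE person (person_id INTEGER PRIMARY KEY, "
        "gender_concept_id INTEGER, year_of_birth INTEGER)"
    )
    # Illustrative rows, not real patients.
    conn.executemany(
        "INSERT INTO person VALUES (?, ?, ?)",
        [(1, 8507, 1970), (2, 8532, 1985), (3, 8532, 1992)],
    )
    conn.commit()
    conn.close()


def open_read_only(path):
    """Open an existing SQLite file via a URI with mode=ro so writes fail."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


db_path = os.path.join(tempfile.mkdtemp(), "cdm_demo.sqlite")
build_demo_cdm(db_path)
sandbox = open_read_only(db_path)

# Reads work as usual.
person_count = sandbox.execute("SELECT COUNT(*) FROM person").fetchone()[0]

# Writes are rejected by SQLite itself, not by application code.
try:
    sandbox.execute("DELETE FROM person")
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True

print(person_count, write_blocked)
```

This only covers the database side. Whether a hosted copy would be acceptable under the vocabulary licenses or a hosting provider's terms is a separate question that the sketch does not answer.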
Eunomia might give you what you’re looking for @mccullen_j. It’s an R package that sets up a small OMOP CDM (including a subset of the vocabularies) in SQLite.
I actually found out that the package did not install correctly because it “is not available for this version of R”. I spelled it right in my code, but the installation was not successful. I think I should be able to figure it out though. Maybe try using a different version of R.
Ahh, I may know the issue. The Eunomia docs haven’t been updated to reflect this, but the package is currently not on CRAN and thus needs to be installed via GitHub; see “Installing Eunomia problem - #4 by schuemie”.
Just make sure you use Synthea 3.0 or 2.7 (I used 3.0) to generate patients, and set the option to export to CSV so you get CSV files. After that, I just used the code in the README of the ETL-Synthea repository, updating the connection details for my SQL Server instance.
ETLSyntheaBuilder::LoadSyntheaTables(connectionDetails = cd, syntheaSchema = syntheaSchema, syntheaFileLoc = syntheaFileLoc)
Connecting using PostgreSQL driver
Loading: allergies.csv
| | 0%
Error in rJava::.jcall(batchedInsert, "Z", "executeBatch") :
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO native.allergies ("1993-03-08",V2,"4b9c1991-8733-d3f6-777d-6310b5dd7af2","c689f326-df74-aaa5-2aa0-a4bdb17a963a","419199007",Unknown,"Allergy to substance (finding)",allergy,environment,V10,V11,V12,V13,V14,V15) VALUES(8467,NULL,'4b9c1991-8733-d3f6-777d-6310b5dd7af2','c689f326-df74-aaa5-2aa0-a4bdb17a963a',29046,'Unknown','Lisinopril','intolerance','medication',NULL,NULL,NULL,NULL,NULL,NULL) was aborted: ERROR: column "1993-03-08" of relation "allergies" does not exist
Position: 31 Call getNextException to see other errors in the batch.
Thank you for this clarification! I didn’t realize that, so I went ahead and generated my data with the latest version of Synthea. This is likely why ETL-Synthea has been failing for me.
I will start over and report back with my progress.
I’m currently loading my Synthea dataset, thanks to @mccullen_j.
One thing I noticed is that Synthea 3.0 generates much less rich data than the current version. I am hoping the developers/maintainers of ETL-Synthea can update the project/repo to catch up with the current version of Synthea.
Also, for larger Synthea populations, you need to either chunk the large .csv files or increase your machine’s swap size to something ridiculous like 64 GB+.
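For the chunking option, a rough sketch of what I mean, using only the Python standard library. The function name `split_csv`, the output naming scheme, and the default chunk size are my own choices for illustration; nothing here comes from ETL-Synthea itself, and how the chunks are then fed to the loader is left open.

```python
import csv
import os


def split_csv(source_path, out_dir, rows_per_chunk=100_000):
    """Split a CSV into smaller files, repeating the header row in each one.

    Returns the list of chunk file paths in the order they were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(source_path))[0]
    chunk_paths = []
    out_file = None
    writer = None
    with open(source_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            return chunk_paths  # empty input: nothing to split
        for i, row in enumerate(reader):
            # Start a new chunk file every rows_per_chunk data rows.
            if i % rows_per_chunk == 0:
                if out_file is not None:
                    out_file.close()
                path = os.path.join(out_dir, f"{base}_{len(chunk_paths):04d}.csv")
                out_file = open(path, "w", newline="")
                writer = csv.writer(out_file)
                writer.writerow(header)
                chunk_paths.append(path)
            writer.writerow(row)
    if out_file is not None:
        out_file.close()
    return chunk_paths
```

Because rows are streamed one at a time, memory use stays roughly constant regardless of the input size, which is the point of chunking instead of raising swap.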
Yeah, I only did 1,000 patients, and my bottleneck was actually the concept tables. I would say updating ETL-Synthea merits its own topic. It sounds like a great idea though. You could fork it, make some updates yourself, and then open a pull request too.