I’ve tweaked the Oracle DDL to create a MySql instance of CDM and I’ve populated it with data for about 1.1 million patients.
With the indexes created by the CDM DDL (5.4) I’m seeing an odd difference between CDM running on MS Sql Server and on my MySql instance. There could be myriad differences between the two systems but I don’t have access to the MS Sql Server instance other than being able to query it through sql (i.e. I have no idea how the server is configured, what hardware it is running on, etc.).
Anyone out there with DBA/IT skills (or access to someone with DBA/IT skills) that can comment on what might be the difference between the two systems?
It does not look like the other system has any additional indexes that would account for these differences.
For example this query is very slow (about a minute) on my MySql instance and very fast (less than 0.001 seconds) on the MS Sql Server instance. The visit_date is not indexed on either system.
I’ve been running a MySQL version of the 5.3 CDM since 2019 - currently on MySQL 8.0.27 with updated JDBC drivers for our veterinary school data - 380K patients and 1M visits. MySQL and MS SQL like any other database you choose - query performance depends on a lot of factors. How much memory you allocate to the database server, CPU/OS, etc. If you have not setup indices for the various tables - since you setup a non-MS SQL instance - that is the first place to start. I’ve setup indices based on the types of queries I’m running in my projects and get pretty good performance.
Hey @greshje.gmail , just a quick note to remind you and others that mySQL is not one of the supported database platforms for the OHDSI tools. Our main OHDSI tool to enable analytic connectivity with the database is an R package called DatabaseConnector (available via CRAN, but here’s the OHDSI Git repo link: Connecting to Various Database Platforms • DatabaseConnector) This lists the supported databases at this time, currently 10 others (Postgresql, Oracle, SqlServer, Sqlite, PDW, Redshift, Impala, Netezza, BigQuery, Spark). The HADES tools that do the analytic work also are limited to support to these databases: HADES. The primary driver of the decision to support a database platform is the RDMBS’s ability to support a minimum set of analytical capabilities within its native SQL dialect. The set of functions that are required are listed in SqlRender package: Using SqlRender • SqlRender. Of course, the OMOP CDM itself is just a data model, so it can be stored in whatever format you want, but I wanted to just give you a heads up in case your goal is to use your CDM instance to generate evidence using any of the OHDSI tools.
Thanks @Patrick_Ryan . Unfortunately the framework I’m working with is (currently) only MySql. Extending it to MS Sql Server is not out of the question. Would it be helpful to the OHDSI community to start down the MySql path or have others been down that road only to find that ‘thar be dragons there’. Also, does MS Sql Server have good support for a free version?
MS SQL Server instances does have indices - they should be included in the MS SQL setup scripts when standing up an OMOP instance. Since you implemented a MySQL, perhaps like I did just wrote a bunch of CREATE TABLE statements in a script using the OHDSI data dictionary documents as the first step. Then added indices based on how I was using the database.
There are technical issues with MySQL as it doesn’t support certain features that ATLAS and other tools require to execute is the reason MySQL isn’t supported by the OMOP community.
Since I write a lot of the reporting and end-user applications in Java instead of going through the standard tools - the cost and easy deployment of MySQL made sense for my projects. Its not a good solution if you do need all the wonderful OHDSI tools.
@mkwong is correct, mySQL does not offer the required set of functions necessary for OHDSI analytics, so that’s why its not supported. It is not for any bias against the RDMBS nor other people’s interest (search the forum and you’ll see various folks talking about it). If at some future time, mySQL provides all necessary functions, then it would be possible for someone in the community to outfit the OHDSI tools to accommodate it, but I don’t expect that to be on anyone’s roadmap any time soon.
Thank you so much @mkwong and @Patrick_Ryan. You have been hugely helpful. I think I’m going to square off what I’m working on in MySql first and then port over to MS Sql Server.