OHDSI Home | Forums | Wiki | Github

SQL Lite and OHDSI tools

Dear Community,

Is there anything that cannot be done using SQL Lite? Is there any tool we are aware that will not run (e.g. ATLAS). I know the amazing work in Eunomia, and I wonder if there are any constraints in terms of SQL dialect that the community is aware of.

Thanks,

@schuemie, @Ajit_Londhe, @anthonysena, @Adam_Black, @Chris_Knoll

1 Like

I also like SQL Lite and plan to release (or aim for) MIMIC3 and MIMIC4 data just like Eunomia does. Eunomia is super nice! I am not aware of any limits.

In some my prior work - I noticed that concept queries take time. If you figure out indexing inside SQL Lite file, please post it to a github somewhere.

1 Like

hi @Vojtech_Huser,

I think this will copy the entire database with the index

1 Like

SQLite is pretty amazing indeed. There are some limitations for you to consider:

  1. SQLite is for single user only
  2. I don’t think it can support very large databases (e.g. > 1 TB)
  3. It doesn’t support DATE and DATETIME types and operations, although SqlRender is doing its best to make it look like it does. (This does require field names end in ‘_date’ and ‘_datetime’)

Furthermore, the current released version of RSQLite (2.2.0) contains an older version of SQLite which does not correctly support a specific windowing function used in all ATLAS-generated cohort definitions. So in order to run those you’ll need the develop version from GitHub.

2 Likes

hi @schuemie,

Thank you for your response. This is indeed great to know. Google failed me in replying this question :slight_smile:

Jose

This is not entirely correct:

  • SQLite is for single user only If you want high write concurrency, then I agree it is not a viable choice[1]. But it can have multiple read-only readers, even across processes[2]. So multiple users can be accessing a single, large Sqlite3 database if read-only. A viable scenario would be to put a large Sqlite3 database into shared cloud filesystem, like Amazon EFS, then multiple users on separate, appropriately sized compute instances (i.e. EC2) could mount the filesystem to do their read-only analysis.
  • I don’t think it can support very large databases (e.g. > 1 TB) Sqlite3 supports databases up to 281PB [3] and there many real world examples of production, multi-PB sized Sqlite3 databases[4].
  • It doesn’t support DATE and DATETIME types and operations, although SqlRender is doing its best to make it look like it does. (This does require field names end in ‘_date’ and ‘_datetime’) Sqlite3 supports DATE and DATETIME operations[5], but not types. This does not seem to be as large an issue as implied here. Many applications use these and can operate successfully with respect to time and datetime operations.

I believe that a Sqlite3 release would be useful, for the use case of read-only analysis and perhaps other use cases. Not having to set up a separate db server, and being able to copy of a database that resides in a single file, reduces the barriers to use OHDSI.

Thanks,
Glen

[1]www.sqlite.org/whentouse.html
[2]www.sqlite.org/lockingv3.html
[3]www.theregister. com/2020/08/19/sqlite_maximum_database_size_increased_281tb/
[4]www.reddit. com/r/sqlite/comments/euedur/sqlite_performance_on_large_databases/
[5]www.sqlite. org/lang_datefunc.html

1 Like

Thanks for the info @gnewton. Have you compared SQLite to DuckDB?

And welcome to the OHDSI forums. Glad to have you here!

@Adam_Black DuckDb is potentially a great solution for analytics purposes. My understanding is that SQLite is designed for transactional operations which are rarely (if ever) done when using a CDM. I looked, briefly, at supporting it in DatabaseConnector. It has full support for postgres syntax (including native DATETIME types). It also might bring some performance advantages for packages that use Andromeda storage like FeatureExtraction.

1 Like
t