DatabaseConnector question: insertTable vs DBI::dbWriteTable (on postgres)

Vojtech_Huser · October 4, 2019, 1:56pm

Martijn or any other expert, @schuemie

on postgress we have a medium sized table (86k rows and 1M rows) and the insertTable method is very very slow.

I studied the code for the method and it seems to use DBI::dbWriteTable (but I could not decipher the methods you use.

For medium sized table, can we use dbWriteTable directly. Can you please describe in few sentences what approach insertTable takes and how to best use it. (for medium sized tables)

Mark · October 4, 2019, 2:42pm

I am not an expert in R but in my previous life I was a data architect and APi designer using Postgres and Java. The one thing I see in the R Code that is a dead giveaway to a performance hit is the default dropTableIfExists = TRUE in the function call. By dropping and building a new table in its place, you are hitting the indices very hard. There is a possibility that there can be performance gains by modifying the settings on the Auto Vacuum and/or the Write Ahead Log.

This post is not meant to be an answer to your question; just to point out the most likely issue with the performance hit.

schuemie · October 4, 2019, 2:57pm

insertTable uses batched queries through JDBC (Java) (see here, which calls the Java code here). In my experience, uploading data this way to Postgres is lightning fast, and a 1M row table should take a few seconds at most.

If you have another DBI driver for Postgres you’re welcome to try that one, but my suspicion is that it’s not the client, but that something on the server side is slowing you down.

(Of course I’m assuming you have the latest version of DatabaseConnector. Batched inserts were introduced in version 2.1.4 in July 2018)

Vojtech_Huser · October 7, 2019, 6:16pm

We have no indexes defined. We first drop the table and then insert data (with no index burden). The version is recent.

I will try the flightinfo data and post a reproducible example. It could be our server but the machine is large server.