I have some experience connecting to PostgreSQL via Python during ETL processes. When we upload a big table or run very long maintenance queries (e.g. CLUSTER on big tables), the connection would time out.
The way I managed to fix this (although I have no idea why it works) is:
import psycopg2

conn = psycopg2.connect(host="someserver.hk",
                        port=12345,
                        dbname="ohdsi",
                        user="admin",
                        password="admin1",
                        options="-c search_path=" + schema,
                        # it seems the lines below are needed to keep the connection alive
                        connect_timeout=10,
                        keepalives=1,
                        keepalives_idle=5,
                        keepalives_interval=2,
                        keepalives_count=5)
I do not fully understand why, but the keepalives settings seem to keep the connection alive, and the long queries are then able to run without timing out.
Perhaps the underlying reason is that there is a firewall between my workstation and the server, so if the connection is idle for too long something forcefully closes it.
Anyway, I do not know exactly why it works, but it works.
Now to the current problem. We have been told to use R to run a few scripts for the researchers. However, when a script runs queries against some big tables, the connection appears to time out. I do not know the reason, since there is no error log whatsoever.
I can only see that the query generally vanishes from the psql monitor after about an hour.
So my question: I have read through the documentation of DatabaseConnector::createConnectionDetails in the R package DatabaseConnector and found that there is an extraSettings option.
How can I pass the equivalent of those libpq-style keepalives_* settings through the DatabaseConnector::createConnectionDetails call in my script?
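For reference, this is roughly what I have been trying (server, port, and credentials are placeholders mirroring the Python snippet above). Since DatabaseConnector connects through JDBC, I assume the libpq keepalives_* names do not apply directly; tcpKeepAlive and socketTimeout are the closest PostgreSQL JDBC driver options I could find, but I am not sure this is the right way to supply them:

    library(DatabaseConnector)

    # Guess: extraSettings is appended to the JDBC connection string, so I am
    # passing JDBC-style options (tcpKeepAlive, socketTimeout) rather than the
    # libpq keepalives_* names used by psycopg2. Values and separator are my guess.
    connectionDetails <- createConnectionDetails(
      dbms = "postgresql",
      server = "someserver.hk/ohdsi",   # host/database, as DatabaseConnector expects
      port = 12345,
      user = "admin",
      password = "admin1",
      extraSettings = "tcpKeepAlive=true&socketTimeout=0"
    )

    conn <- connect(connectionDetails)
    # ... run the researchers' queries here ...
    disconnect(conn)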
I could not find any answers or similar information online.
Thanks