
Broadsea installation using BigQuery

Great news: I have Atlas up and running using the BigQuery drivers mentioned in the previous messages. However, although some data is shown now, I’m still not able to show all reports. The WebAPI logs mention that the results.achilles_analysis table does not exist, which is correct.

Where should this table be created? I did the following things to create the tables:

  1. Run these DDL scripts: CommonDataModel/inst/ddl/5.4/bigquery at main · OHDSI/CommonDataModel · GitHub

  2. Run the scripts retrieved from http://host:port/WebAPI/ddl/results?dialect=bigquery&schema=<results_schema>&vocabSchema=<vocab_schema>&tempSchema=<temp_schema>&initConceptHierarchy=true

  3. Run the scripts generated with Achilles.

library(Achilles)

connectionDetails <- createConnectionDetails(
  dbms = "bigquery", 
  connectionString = "jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=<project>;OAuthType=0;OAuthServiceAcctEmail=<service_account>;OAuthPvtKeyPath=<key_path>;",
  user = "",
  password = "",
  pathToDriver = "<driver_path>"
)

achilles(connectionDetails, 
         cdmDatabaseSchema = "<cdm_schema>", 
         resultsDatabaseSchema = "<results_schema>",
         scratchDatabaseSchema = "<temp_schema>",
         vocabDatabaseSchema = "<vocab_schema>",
         sourceName = "BigQuery",
         outputFolder = "output",
         sqlOnly = TRUE
)

I assumed this last step was where results.achilles_analysis would be created, but this isn’t the case. Is there a parameter I am missing, do I need to use another function, or…? Any help is appreciated.

I would have expected that the achilles_analysis table would have been created during the achilles analysis. @AnthonyMolinaro , would you have any info on this?

I’m digging through search results while searching for achilles_analysis in the OHDSI repo. I found this reference, so I would think that this table would have been created by Achilles.

Also, I found a reference to creating this table in this PR, but since that release, I’m not sure how the code that creates the analysis SQL and the results tables has changed. It looks like the analysis SQL was split into individual files, and the CREATE TABLE statements were split out as well. I just can’t see where they are being invoked.

Update: I think I found it.

The default parameters to achilles() set createTable = TRUE, so I’m not sure why that table was not created.

I think I know why: I ran the achilles() function with the parameter sqlOnly = TRUE. Since there are still some bugs in the generated BigQuery SQL, I had to run the queries manually and fix the bugs as I encountered them.

However, where other statements are appended to the achillesSql object, the query that generates achilles_analysis is not. For this table, a CREATE TABLE statement is only executed when sqlOnly is set to FALSE; the data is then inserted directly from a CSV file, which does not happen in sqlOnly mode.
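Given that, one possible workaround (an untested sketch; the CSV file name and its location inside the Achilles package are my assumptions and may differ between Achilles versions, as may the exact column set of achilles_analysis) would be to load that CSV into the results schema by hand with DatabaseConnector:

```r
# Sketch: create and populate results.achilles_analysis manually from the CSV
# that ships with the Achilles package. The system.file() path below is an
# assumption -- locate the actual CSV in your installed package if it differs.
library(Achilles)
library(DatabaseConnector)

csvPath <- system.file("csv", "achilles", "achilles_analysis_details.csv",
                       package = "Achilles")
analysisDetails <- read.csv(csvPath, stringsAsFactors = FALSE)

# Reuse the BigQuery connectionDetails defined above
connection <- connect(connectionDetails)
insertTable(
  connection = connection,
  databaseSchema = "<results_schema>",
  tableName = "achilles_analysis",
  data = analysisDetails,
  createTable = TRUE,
  dropTableIfExists = FALSE
)
disconnect(connection)
```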

Ok, maybe one option is to run achilles on a sample CDM dataset (or an empty CDM schema) just to get the achilles_analysis table, and then copy it over to your BigQuery instance?

Thanks! For now I solved it by running the R code manually. I then uploaded the resulting CSV file to Google Cloud Storage and inserted it into the achilles_analysis table in BigQuery. Atlas now seems to be working correctly, and I can see all BigQuery data.

Follow-up on this post. At the moment there are two issues I can’t figure out.

  1. When I try to count the occurrence of concept sets within my data source, I get the following error message: Unrecognized name: person_count at [1:59]. This seems to occur in the following query:
select concept_id, record_count, descendant_record_count, person_count, descendant_person_count
from results.achilles_result_concept_count
where concept_id in (?,?,?,?);

However, the achilles_result_concept_count table doesn’t have a person_count column. This table was created using the achilles() function in the Achilles package, but the source code also doesn’t suggest this table should contain the column mentioned: https://github.com/OHDSI/Achilles/blob/main/inst/sql/sql_server/analyses/create_result_concept_table.sql. Am I missing something?

  2. I’m not able to query the BigQuery database from Hades using the DatabaseConnector package. This is what I’m trying:
library(DatabaseConnector)
library(SqlRender)

connectionDetails <- createConnectionDetails(
  dbms = "bigquery", 
  connectionString = "jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=<project>;OAuthType=0;OAuthServiceAcctEmail=<service_account>;OAuthPvtKeyPath=<key_path>;",
  user = "",
  password = "",
  pathToDriver = "<driver_path>"
)
connection <- connect(connectionDetails)

care_site <- querySql(
  connection,
  sql = "SELECT * FROM care_site",
  errorReportFile = "errorReport.txt"
)

I seem to be able to establish a connection, which also shows up under the RStudio database connections. However, when sending a query I get the following error report:

DBMS:
bigquery

Error:
java.sql.SQLException: [Simba][BigQueryJDBCDriver](100032) Error executing query job. Message: EXEC_JOB_EXECUTION_ERR

SQL:
SELECT * FROM care_site

R version:
R version 4.2.1 (2022-06-23)

Platform:
x86_64-pc-linux-gnu

Attached base packages:
- stats
- graphics
- grDevices
- utils
- datasets
- methods
- base

Other attached packages:
- SqlRender (1.13.1)
- bigrquery (1.4.1)
- Achilles (1.7)
- DatabaseConnector (6.1.0)

When I try to query the database with the bigrquery and DBI packages, I do get results, so the connection doesn’t seem to be the issue. Am I doing something wrong, or are there any known issues with the DatabaseConnector and SqlRender packages?

Ok, this could be an oops on my part: this commit made achilles_result_concept_count a required table, and I didn’t know it was an Achilles table in the first place! I thought it was a WebAPI-specific optimization table used to collect record counts from the Achilles results, which are very slow to fetch ‘on the fly’. This is the DDL that builds the table for WebAPI. You can overwrite your achilles_result_concept_count table with this, and I think we should consider either removing that table creation from Achilles, or updating the Achilles process to use this new code.
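A sketch of applying that DDL through DatabaseConnector (the local file name and the @results_schema parameter are assumptions on my part; check the actual DDL linked above for the parameter names it uses):

```r
# Drop the Achilles-created table and rebuild it from the WebAPI DDL.
# "concept_count_ddl.sql" is a placeholder: save the DDL linked above to a
# local file first, and adjust the render() parameter if the DDL differs.
library(DatabaseConnector)
library(SqlRender)

ddl <- readSql("concept_count_ddl.sql")
ddl <- render(ddl, results_schema = "<results_schema>")
ddl <- translate(ddl, targetDialect = "bigquery")

# Reuse the BigQuery connectionDetails defined above
connection <- connect(connectionDetails)
executeSql(connection,
           "DROP TABLE IF EXISTS <results_schema>.achilles_result_concept_count;")
executeSql(connection, ddl)
disconnect(connection)
```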

Thanks! This seems to have worked. FYI, the demo dataset in the Broadsea image also contains this same bug.

Regarding the Hades connection issue, I also found a solution. I was using the newest (1.3.0.1001) version of the JDBC driver, but this one seems to be incompatible with DatabaseConnector/SqlRender. When using version 1.2.25.1029 of the driver, I am able to execute BigQuery queries.

Follow-up on this one. I’m trying to set up a new WebAPI 2.13.0 instance using exactly the same settings as before. However, my container keeps restarting every few seconds and I cannot figure out what goes wrong. The only thing I can find in the Docker logs is:

org.flywaydb.core.internal.dbsupport.FlywaySqlException: Unable to obtain Jdbc connection from DataSource

SQL State : 28P01
Error Code : 0
Message : FATAL: password authentication failed for user "postgres"

However, I did not change anything about the user or password. Does this tell you anything? I probably forgot something, but I don’t have a clue at the moment.

When loading Atlas this causes a 502 error: Failed to load resource: the server responded with a status of 502 (Bad Gateway)

I can say that Flyway ensures that the database is in correct working order, and if the logs report that it was unable to authenticate to the Postgres database, the process will shut down.

Which version of Broadsea are you using?

I’m still using Broadsea 2.0.0, since I got this to work before. I tried 3.0.0 also, but without any success yet. Does the error message tell you anything?

The problem definitely seems to be in WebAPI. When I just take the latest version, add the default settings to WebAPIConfig, and build/run it using the webapi-postgresql profile, it crashes almost instantly. Could it be that something changed in one of its dependencies? It used to work fine.

So this postgres account, you can confirm it works otherwise? E.g. in a JDBC application like pgAdmin or SqlWorkbenchJ (or even R)?

I’m not sure whether I’m able to check this. It is the default postgres user on the default WebAPI Postgres database. I assume WebAPI needs to be running in order for me to check this? Also, I’m not sure whether the problem is in the user or in the JDBC driver.

This is independent of WebAPI running. The Postgres database via the “atlasdb” container should be up and running, is it?

If it is running, next you’d need to verify that the connection details you’ve provided to the webapi container are valid. If it is running on your localhost, try initiating a JDBC connection to it. With R, you can use this:

connectionDetails <- DatabaseConnector::createConnectionDetails(
  server = "127.0.0.1/postgres",
  dbms = "postgresql",
  user = "<your user name>",
  password = "<your password>",
  pathToDriver = "<path to your postgres jdbc driver>"
)
connection <- DatabaseConnector::connect(connectionDetails = connectionDetails)

Next, see that your webapi schema is available.
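A sketch of that last check, once the connection above succeeds (the schema name "webapi" is the usual default, but adjust it if you configured another one):

```r
# Query information_schema to confirm the WebAPI schema exists;
# an empty result means the schema has not been created yet.
DatabaseConnector::querySql(
  connection,
  "SELECT schema_name FROM information_schema.schemata WHERE schema_name = 'webapi';"
)
```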

The atlasdb container is up and running. However, I cannot connect from localhost using the default username and password. For Broadsea 3.0 these should match WEBAPI_DATASOURCE_USERNAME and WEBAPI_DATASOURCE_PASSWORD in the .env file, right? In R I get the following error:

Error in rJava::.jcall(jdbcDriver, "Ljava/sql/Connection;", "connect",  : 
  org.postgresql.util.PSQLException: Connection to 127.0.0.1:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.

Also when I check the WebAPI logs, I now notice the following error:

ERROR main org.springframework.boot.SpringApplication - [] - Application startup failed
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'oidcConfCreator': Injection of autowired dependencies failed; nested exception is java.lang.IllegalArgumentException: Circular placeholder reference 'security.oid.customParams' in property definitions

Edit: I’m able to connect to the database directly from within the atlasdb container, using: psql --host localhost --username postgres --password

It should be the same credentials, yes. So when you say from within the container, you mean using docker exec or the terminal provided by the Docker GUI for that container, right?

From outside the container, from your actual local machine, you cannot get the psql command to run? Do you have any firewall settings enabled that block port 5432?
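One quick way to rule out a blocked port, without any JDBC driver involved, is a bare TCP check from R (a sketch; the host and port are hard-coded for the default Broadsea setup):

```r
# Try to open a raw TCP connection to the Postgres port; a failure here means
# the port is unreachable from this machine (container not publishing 5432,
# firewall, wrong host, etc.) and no JDBC settings will fix it.
con <- tryCatch(
  socketConnection(host = "127.0.0.1", port = 5432, open = "r+", timeout = 5),
  error = function(e) NULL,
  warning = function(w) NULL
)
if (is.null(con)) {
  message("Port 5432 is not reachable from this machine")
} else {
  message("Port 5432 is open")
  close(con)
}
```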

That circular reference is worrying; we just merged a PR that handles custom params, so I’m hoping we didn’t introduce an issue with that.
