
Error implementing WebAPI in Google Cloud environment

Hi,
I am trying to run CohortDiagnostics on a workspace that is installed in a Google Cloud environment that uses BigQuery. I got this error and was hoping that someone has run into the same one:
Error in FALSE: Error executing SQL:
java.sql.SQLException: [Simba]BigQueryJDBCDriver Error executing query job. Message: Not found: Dataset ProjectID:Schema_name was not found in location US
An error report has been created at Path_to_/errorReportSql.txt

I had to mask the actual project ID and schema name. I believe the error happens because lowLevelExecuteSql adds the project name to the query and BigQuery does not allow it in the query. I have been trying to debug and see where the project ID gets added, but no luck.
The executeSql, querySql, and renderTranslateQuerySql functions are all working.
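For example, a query like the following (the dataset name is a placeholder) runs fine through the higher-level functions:

```r
# A minimal sketch of a query through the higher-level API; the dataset
# name and table are placeholders, and connectionDetails is assumed to be
# the object created further below in this thread.
library(DatabaseConnector)

connection <- connect(connectionDetails)

# renderTranslateQuerySql renders the @-parameters, translates the SQL to
# the BigQuery dialect, and executes it
df <- renderTranslateQuerySql(connection,
                              "SELECT COUNT(*) AS n FROM @cdm_schema.person;",
                              cdm_schema = "my_dataset")

disconnect(connection)
```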

Thanks,
Lina

Hi @LinaSulieman
This problem is probably related to the fact that schema names are case-sensitive in BigQuery. Can you check that the schema passed to the lowLevelExecuteSql function and the schema name in the database are written the same way?

You can read more about this here: 7 "Gotchas" for Data Engineers New to Google BigQuery

Case Sensitivity – Unlike most RDBMS, BigQuery is case sensitive, not only for string comparison, but for object names as well. For string comparison, using the UPPER function on both sides (or one side if comparing to a literal) will eliminate the effect of case sensitivity. Curiously, although object names are case sensitive when referenced in a FROM clause, column names are not case sensitive.


Hi @Yaroslav,
Thanks for replying. Yes, I had the case-sensitivity issue and I have already solved it. When I looked at the SQL that is passed to lowLevelExecuteSql, it has the correct name, including the case of the letters, so the SQL in lowLevelExecuteSql matches the database. The only difference is that the SQL in lowLevelExecuteSql has the project ID in it, and I think BigQuery does not allow the project ID there. I tried a query that had the project ID in querySql and got the same error that the database does not exist. Once I removed the project ID, the query worked.

Do you know if or where lowLevelExecuteSql adds the project name? I tried to find it but I couldn't.
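In case it helps, here is roughly how I have been inspecting the SQL before it reaches the driver (the schema name is a placeholder): the parameterized SQL can be rendered and translated with SqlRender directly, so I can print exactly what would be sent before lowLevelExecuteSql runs it:

```r
# Sketch: render and translate a parameterized statement to the BigQuery
# dialect so the final SQL can be inspected (schema name is a placeholder)
library(SqlRender)

sql <- "DROP TABLE IF EXISTS @work_schema.concept_sets;"
sql <- render(sql, work_schema = "my_dataset")
sql <- translate(sql, targetDialect = "bigquery")
cat(sql)  # check whether a project ID appears in the SQL at this stage
```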

I don’t know a solution, but maybe @jposada has come across this.

@LinaSulieman just to clarify, why do you want to be able to specify a projectId or name in your SQL? Sounds like everything works if you do not specify it? Or - where is the project ID specified and being inserted into a SQL stmt?

@gregk , the query does not work once the project ID is added. The query only requires the database name. Once the project ID is added, the query won't run and I get the error. I think there is a line or a place where the low-level SQL execution adds the project name. I tried to locate that line but I couldn't.

are you specifying a project setting in your JDBC connection string? Could you please paste here your connection string (masking anything confidential)

Yes, I specify the project ID in the connection, otherwise the connection will fail.
I included the connection string and connection detail

jsonPath <- "Path_to_Json/application_default_credentials.json"

connectionString <- createBQConnectionString(projectId = MyProjectID,
                                             defaultDataset = DatasetName,
                                             authType = 2,
                                             jsonCredentialsPath = jsonPath)

connectionDetails <- DatabaseConnector::createConnectionDetails(dbms = "bigquery",
                                                                connectionString = connectionString,
                                                                user = "",
                                                                password = "",
                                                                pathToDriver = bqDriverPath)

another check - when you created your data set, what location did you specify? Can it be stored in other than US location?

I don't think the problem is the location, since I was able to use renderTranslateQuerySql to run different queries, and CohortDiagnostics ran the part where it checks the CDM vocabulary source.
Any other possible reasons?

let me check internally with our team again.

Maybe it would be easier just to do a quick interactive session to see how exactly you are executing this package and debug it all together - would you be open to that?

@gregk: an interactive session would be great. Thanks, what times will you and your team be available?

Lina

Hi Lina, I'm not sure about this function and how it creates the connection string. authType=2 requires different options to be passed. According to the JDBC driver documentation, it should look like this:

jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;
OAuthType=2;ProjectId=MyBigQueryProject;
OAuthAccessToken=a25c7cfd36214f94a79d;
OAuthRefreshToken=1jt9Pcyq8pr3lvu143pfl4r86;
OAuthClientId=11b5516f132211e6;
OAuthClientSecret=bCD+E1f2Gxhi3J4klmN;

In your case I see you have a service account JSON file (?). That means the connection string should look like this:

jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;
ProjectId=MyBigQueryProject;OAuthType=0;
OAuthServiceAcctEmail=bqtest1@data-drivertesting.iam.gserviceaccount.com;
OAuthPvtKeyPath=Path_to_Json/application_default_credentials.json;

I would recommend setting connectionString manually.
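For example, the string can be built by hand in R (all values below are placeholders, following the Simba documentation format above):

```r
# Sketch: building the JDBC connection string manually for a service
# account; the project ID, account email, and key path are placeholders
connectionString <- paste0(
  "jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;",
  "ProjectId=MyBigQueryProject;",
  "OAuthType=0;",
  "OAuthServiceAcctEmail=my-account@my-project.iam.gserviceaccount.com;",
  "OAuthPvtKeyPath=/path/to/credentials.json;"
)
```

This string can then be passed as the connectionString argument of DatabaseConnector::createConnectionDetails.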

More information on auth types can be found in the JDBC driver documentation:

Any OAuthType other than 2 will generate an error. What do you mean by setting connectionString manually? I thought you need a service key or JSON file? Also, the connection works when we are using the higher-level query functions; it's the low-level ones that generate the error. Would the connection work for one type of querying and not the other?

hi @LinaSulieman ,

It looks like you are using this package?

It is a small package I wrote to help with the connection string creation using application default credentials

The error you may have is that when specifying a parameter such as cdmSchema you need backticks. Below is the code that I always use to test whether my connection is correct before running any network study:

bqDriverPath <- bq_driver_folder
jsonPath <- paste0(gcloud_credentials_path_docker, "/application_default_credentials.json")

database_description_file <- 'starr_omop_description.txt'
work_project_id <- "som-nero-phi-rit"
work_dataset_id <- "mskai_ple_sample"
temp_dataset_id <- "mskai_ple_sample_temp"
cdm_project_id <- "som-rit-phi-starr-prod"
cdm_dataset_id <- "starr_omop_cdm5_deid_1pcent_latest"

cdmDatabaseSchema <- paste0("`", cdm_project_id, ".", cdm_dataset_id, "`")
cohortDatabaseSchema <- paste0("`", work_project_id, ".", work_dataset_id, "`")
cohortTable <- "mskaicohort"
minCellCount <- 10
# Database description
databaseId <- "STARROMOP"
databaseName <- "STAnford medicine Research data Repository (STARR-OMOP)"
databaseDescription <- readr::read_file(database_description_file)

connectionString <-  BQJdbcConnectionStringR::createBQConnectionString(
                                                                       projectId=work_project_id,
                                                                       defaultDataset = temp_dataset_id,
                                                                       timeOut = 3600,
                                                                       authType = 2,
                                                                       jsonCredentialsPath = jsonPath)


# testing the BQ connection
connectionDetails <- DatabaseConnector::createConnectionDetails(dbms="bigquery",
                                                                connectionString=connectionString,
                                                                user="",
                                                                password='',
                                                                pathToDriver = bqDriverPath)
connection <- DatabaseConnector::connect(connectionDetails)
sql <- "
SELECT * 
FROM
 {cdmDatabaseSchema}.location
LIMIT 2;"

sql <- glue::glue(sql,
                  cdmDatabaseSchema = cdmDatabaseSchema)

sql <- toString(sql)
print(sql)
df <- DatabaseConnector::querySql(connection, sql)
print(df)
DatabaseConnector::disconnect(connection)

I have a YouTube video that walks through the use of this in the context of Docker, but you can skip all the Docker parts and jump to where I explain the configuration.

@jposada Yes, this is what I am using. The connection was successful, and I tested whether I could run a query: DatabaseConnector::querySql ran successfully. It was when I ran CohortDiagnostics that I encountered the error.
When I ran the runCohortDiagnostics function, I got the error below. I debugged to find which query caused it, and it was the queries for dropping and creating the concept tables that are used to build the cohort definition.

Found cohortSetReference. Cohort Diagnostics is running in WebApi mode.
Run Cohort Diagnostics started at 2021-07-14 22:11:47
WebApi of version 2.9.0 found at http://api.ohdsi.org:8080/WebAPI
Retrieving cohort definitions from WebAPI

  • Retrieving definitions for cohort Vinci Type 2 Diabetes
    Number of cohorts 1
    Saving database metadata
    Saving database metadata took 0.00305 secs
    Error in FALSE: Error executing SQL:---------
    java.sql.SQLException: [Simba]BigQueryJDBCDriver Error executing query job. Message: Not found: Dataset projectID:defaultDataset was not found in location US
    An error report has been created at X

Hi @LinaSulieman,

It is great that you were able to query. I have not used Cohort Diagnostics with the WebApi maybe @Adam_Black or @Konstantin_Yaroshove could further help.

Thanks @jposada , I used the solution that you posted as an answer to @cukarthik, and it worked like a charm. Thanks again.


@Adam_Black , @Konstantin_Yaroshove to catch you up on the problem without having to go through all the posts: I get the following error when I run the runCohortDiagnostics function. I already checked that DatabaseConnector::querySql is working, and it runs without any problem. After some digging, the error happens when the function tries to execute the drop query that is part of the drop-and-insert SQL batch that creates a table for the concepts in the cohort definition. The output and error when I run runCohortDiagnostics are:
Found cohortSetReference. Cohort Diagnostics is running in WebApi mode.
Run Cohort Diagnostics started at 2021-07-14 22:11:47
WebApi of version 2.9.0 found at http://api.ohdsi.org:8080/WebAPI
Retrieving cohort definitions from WebAPI

  • Retrieving definitions for cohort Vinci Type 2 Diabetes
    Number of cohorts 1
    Saving database metadata
    Saving database metadata took 0.00305 secs
    Error in FALSE: Error executing SQL:---------
    java.sql.SQLException: [Simba]BigQueryJDBCDriver Error executing query job. Message: Not found: Dataset projectID:defaultDataset was not found in location US
    An error report has been created at X