Since we are using OHDSI naming convention, we are okay w/ the tables and columns, but for our schema (i.e. project space) it is case sensitive…
filenames for tables on Unix file systems?
It is common for us, that build file structures in *nix systems, to only use lowercase directory/file names and use underscores in place of whitespace, for this exact reason.
And after you get used to it, it’s hard to try something else.
I know this is an ANCIENT issue but there now is an option that makes BQ case insensitive:
alter schema project_id.dataset
set options (is_case_insensitive = TRUE);
It has mildly adjusted my love-hate relationship with BigQuery…
Hi Michael,
Thank you for this tip. I’m trying to follow your instructions. But since I’m not used to work on google cloud platform where my database biquery is hosted. I don’t know where to run this code.
On google cloud I don’t have the permissions to modify the database. Is there a way to run this code in my R environment ?
Thanks,
@MeriemGrs If you do not have permissions, the only option is to request this change internally. I assume your IT folks could help you.
Ok than I will do that. Thanks Konstantin !
@MeriemGrs: I am not sure what permissions are required to allow you to execute the “alter schema…” command. I executed that statement only once as a stand-alone command in the Google BigQuery SQL console. If I understand your post, you tried this (or something similar) and had the command rejected. I am pretty sure putting it into an R script (for example, adding it to the SQL statements in R) would result in the same rejection. I agree with @Konstantin_Yaroshove that if executing the commend in the Google console is rejected for you, your only option is to ask the BigQuery owners to change it for you. It is a one-time request but (unfortunately) for each projectid.dataset.
Your IT folks should not give you any push-back as it should not evoke any additional security concerns. If they refuse, your only remaining approach is to manually change the table names in all SQL scripts that you receive into the upper/lower case that you use. Prior to discovering the new command, I wrote a simple SED script that did this for me for all of the OMOP tables.
Seems I cannot post files so here is my TRIVIAL script that converts OMOP table names that are in uppercase to lowercase because our OMOP tables are in lowercase. It assumes that the original table names are prepended with “Schema” (as is commonly used as “@cdmSchema” or “@resultsSchema”. It also assumes the three files ORIGINAL_bigquery.sql, lower_case_OMOP_tablenames.sh, and lc_tables.sed, are in the same directory. There are more sophisticated scripts that can do much more than this. I only created this utility for my personal use so I wasn’t fancy about it…
To be clear, after I was able to run the “update schema…” command, I no longer need to run this SED script.
lower_case_OMOP_tablenames.sh:
#!/bin/bash
sed -f ./lc_tables.sed ORIGINAL_bigquery.sql > ORIGINAL_bigquery_lc.sql
lc_tables.sed:
s/Schema.PERSON/Schema.person/g
s/Schema.OBSERVATION_PERIOD/Schema.observation_period/g
s/Schema.VISIT_OCCURRENCE/Schema.visit_occurrence/g
s/Schema.CONDITION_OCCURRENCE/Schema.condition_occurrence/g
s/Schema.DRUG_EXPOSURE/Schema.drug_exposure/g
s/Schema.PROCEDURE_OCCURRENCE/Schema.procedure_occurrence/g
s/Schema.MEASUREMENT/Schema.measurement/g
s/Schema.OBSERVATION/Schema.observation/g
s/Schema.LOCATION/Schema.location/g
s/Schema.CARE_SITE/Schema.care_site/g
s/Schema.PROVIDER/Schema.provider/g
s/Schema.DEATH/Schema.death/g
s/Schema.DRUG_ERA/Schema.drug_era/g
s/Schema.CONDITION_ERA/Schema.condition_era/g
s/Schema.OBSERVATION_PERIOD/Schema.observation_period/g
s/Schema.DRUG_ERA/Schema.drug_era/g
s/Schema.CONDITION_ERA/Schema.condition_era/g
Thank you all for your answers and explanations.
I managed to run the “alter schema” and that was very helpfull.