I had always figured the .dbo. you have to add to the schema name in SQL Server was just an annoyance introduced by Microsoft just because they can, but as pointed out here, the dbo part is actually the schema name, and what I thought of as the schema name is in fact the database name. (To my defense, every DBMS has a different interpretation of what database, schema, user, and table means). As a consequence, people using a different schema name than dbo are out of luck when using any of our applications.
A solution I would like to suggest is to make dbo part of the schema name parameter. So where we previously had
SELECT * FROM @cdm_schema.dbo.person;
with parameter value cdm_schema = 'myCdmSchema' we will now have
SELECT * FROM @cdm_schema.person;
with parameter value cdm_schema = 'myCdmSchema.dbo'
To transition to this solution, I suggest we follow these steps:
Modify SqlRender’s translateSql() function so it generates a warning when your parameterized SQL still contains.dbo.
Modify all parameterized SQL in our applications to no longer use .dbo. (including ACHILLES, HERCULES, the methods library, and everything else)
After a grace period, remove the translation rule and warnings from SqlRender, meaning that everyone that hasn’t done step 2 will see their code break on any platform other than SQL Server
Hmmm, one thing I hadn’t thought of: the USE command only accepts the databasename, and we use it a lot. So if I say cdm_schema = 'myCdmDatabase.dbo' I can’t do
USE @cdm_schema;
Furthermore, if I have two schema’s in the same database I’m in trouble. For example, if I have both myCdmDatabase.dbo.person and myCdmDatabase.dbx.person, then what happens if I do
Maybe what is required is separate parameters for database and schema so that you can separate the two. Maybe with those 2 paramaters you can make a table prefix that concats the two together. and we can put into our scripts @schema where we need the schema and @database where we need the database. I assume that it’s always the case that we’d want to alias our tables as {database}.{schema].
The only question i have is, does the @schema contain the . at the end or not, so do you include it in the query as select * from @schema.tablename or select @ from @schematablename? I’m thinking that you wouldn’t include the tailing dot in the paramter and just assume that it contains a valid value such that you can do @schema.tablename.
My last thought is that maybe you can parse out the database part of a CDM_SCHEMA parameter by splitting on the ‘.’. You’d then inject the @database parameter implicitly and all of the scripts that need to use USE can use @database.
So, long story short: my opinion is that CDM_SCHEMA should be passed in as {server}.{schema} and in our scripts we refer to it as @CDM_SCHEMA.Tablename (I’m in agreement with you Martijn). This should work for all environments.
Ok, let me see if I understand you correctly. There would be two parameters to the script: @database_schema = 'myCdmDb.dbo' and @database = 'myCdmDb' (I’m using the SQL Server lingo here) which you could then use like this in paramterized SQL:
SELECT * FROM @database_schema.person;
USE @database;
You can make life easier for users by deriving the @database parameter from the @database_schema parameter automatically, for example by using the strsplit() function in R:
I believe SQL Server is the only database that uses the database.schema
syntax. All other systems use only ‘database’. So, I think @schuemie 's
original proposal, to embed the ‘.dbo’ in the database parameter seems to
make the most sense, since it accommodates SQL Server users to have
whatever database and schemas they want, but also allows Oracle/Postgres to
specify only the database for their needs. I think creating an extra
parameter and selectively ignoring it for select environments is
unnecessary.
(I brought Patrick up to speed, so ignore his previous post)
I’ve implemented the suggested solution in the CohortMethod package in this commit. This should give developers of other packages a good idea how to fix theirs. In summary (using cdmSchema as an example):
Rename parameters of the R function using parameterized SQL from cdmSchema to cdmDatabaseSchema
Change the help for the function to indicate that for SQL Server the cdmDatabaseSchema should include ‘.dbo’ at the end (or whatever the schema is).
Within the R function, add something like this: cdmDatabase <- strsplit(cdmDatabaseSchema ,"\\.")[[1]][1]
In the parameterized SQL, rename the @cdm_schema parameter to @cdm_database_schema, and remove any `.dbo’ you can find.
In the parameterized SQL, add a parameter @cdm_database, and use that when calling the USE function instead of @cdm_database_schema.
Don’t forget to pass the cdmDatabase parameter in R to the @cdm_database parameter in SQL when calling SqlRender.
I think the Java users of the SqlRender Jar are fine, because they were already doing something similar (at least in WebAPI). For now, nothing will change in the SqlRender package (although somewhere in the future we can drop the translation rules for ‘.dbo.’.)
I agree, this sounds like a good strategy to use moving forward.
I will take the action to go back and apply this convention to the
templateSQL for ACHILLES and HERACLES. @Chris_Knoll should bake this
approach into the alpha version of CIRCE that the cohort definition WG is
building. We do not need to revise the treatment pathways analysis, since
collaborators have been able to successfully run the program without this
modification.
Thank you @schuemie in particular as we are probably the one place with both Oracle and DB write restrictions. And thanks @Patrick_Ryan re: the updated Heracles SQL.
Hi @Vojtech_Huser! We’ve been using the following principle when it comes to documentation:
For R packages we use the standards for documenting those, so use in-line documentation (using Roxygen) to document functions which are collected in a package manual, and optionally vignettes (using KNITR) to describe how to use a package.
For everything else we use the wiki
For SqlRender there is a both a package manual and a vignette in the [Github repo] (https://github.com/OHDSI/SqlRender). I’ve just added a section on the whole database / schema discussion to the vignette.
I can confirm this: we don’t need to split out a ‘database’ value from the ‘database_schema’ parameter. We should assume that the table qualifier is completely stored in '*database_schema" (ie: cdm_database_schema or results_database_schema or vocab_database_schema" and it’s the responsibility of the caller (the person invoking the method) to put the appropriate ‘table qualifier’ in this parameter. The parameters used to store a ‘database’ value should be removed.