Correctly set up new CDMs & how to handle CDM updates

MichaelMartinec · September 2, 2022, 8:17am

Dear community, I have two use cases which I am trying to get my head around how to approach best. I’m hopeful other colleagues here have faced similar situations with ATLAS and multiple CDMs, updates to the CDMs and user permission.

Use case:

We receive a new data source mapped to OMOP CDM and we would like users with the right permission to be able to access it through ATLAS.
For an existing data source which is already available to users in ATLAS we have received an update of the data.

For both use cases what I am trying to get my head around is what steps are required and how should the database incl. schemas ideally be setup? E.g. do we need to create separate results and temp schemas for every update which are stored in the same schema of the first version? Or would it be enough to rerun Achilles every time an update comes in? I’ve browser through all the material and references starting with this page CDM Configuration · OHDSI/WebAPI Wiki · GitHub.

Also, depending on whether the user has requested access, has the appropriate rights and passed certain trainings on a specific data source the user gets access to the data in the schemas on redshift. This means that we can’t use a generic account for each data source in ATLAS and share that configuration with all the users. Each user has to create their own new source connection incl. their personal username and password. If a new user requests access to a DB and access is granted, do we then need to provide that user with read & WRITE access to the results and temp schemas?

Our current setup and my understand of the process is the following:
For 1)

New data source is transferred to a new schema on redshift by IT. The new schema contains the CDM. In our case the CDM schema and Vocabulary schema usually refer to to the same schema. The schema is read-only for the users who have permission to access the data.
Database Administrator creates two new schemas results and temp in the database. Results table is granted insert/delete/select/update rights for the results schema and full control (create/remove tables & data) for the temp schema to the users who have permission to access the data.
Database Administrator runs Achilles to characterise the CDM data which creates tables in the results schema.
4.Database Administrator generates and runs the SQL script for the CDM dialect to establish additional tables that WebAPI will require to run in the results schema.
The Database Administrator who has access to the WebAPI DB WebAPI database also populates the source and source_daimon tables with a JDBC source connection and corresponding source_daimon that specify the location for the cdm, vocabulary, results and temp schemas associated to the source.
The user configures the new source connection in ATLAS

For 2)
1a. Update is transferred to a new schema into the existing database of the original data

Here is where I stumble. Can the same results and temp schemas be used or will the Database Administrator have to create new results and temp schemas for the data update schema?? I.e. can the Database Administrator continue with simply rerunning Achilles using the existing results and temp schemas, followed by steps 4-6 above?

1b. Update is transferred to the existing schema into the existing database overwriting the original data. In this case the Database Administrator only has to rerun Achilles?

CarmenOT · September 2, 2022, 1:26pm

Dear @Chris_Knoll @alex-odysseus @ssuvorov-fls, could you please have a look into this? Or point it out to someone that can help? Thank you!

MaximMoinat · September 5, 2022, 6:21pm

Whether to keep multiple versions of the same database in your Atlas instance, is entirely up to you. This is how it is most often done as far as I have seen. And if you do, you should treat it as a completely independent source (new cdm and results schema, maybe even in an actual separate database).

Only read access on the cdm should be required, and depending on your use case also read access to the results schema. For using Atlas, the WebAPI role of course needs read access to the cdm and write access to the results schema (as described in the wiki link you shared, the same WebAPI role can be used for every new data source).

This is only true for the WebAPI role. Your users interact with this through the endpoints of the WebAPI (likely via the Atlas UI). Your users do need permissions on Atlas for the new data source.

Might be better if IT does this as well, once for all users. And grants permissions for the Atlas users (note: different from database users) to use the new data source.

In theory, yes. Old Achilles results will just be overridden. However, in my experience this leads to confusing situations. All existing results form e.g. cohort generations, will have a ‘cache’ of the last run on the old data. These results are invalid. So all analyses (cohorts, characterisations, etc.) should be rerun. This is a lot of work depending on your environment.

MichaelMartinec · September 9, 2022, 3:53pm

Thanks, this is helpful. To clarify:

Each of our users have permission to access and use specific databases (depending on training completed, compliance, site, etc). Also, we want to be able to log usage per user (able to identify which function/devision is using which data). From what you replied, the access is controlled by the data source configuration in ATLAS and each user would have his own config setup with his credentials for read access the cdm. The WebAPI role is the “generic” user which requires permission to read access to the cdm and write access to the results schema, not the user. The user only additionally requires read access to the results schema.

As mentioned above, not all users have permission to access all data sources. And the database manager does not have the password of the user to access the data source so likely the users would have to do it themselves in ATLAS?