
OHDSI - A Single User or Multi-User Platform?

Hi all,

I need a bit of advice.

To support the needs of our Regenstrief data core, we are working on an environment that wraps a few OHDSI pieces together (WebAPI, Hermes, Achilles, etc). When done, this will of course be released to the full community. In creating this, we have hit upon a question for which it would be useful to get outside perspective. The question is whether people would prefer an OHDSI platform as a single-user or multi-user experience. Let me summarize the differences:

Single User - You download and install a software package. You configure it with your database credentials and the locations of your CDM schema and results schema. From there you can launch various web-based OHDSI tools (such as Hermes, Achilles, etc).

If a colleague in your organization wants to use it too, they would install the software on their own machine. They will need their own database credentials (unless you are willing / able to share). You would also need to coordinate use of the same results_schema so each person doesn’t have to run Achilles individually.

Alternatively, you could choose to expose your install of this platform to everyone on your network. Everyone would be working through the configuration details you set up, but would also have all the “rights” that you have, in terms of changing configurations, etc. So should you have ne’er-do-wells in your environment (or the accident-prone), they could mess up your configuration or run things you don’t want them to (e.g., create a 20 million person cohort 50 times). But if you have no such concerns in your environment (or don’t expect anyone else will be using it), then this shouldn’t be a problem.

Multiple User - In this scenario, there is an “admin”-type user who sets up the initial configuration and grants permissions to other users. Each user has a user name and password. Admin users can do anything, including setting the CDM configuration. Regular users could be allowed (for instance) to build cohorts, run queries, and view results, while restricted users could only view results. Everyone in your environment would be using the same setup and would not need individual database accounts, but you would have to “grant” them accounts on this OHDSI platform.
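Just to make the idea concrete, a three-tier role model like this boils down to a small permission table. The sketch below is purely illustrative (the role and action names are mine, not an actual OHDSI design):

```python
# Hypothetical sketch of an admin / regular / restricted role model.
# Role names and action names are illustrative only, not an OHDSI API.

ROLE_PERMISSIONS = {
    "admin":      {"configure_cdm", "build_cohort", "run_query", "view_results"},
    "regular":    {"build_cohort", "run_query", "view_results"},
    "restricted": {"view_results"},
}

def is_allowed(role, action):
    """Return True if the given role is permitted to perform the action."""
    return action in ROLE_PERMISSIONS.get(role, set())

print(is_allowed("regular", "build_cohort"))   # True
print(is_allowed("restricted", "run_query"))   # False
print(is_allowed("admin", "configure_cdm"))    # True
```

The point is that once every request carries a user context, gating a feature is a one-line check; the hard part is the surrounding account management, which is where the extra development cost lives.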

The downside of this multi-user scenario is mainly that it is more complex to develop, and it is only beneficial if you want to secure / constrain use of your CDM environment to certain users. It would secondarily be helpful for stakeholders who don’t have much technical skill, letting them experience OHDSI wonders without doing anything more than logging in to something.

My own feelings about this are kind of shifting, and I’d love to get thoughts from others.

Thanks,

Jon

Given the stated mission, I feel that multiple user is the way to go. If we want doctors and patients using it at some point, then single user won’t fly.

I’m not sure that this is fundamentally about multi-user vs single-user. This is really a discussion on roles and permissions.

As you pointed out, installing the software and exposing it on the network makes it multi-user. This is the case at Janssen where the websites for Achilles and Hermes are available on our intranet and all users within the enterprise can access them. Things like running the Achilles R package and generating the results for a CDM are only done once for everyone in the enterprise.

So I would suggest we reframe the discussion. Should OHDSI applications be user-aware and have a role-based security system? With logins for users and a role-based security system, you could provide the types of features you described above, where only certain users would have permission to access certain features, configurations, cohort execution, etc.

Today we are only partially capable of achieving this type of security with our applications. In our environment we set permissions on websites at the web server layer, so only authorized people can access certain applications. While it suits our immediate needs, I can definitely see reasons why a more fine-grained security model would give us more control over how our applications could be used.
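For anyone curious what “permissions at the web server layer” looks like in practice, it is typically a few lines of server config in front of the app. This fragment is illustrative only (Apache httpd style; the path, LDAP URL, and group name are assumptions, not our actual setup):

```
# Illustrative only: restrict an OHDSI app to an authorized group at the
# web server layer (Apache httpd with mod_authnz_ldap). All names are
# placeholders.
<Location "/achilles">
    AuthType Basic
    AuthName "OHDSI tools"
    AuthBasicProvider ldap
    AuthLDAPURL "ldap://ldap.example.org/ou=people,dc=example,dc=org?uid"
    Require ldap-group cn=ohdsi-users,ou=groups,dc=example,dc=org
</Location>
```

It is all-or-nothing per application, which is exactly why it can’t express roles like “can view results but not run cohorts” inside the app.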

If you think it makes sense I’ll add a discussion of a security model / user profile layer to the next architecture call.

Thank you @Frank, that is a good additional framing of the question. Yes, it does come down to whether or not we want our applications to have a user context. Does access to the CDM need to be logged or monitored in any way? Do we need a ‘view-only’ type user?

Certainly worth earmarking for our architecture call. But perhaps step one is just gathering the demand. If the need is not there, then I certainly don’t want to push for unnecessary functionality.

It may be useful to consider the database/schema requirements when thinking about this question.

Here are my suggestions.

  1. A read-only CDM database/schema shared across all users. There may be multiple CDM’s, one per data source.
  2. Individual ‘sandbox’ database/schema for each user to extract data from the CDM and perform individual analysis using SQL / R etc. This is an informal agile working area for an individual user.
  3. Read-write project/study databases/schemas shared across small groups of users collaborating together on an analysis project/study using SQL / R etc. More formal than a sandbox and archived for future reference. These databases/schemas could perhaps be used for OHDSI network studies too.
  4. OHDSI tool database/schema - ‘OHDSI’ schema (AKA results schema). This can be shared for read-only tools e.g. Hermes, AchillesWeb. Will future OHDSI tools need to support user specific data? e.g. individually defined cohorts? If so the tools will need to maintain userid as part of the key on persisted data if this database/schema is shared across users. A shared tool database may have potential for performance bottlenecks if multiple users are running more complex OHDSI analytics tools on it at the same time. Alternatively the tools database/schema could be created per user/project group and the tools would be configured with the tools database/schema to use per user/project group. The tools database/schema could perhaps be combined with the project/study database/schema but there might be potential for users to manually drop/update the OHDSI tools results data tables in error.

In addition, consider also the potential to use ‘cloud computing’ - virtual machines and/or Docker containers for application isolation and dynamic deployment. Note: you can deploy an internal cloud/cluster if external cloud data security is a concern.

For example, I am currently running the public OHDSI apps Achilles, Hermes, etc. in Docker containers in AWS. I don’t think it would be much of a stretch to dynamically spin up a new set of OHDSI apps in Docker containers for an individual user to use on demand (with an on-demand sandbox database and results database) and remove the containers and on-demand database(s) when they are no longer needed. The VM and/or Docker containers could be configured per user or user group to reference the necessary databases/schemas.
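A per-user set of containers like this can be described declaratively. The compose file below is a sketch only - the image names, ports, and environment variables are assumptions for illustration, not the actual images or settings I am running:

```yaml
# Hypothetical docker-compose sketch of an on-demand, per-user OHDSI stack.
# Image names, ports, and credentials are placeholders.
services:
  achilles-web:
    image: example/achillesweb:latest      # assumed image name
    ports: ["8081:80"]
  hermes:
    image: example/hermes:latest           # assumed image name
    ports: ["8082:80"]
  sandbox-db:
    image: postgres:16
    environment:
      POSTGRES_DB: sandbox_jdoe            # on-demand per-user sandbox
      POSTGRES_PASSWORD: example
```

Spinning the stack up and tearing it down would then be a single compose up/down per user or project group, which fits the “remove when no longer needed” model above.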
