Hi @krfeeney
This is a great question and one I have seen come up on the CDM github a couple times in the past weeks. @mgkahn’s point is well taken that this often has to be done at the discretion of the site and is often dependent on their infrastructure. At J&J, for instance, we have a PDW appliance that stores CDM tables in a clustered columnstore index and we apply some additional indices for vocabulary and some results tables based on our experience and needs.
What I have tried to do on the CDM github is provide a sample set of indices for each dbms that we support to offer some guidance for those folks who may be new to OHDSI. It can be difficult to know up front which indices you may need to enhance query performance if you are unfamiliar with the columns and how data are stored in the model. One of the problems I have run into is that it can be difficult to write an index for a dbms I do not have access to for testing. Right now I can test Oracle, Postgres, Sql Server using OHDSI resources and Redshift, and PDW using J&J resources. The two I do not have access to are Impala and Netezza and I rely on sql render to create these once I have tested and fixed syntax for the others.
With that being said, here is my suggestion: at the very least we know that columns containing concept ids and other ids (think person_id, visit_occurrence_id) need to be indexed along with some of the vocabulary tables. It has been an action item on my list for a while to address the indices available on the github so I can take a first pass but if there are any existing indices in use by the community that I can leverage, that would be great. I would also appreciate any volunteers to test the indices or review them for errors.
I could also see adding a article to the CDM website detailing which indices should be created and how they enhance performance. I am not a db admin so if anyone has the knowledge to provide some text on that, I think it would go a long way.