Hello OHDSI community,
I’m working with a relatively large MEASUREMENT table (approximately 500GB) in a PostgreSQL 15 environment,
and I’m exploring options to improve query performance and data management through partitioning.
However, I’m uncertain if partitioning is actually beneficial in this use case, especially considering that:
- Most research queries in OHDSI are concept-centric (i.e., filtering by `measurement_concept_id`)
- Partitioning by `person_id` may hurt performance due to wide data distribution
- Date-based partitioning (e.g., on `measurement_date`) may not help since many queries don’t rely on date ranges
- List partitioning by “Top-N” `concept_id`s might be an option, but it’s hard to define boundaries up front (a rough sketch follows this list)
- Our team also uses Atlas, which dynamically builds SQL across domains — making static partitioning possibly ineffective
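For reference, this is the kind of layout I’ve been experimenting with: a minimal sketch of PostgreSQL 15 declarative LIST partitioning on `measurement_concept_id`, with a DEFAULT partition so the concept boundaries don’t have to be exhaustive up front. The column list is abbreviated and the concept_id below is a placeholder, not a recommendation.

```sql
-- Minimal sketch: MEASUREMENT list-partitioned by concept.
-- Column list abbreviated; 3004249 is just a placeholder concept_id.
CREATE TABLE measurement (
    measurement_id          bigint  NOT NULL,
    person_id               bigint  NOT NULL,
    measurement_concept_id  integer NOT NULL,
    measurement_date        date    NOT NULL
    -- ... remaining CDM MEASUREMENT columns ...
) PARTITION BY LIST (measurement_concept_id);

-- One partition per known high-volume concept.
CREATE TABLE measurement_c3004249 PARTITION OF measurement
    FOR VALUES IN (3004249);

-- DEFAULT partition catches everything else.
CREATE TABLE measurement_other PARTITION OF measurement DEFAULT;
```

My understanding is that queries filtering on `measurement_concept_id` (which is what Atlas tends to generate) would get partition pruning here, but one gotcha I’ve read about is that carving out a new partition later forces PostgreSQL to scan the DEFAULT partition under a heavy lock to verify no existing rows belong in the new one. Corrections welcome if I have that wrong.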
I wanted to ask:
- Has anyone successfully partitioned the `MEASUREMENT` table for performance? If so, what strategy worked best?
- Does partitioning help with Atlas-based research workflows?
- Is there any known overhead or gotcha when partitioning large CDM tables in PostgreSQL?
- Would you recommend sticking to monolithic tables and optimizing via indexing and autovacuum settings (sketched below)?
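To make that last question concrete, this is roughly the monolithic setup I would compare against. Again just a sketch; the index choices and autovacuum values are illustrative, not tuned recommendations.

```sql
-- Monolithic alternative: targeted indexes plus per-table autovacuum tuning.

-- Composite B-tree serving concept-centric queries that also join on person.
CREATE INDEX IF NOT EXISTS idx_measurement_concept_person
    ON measurement (measurement_concept_id, person_id);

-- BRIN is tiny on disk and helps if rows were loaded roughly in
-- measurement_date order; otherwise it can be skipped.
CREATE INDEX IF NOT EXISTS idx_measurement_date_brin
    ON measurement USING brin (measurement_date);

-- Tighten autovacuum so a 500GB table is vacuumed/analyzed well before
-- the default 20%/10% dead-tuple thresholds are reached.
ALTER TABLE measurement SET (
    autovacuum_vacuum_scale_factor  = 0.01,
    autovacuum_analyze_scale_factor = 0.005
);
```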
Any experience, advice, or recommendations would be appreciated!
Thanks in advance,