Hello OHDSI community,
I’m working with a relatively large MEASUREMENT table (approximately 500GB) in a PostgreSQL 15 environment,
and I’m exploring options to improve query performance and data management through partitioning.
However, I’m uncertain if partitioning is actually beneficial in this use case, especially considering that:
- Most research queries in OHDSI are concept-centric (i.e., filtering by `measurement_concept_id`)
- Partitioning by `person_id` may hurt performance due to wide data distribution
- Date-based partitioning (e.g., on `measurement_date`) may not help since many queries don’t rely on date ranges
- List partitioning by “Top-N” `concept_id`s might be an option, but it’s hard to define boundaries up front (a rough sketch follows this list)
- Our team also uses Atlas, which dynamically builds SQL across domains — making static partitioning possibly ineffective
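For reference, this is the kind of layout I’ve been experimenting with: a minimal sketch of PostgreSQL 15 declarative LIST partitioning on `measurement_concept_id`, with a DEFAULT partition so the concept boundaries don’t have to be exhaustive up front. The column list is abbreviated and the concept_id below is a placeholder, not a recommendation.

```sql
-- Minimal sketch: MEASUREMENT list-partitioned by concept.
-- Column list abbreviated; 3004249 is just a placeholder concept_id.
CREATE TABLE measurement (
    measurement_id          bigint  NOT NULL,
    person_id               bigint  NOT NULL,
    measurement_concept_id  integer NOT NULL,
    measurement_date        date    NOT NULL
    -- ... remaining CDM MEASUREMENT columns ...
) PARTITION BY LIST (measurement_concept_id);

-- One partition per known high-volume concept.
CREATE TABLE measurement_c3004249 PARTITION OF measurement
    FOR VALUES IN (3004249);

-- DEFAULT partition catches everything else.
CREATE TABLE measurement_other PARTITION OF measurement DEFAULT;
```

My understanding is that queries filtering on `measurement_concept_id` (which is what Atlas tends to generate) would get partition pruning here, but one gotcha I’ve read about is that carving out a new partition later forces PostgreSQL to scan the DEFAULT partition under a heavy lock to verify no existing rows belong in the new one. Corrections welcome if I have that wrong.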
I wanted to ask:
- Has anyone successfully partitioned the `MEASUREMENT` table for performance? If so, what strategy worked best?
- Does partitioning help with Atlas-based research workflows?
- Is there any known overhead or gotcha when partitioning large CDM tables in PostgreSQL?
- Would you recommend sticking to monolithic tables and optimizing via indexing and autovacuum settings (sketched below)?
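To make that last question concrete, this is roughly the monolithic setup I would compare against. Again just a sketch; the index choices and autovacuum values are illustrative, not tuned recommendations.

```sql
-- Monolithic alternative: targeted indexes plus per-table autovacuum tuning.

-- Composite B-tree serving concept-centric queries that also join on person.
CREATE INDEX IF NOT EXISTS idx_measurement_concept_person
    ON measurement (measurement_concept_id, person_id);

-- BRIN is tiny on disk and helps if rows were loaded roughly in
-- measurement_date order; otherwise it can be skipped.
CREATE INDEX IF NOT EXISTS idx_measurement_date_brin
    ON measurement USING brin (measurement_date);

-- Tighten autovacuum so a 500GB table is vacuumed/analyzed well before
-- the default 20%/10% dead-tuple thresholds are reached.
ALTER TABLE measurement SET (
    autovacuum_vacuum_scale_factor  = 0.01,
    autovacuum_analyze_scale_factor = 0.005
);
```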
Any experience, advice, or recommendations would be appreciated!
Thanks in advance,