@Patrick_Ryan Thatās fine with me. Just to articulate the other side of @hripcsa 's #1: if we were looking for minimal change, we could solve the analytic problems with two changes:
- Add a
birth_date
field to person
., with appropriate conventions for defaulting (e.g. if you only know year of birth, store year-01-01 or somesuch.
- Change the semantics of all existing
*_date
from DATE to DATETIME.
The upsides of this approach are that weād get efficient calculation of intervals, sub-date precision when we needed it, and no added space consumption for the (common?) case that weāre just dealing with dates. The practical downside is that existing code that relies on a_date - b_date
yielding an integer will break, since most RDBMS Iāve encountered use some interval type for the difference in datetimes, which may or may not (looking at you, Postgres) quietly cast to an integer when needed. The informational downside is that we wonāt know what the precision of dates are within the data model. (Thereās also the fact that DATETIME types have a shorter span of actual dates they can cover than a DATE type, but I donāt see that as a problem for OHDSI; by the time itās a problem native types will probably be wider. )
When we extended the CDM for PEDSnet, we went with a separate set of *_time
fields because we thought that the change in date math would break code that assumed it was dealing with DATE types. Thatās still a real consideration, but I can see an argument that the pain of a flag day now is worth the gain of having a single place for temporal information going forward
To be clear about @rimma 's proposal, the *_datetime
fields will have datetime semantics, so you donāt actually have to consult the precision field to do date math; itās just there as advisory metadata if you want it. By biggest reservation is that it consumes row-level storage for something thatās likely constant across large swathes of data, so may be more appropriate for a metadata table. The question is how often youād want it while doing, versus how often you want to record it so the user will know about it.
To @hripcsa 's other points:
- (markdown bait)
- Agree
- Iām not as worried about
_dt
. I agree itās not as clear on first reading, but itās fine once you know it, so weāre talking about a one-time entry cost. And itās a much lower cognitive barrier than x_concept_id
vs x_type_concept_id
vs x_source_concept_id
, for instance.
- I like the idea of deprecating and eventually removing the
_date
fields, since it sounds like the only reason to separate them is backwards compatibility. (Well, we also need to teach SqlRender when to use extract(days from a - b)
or similar, but thatās manageable. I do wonder whether the precision fields might be candidates for extending cdm_version
, though thatās got its own set of edge cases. Minimally, it gets us to something like PCORnetās āharvestā table, which looks like a lot of bookkeeping, though itās only once per instance, and if most of itās optional, can be used only when needed.