Friends:
This is a complex decision. We have different timing precisions in the data, and we have different use cases:
- Daily precision: This is the predominant situation (99%) in both the data and the use cases. Most queries contain some kind of date arithmetic, because in observational data timing is what links events to each other within a Person. Slowing this arithmetic down is what caused @schuemie to restart this debate.
- High precision (hours or minutes): Usually, only data collected from devices provide this precision, which supports use cases involving acute (e.g. ICU) outcomes. Manually collected records, today the mainstay of observational data, rarely support it.
- Low precision (months, years or decades): This is the realm of “history of” and survey data, and the use cases are broadly defined inclusion or exclusion criteria (“no prior cancer”). This is actively debated here.
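To make the daily case concrete, here is a minimal Python sketch of the kind of date arithmetic most queries boil down to. The function name, the dates and the 30-day window are made up for illustration; nothing here is taken from an actual CDM table or tool.

```python
from datetime import date


def within_days(index: date, event: date, window: int) -> bool:
    """Return True if `event` falls 0..window days after `index` (inclusive)."""
    delta = (event - index).days  # whole days between the two dates
    return 0 <= delta <= window


# Hypothetical events for one Person (illustrative values only).
exposure_start = date(2019, 3, 1)
outcome_date = date(2019, 3, 20)

print(within_days(exposure_start, outcome_date, 30))
```

With plain dates this is a single subtraction per pair of events, which is why the daily case is cheap.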
For representations, we have three choices:
- Date fields, like in V5. These support daily precision only and are the fastest solution.
- Datetime fields, like in V6. These support daily (albeit with a performance impact) and high precision.
- String fields with flexible notation, or datetime fields combined with additional precision flags, as in @hripcsa’s, @dastumpf’s and @wjmcqueen’s references to how the literature, genealogy and HL7 CQL folks solve the problem, or @clairblacketer’s solution 1. These support any precision, but are slow or very slow, and the arithmetic is complicated, particularly when it involves different precisions.
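To illustrate why mixed-precision arithmetic gets complicated, here is a rough Python sketch of the "datetime plus precision flag" idea. It is only one possible reading of that option, not the approach from any of the references above: the `Precision` enum, `to_interval` and `before` are hypothetical names, and the rule "treat a value as the interval of instants it could mean" is an assumption made for this example.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional, Tuple


class Precision(Enum):  # hypothetical precision flag
    YEAR = "year"
    MONTH = "month"
    DAY = "day"
    MINUTE = "minute"


def to_interval(ts: datetime, precision: Precision) -> Tuple[datetime, datetime]:
    """Half-open interval [start, end) of instants that `ts` could stand for."""
    if precision is Precision.YEAR:
        return datetime(ts.year, 1, 1), datetime(ts.year + 1, 1, 1)
    if precision is Precision.MONTH:
        start = datetime(ts.year, ts.month, 1)
        # December rolls over into January of the next year.
        end = datetime(ts.year + (ts.month == 12), ts.month % 12 + 1, 1)
        return start, end
    if precision is Precision.DAY:
        start = datetime(ts.year, ts.month, ts.day)
        return start, start + timedelta(days=1)
    start = ts.replace(second=0, microsecond=0)
    return start, start + timedelta(minutes=1)


def before(a: Tuple[datetime, Precision],
           b: Tuple[datetime, Precision]) -> Optional[bool]:
    """True if a is certainly before b, False if certainly after, None if undecidable."""
    a_start, a_end = to_interval(*a)
    b_start, b_end = to_interval(*b)
    if a_end <= b_start:
        return True
    if b_end <= a_start:
        return False
    return None  # the intervals overlap, so the order cannot be determined


# Illustrative values only: a diagnosis known to the year, an admission known to the minute.
admission = (datetime(2017, 6, 3, 10, 15), Precision.MINUTE)
print(before((datetime(2015, 1, 1), Precision.YEAR), admission))
print(before((datetime(2017, 1, 1), Precision.YEAR), admission))
```

Every comparison now has three possible outcomes instead of two, and each query has to decide what to do with the undecidable case. That is the extra cost and complexity on top of the plain date subtraction used for daily precision.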
We therefore have two decisions to make: (i) do we want one way to account for timing, or do we want to split the problem into the 99% straightforward daily cases and the 1% exceptions, and (ii) which representation do we want to pick?
V5 supported daily precision only. When we designed V6, we opted for a common representation (mandatory datetime) covering the daily and high precision cases, with temporary backward compatibility (optional date). The low precision cases were relegated to “history of” Observations. We certainly could go back to the drawing board.
My preference was and still is date+flexible. It’s the fastest in most of the use cases, and for the rest there seems to be prior art on how to get the atypical precision cases right.