Martijn,
Here are my random thoughts about the Common Data Model chapter. I’d love it to have more detail with regard to date and datetime treatment. I think it’s important to specify not just what the fields should contain, but also how various algorithms treat these fields.
- The observation_period table says that the end_date is the latest encounter on record, but it doesn’t expressly tell me whether the end_date is inclusive (is it?). How should condition or drug events limited to the observation period be pruned: must they have an end_date <= the observation period’s end_date? These are essentially the same information, one specified in normative form, the other operationally.
- For the visit_occurrence table, visit_end_date also seems to be inclusive, given the wording that it “should match”. How are these date intervals treated by cohort definitions relative to observation periods, conditions, etc.?
- condition_start_date is defined as the date the condition was recorded, which seems quite different from HL7 FHIR’s “onset” date. The cohort logic seems to treat the start_date as an onset, not as the date of record. Further, under this definition one could easily have a start_date that is much later than the end_date. Is this intended or expected? Meanwhile, condition_end_date is when the condition is considered to have ended, which seems compatible with FHIR’s “abatement” date.
- For drug_exposure, the start/end dates seem to be interpreted differently: exclusive rather than inclusive (or am I reading this wrong?). This seems to be touched on in a drug supply thread from 2015. Switching between inclusive and exclusive interval endpoints is easy to get wrong if one isn’t careful.
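To make the inclusive/exclusive question concrete, here is a minimal Python sketch. It assumes that observation_period end dates are inclusive and that a drug_exposure end date is exclusive (both are exactly the points in question above, not confirmed CDM semantics), and the helper names are hypothetical, not part of any OHDSI tool. It shows that the two conventions describe the same days once shifted by one day, and how comparing them without normalising gives the wrong answer.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def to_exclusive(end_inclusive: date) -> date:
    """Inclusive end date -> half-open end date (first day NOT covered)."""
    return end_inclusive + ONE_DAY


def to_inclusive(end_exclusive: date) -> date:
    """Half-open end date -> inclusive end date (last day covered)."""
    return end_exclusive - ONE_DAY


def days_covered(start: date, end: date, end_is_inclusive: bool) -> int:
    """Calendar days spanned by [start, end] (inclusive) or [start, end) (exclusive)."""
    span = (end - start).days
    return span + 1 if end_is_inclusive else span


def within(ev_start: date, ev_end_incl: date, obs_start: date, obs_end_incl: date) -> bool:
    """Containment test; both end dates MUST already be inclusive."""
    return obs_start <= ev_start and ev_end_incl <= obs_end_incl


# Hypothetical observation period, assumed inclusive at both ends.
obs_start, obs_end = date(2020, 1, 1), date(2020, 1, 31)

# Hypothetical 10-day drug exposure, assumed to carry an EXCLUSIVE end date.
drug_start, drug_end_excl = date(2020, 1, 22), date(2020, 2, 1)

# Mixing conventions wrongly rejects an exposure whose last day is inside the period.
naive = within(drug_start, drug_end_excl, obs_start, obs_end)
# Normalising to inclusive first gives the intended answer.
fixed = within(drug_start, to_inclusive(drug_end_excl), obs_start, obs_end)

print(naive, fixed)  # False True
print(days_covered(drug_start, drug_end_excl, end_is_inclusive=False))  # 10
print(days_covered(drug_start, to_inclusive(drug_end_excl), end_is_inclusive=True))  # 10
```

The design choice here is to normalise every interval to one convention before comparing; whichever convention the CDM picks, stating it per table would make this step unambiguous.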
Generally, since the purpose of this system is to produce cohorts and higher-level analyses, the meaning of these fields is really defined by how the algorithms treat them (which I tried to infer by scanning the SQL). What would be most helpful is an exact description of how these columns affect the analyses performed. Or perhaps this belongs somewhere else?
There are also some missing details:
- What is the relationship between date and datetime, and how is the transition between them expected to work? It says that midnight is to be used for the start_datetime when the exact time is unknown, but there is no corresponding statement for end_datetime. In some ways, end_datetime can express the inclusive/exclusive distinction, but a recommendation should be given here. Since most treatment is specified to be inclusive, perhaps end_datetime should default to 23:59:59 when the exact ending time is unknown?
- How should a missing end_date be treated? Is it treated as unknown data? The generated SQL code usually treats a NULL value as equivalent to the start_date or, in some cases for conditions and without explanation, start_date + 1 (which seems to be a confusion of inclusive and exclusive interpretation).
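The two gaps above can be illustrated with a small Python sketch. It assumes the convention suggested in the text (midnight for an unknown start time, 23:59:59 for an unknown end time under inclusive semantics) and contrasts the two NULL end_date imputations seen in the generated SQL. The function names are hypothetical; this is not how any OHDSI tool is implemented.

```python
from datetime import date, datetime, time, timedelta
from typing import Optional

ONE_DAY = timedelta(days=1)


def default_start_datetime(start_date: date) -> datetime:
    """Unknown start time: use midnight, as the CDM text recommends."""
    return datetime.combine(start_date, time(0, 0, 0))


def default_end_datetime(end_date: date, end_is_inclusive: bool = True) -> datetime:
    """Unknown end time (ASSUMED convention, not specified by the CDM).

    Inclusive end: last second of end_date.
    Exclusive end: midnight at the start of end_date (first instant not covered).
    """
    if end_is_inclusive:
        return datetime.combine(end_date, time(23, 59, 59))
    return datetime.combine(end_date, time(0, 0, 0))


def impute_end_date(start_date: date, end_date: Optional[date], end_is_inclusive: bool) -> date:
    """Fill a NULL end_date so the event covers exactly its start day.

    Inclusive convention -> end = start; exclusive convention -> end = start + 1.
    """
    if end_date is not None:
        return end_date
    return start_date if end_is_inclusive else start_date + ONE_DAY


d = date(2020, 3, 15)
incl_end = impute_end_date(d, None, end_is_inclusive=True)   # 2020-03-15
excl_end = impute_end_date(d, None, end_is_inclusive=False)  # 2020-03-16

# Read under their own conventions, both describe the same single day.
print(default_end_datetime(incl_end, True) - default_start_datetime(d))   # 23:59:59
print(default_end_datetime(excl_end, False) - default_start_datetime(d))  # 1 day, 0:00:00

# Reading start + 1 as an INCLUSIVE end silently stretches the event to two days.
print((excl_end - d).days + 1)  # 2
```

Either imputation is defensible on its own; the problem only appears when the imputed value is later read under the other convention, which is why documenting the convention per table matters.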
Thanks for listening to my random thoughts.