Martijn,
Here are my random thoughts about the Common Data Model chapter. I’d love it to have more detail with regard to date and datetime treatment. I think it’s important to specify not just what the fields should contain, but also to clarify how various algorithms treat these fields.
- The `observation_period` says that the `end_date` is the latest encounter on record, but it doesn’t expressly tell me that the `end_date` is inclusive (is it?). How should condition or drug events limited to the observation period be pruned: must they have an `end_date` `<=` that of the observation period? These are essentially the same information, one specified in normative form, the other specified operationally.
- For the `visit_occurrence` table, `visit_end_date` seems to also be inclusive, saying that it “should match”. How are these date intervals treated by cohort definitions relative to periods, conditions, etc.?
- For `condition_start_date`, this is defined to be the condition recording date, which seems to be quite different from HL7 FHIR’s “onset” date. The cohort logic seems to treat the `start_date` as an onset, not the date of record. Further, under this definition, one could easily have a `start_date` that is much later than the `end_date`. Is this intended or expected? Yet the `condition_end_date` is when the condition is considered to have ended, which seems compatible with FHIR’s “abatement” date.
- For `drug_exposure`, the `start`/`end` date seems to be interpreted differently, exclusive rather than inclusive (or am I reading this wrong?). This seems to be touched on in a drug supply thread from 2015. Switching from inclusive to exclusive treatment of the interval endpoint is something that one could easily get wrong if one isn’t careful.
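To make the inclusive/exclusive question concrete, here is a small Python sketch of the two readings. The function names are my own and purely hypothetical; nothing here is taken from the CDM specification or the generated SQL, and it only illustrates how the same pair of dates yields different answers depending on the convention.

```python
from datetime import date, timedelta


def within_period(event_start, event_end, period_start, period_end):
    """True if the event lies wholly inside the period.

    Assumes every endpoint is inclusive (one possible reading of the CDM text).
    """
    return period_start <= event_start and event_end <= period_end


def days_covered(start, end, end_inclusive=True):
    """Number of calendar days covered by [start, end] or [start, end)."""
    days = (end - start).days
    return days + 1 if end_inclusive else days


def to_exclusive_end(inclusive_end):
    """Convert an inclusive end date to the equivalent exclusive end date."""
    return inclusive_end + timedelta(days=1)


if __name__ == "__main__":
    period = (date(2015, 1, 1), date(2015, 12, 31))
    # An event ending on the last day of the period is kept only if the
    # period end is read as inclusive.
    print(within_period(date(2015, 12, 30), date(2015, 12, 31), *period))
    # The same exposure dates differ by one day between the two conventions.
    print(days_covered(date(2015, 3, 1), date(2015, 3, 30), end_inclusive=True))
    print(days_covered(date(2015, 3, 1), date(2015, 3, 30), end_inclusive=False))
```

The off-by-one between the two `days_covered` results is exactly the kind of discrepancy I worry about when one table is read inclusively and another exclusively.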
Generally, since the purpose of this system is to produce cohorts and higher-level analysis, the meaning of these fields is really just how they are treated by the algorithms (which, as far as I can tell, one learns only by scanning the SQL). What would be most helpful is an exact description of how these columns affect the analysis performed. Or perhaps this belongs somewhere else?
There are also some missing details:
- What is the relationship between `date` and `datetime`, and how is a transition expected? It says that midnight is to be used for the `start_datetime` when the exact time is unknown, but there is no corresponding statement for `end_datetime`. In some ways, `end_datetime` can handle inclusive/exclusive differentiation, but a recommendation should be here. Given that most treatment is specified to be inclusive, perhaps it should default to `23:59.99` when the exact ending time is unknown?
- How should a missing `end_date` be treated? Is it treated as unknown data? The generated SQL code usually treats a `NULL` value as being equivalent to the `start_date`, or, in some cases for conditions, and without explanation, the `start_date+1` (which seems to be a confusion of inclusive/exclusive interpretation).
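As a rough illustration of the choices involved, here is a Python sketch of the two `end_datetime` defaults and the two `NULL` fallbacks I describe. All names are hypothetical, the `policy` values are labels I made up for the two behaviours I saw, and I am assuming the inclusive default means the last whole second of the day (`23:59:59`); none of this is the actual OHDSI implementation.

```python
from datetime import date, datetime, time, timedelta
from typing import Optional


def end_datetime_inclusive(end_date: date) -> datetime:
    """Inclusive reading: last whole second of the end date (an assumption)."""
    return datetime.combine(end_date, time(23, 59, 59))


def end_datetime_exclusive(end_date: date) -> datetime:
    """Exclusive reading: midnight at the start of the following day."""
    return datetime.combine(end_date + timedelta(days=1), time.min)


def effective_end_date(start_date: date, end_date: Optional[date],
                       policy: str = "start") -> date:
    """Pick an end date when the recorded one is missing.

    policy="start" falls back to start_date; policy="start_plus_one"
    falls back to start_date + 1 day. Both labels are hypothetical.
    """
    if end_date is not None:
        return end_date
    if policy == "start":
        return start_date
    if policy == "start_plus_one":
        return start_date + timedelta(days=1)
    raise ValueError(f"unknown policy: {policy!r}")


if __name__ == "__main__":
    print(end_datetime_inclusive(date(2015, 3, 30)))
    print(end_datetime_exclusive(date(2015, 3, 30)))
    print(effective_end_date(date(2015, 3, 1), None, policy="start"))
    print(effective_end_date(date(2015, 3, 1), None, policy="start_plus_one"))
```

Whichever convention is chosen, having it written down next to the column definitions would remove the guesswork.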
Thanks for listening to random thoughts.