Book of OHDSI chapter review

Martijn,

Here are my random thoughts about the Common Data Model chapter. I’d love it to have more detail with regard to date and datetime treatment. I think it’s important to specify not just what the fields should contain, but also to clarify how various algorithms treat these fields.

  • The observation_period says that the end_date is the latest encounter on record, but it doesn’t expressly tell me that the end_date is inclusive (is it?). How should condition or drug events limited to the observation period be pruned: must their end_date be <= the observation period’s end_date? These are essentially the same information, one specified in normative form, the other specified operationally.

  • For the visit_occurrence table, visit_end_date seems to also be inclusive, saying that it “should match”. How are these date intervals treated by cohort definitions relative to periods, conditions, etc.?

  • The condition_start_date is defined to be the condition recording date, which seems to be quite different from HL7 FHIR’s “onset” date. The cohort logic seems to treat the start_date as an onset, not the date of record. Further, under this definition, one could easily have a start_date that is much later than the end_date. Is this intended or expected? Yet the condition_end_date is when the condition is considered to have ended, which seems compatible with FHIR’s “abatement” date.

  • For drug_exposure the start/end dates seem to be interpreted differently, with an exclusive rather than inclusive end (or am I reading this wrong?). This seems to be touched on in a drug supply thread from 2015. Switching from an inclusive to an exclusive interval endpoint is something that one could easily get wrong if one isn’t careful.
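
To make the inclusive/exclusive question concrete, here is a small Python sketch of how the two readings differ. The function names are my own and purely hypothetical, not anything taken from the CDM or the OHDSI tooling, and the containment check assumes both period endpoints are inclusive, which is exactly the point I would like the chapter to confirm.

```python
from datetime import date, timedelta


def days_covered(start: date, end: date, end_inclusive: bool) -> int:
    """Count the calendar days in [start, end] or [start, end)."""
    span = (end - start).days
    return span + 1 if end_inclusive else span


def within_period(event_start: date, event_end: date,
                  period_start: date, period_end: date) -> bool:
    """True when the event lies inside the period.

    Assumption: both period endpoints are read as inclusive.
    """
    return period_start <= event_start and event_end <= period_end


# The same pair of dates covers a different number of days per reading.
start, end = date(2019, 1, 1), date(2019, 1, 31)
inclusive_days = days_covered(start, end, end_inclusive=True)
exclusive_days = days_covered(start, end, end_inclusive=False)

# A hypothetical 30-day supply starting January 1: the end date that
# gets stored depends on which convention is in force.
supply_days = 30
end_if_exclusive = start + timedelta(days=supply_days)
end_if_inclusive = start + timedelta(days=supply_days - 1)
```

The off-by-one between the two stored end dates is the mistake I would expect an ETL author to make if the convention is not stated per table.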

Generally, since the purpose of this system is to produce cohorts and higher-level analysis, the meaning of these fields is really just how they are treated by the algorithms (as far as I can tell from scanning the SQL). What would be most helpful is an exact description of how these columns affect the analysis performed. Or perhaps this belongs somewhere else?

There are also some missing details:

  • What is the relationship between date and datetime, and how is a transition between them expected to work? It says that midnight is to be used for the start_datetime when the exact time is unknown, but there is no corresponding statement for end_datetime. In some ways, end_datetime can handle inclusive/exclusive differentiation, but a recommendation should be here… given that most treatment is specified to be inclusive, perhaps it should default to 23:59:59 when the exact ending time is unknown?

  • How should a missing end_date be treated? Is it treated as unknown data? The generated SQL code usually treats a NULL value as being equivalent to the start_date, or, in some cases for conditions, and without explanation, the start_date+1 (which seems to be a confusion of inclusive/exclusive interpretation).
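
As a sketch of the kind of operational statement I am asking for, the defaults above could be written down like this in Python. Everything here is an assumption on my part: the helper names are hypothetical, the end-of-day default (the last whole second of the day) is only my suggestion, and the `pad_days` parameter merely mimics the two NULL behaviours I saw in the generated SQL rather than reproducing any actual OHDSI code.

```python
from datetime import date, datetime, time, timedelta
from typing import Optional


def start_datetime_from_date(d: date) -> datetime:
    """Midnight at the start of the day, per the chapter's stated rule."""
    return datetime.combine(d, time.min)


def end_datetime_from_date(d: date) -> datetime:
    """Last whole second of the day (assumed default for an inclusive end)."""
    return datetime.combine(d, time(23, 59, 59))


def effective_end_date(start: date, end: Optional[date],
                       pad_days: int = 0) -> date:
    """Resolve a missing end date.

    pad_days=0 mimics "NULL means start_date";
    pad_days=1 mimics "NULL means start_date + 1".
    """
    if end is not None:
        return end
    return start + timedelta(days=pad_days)


d = date(2019, 3, 15)
start_dt = start_datetime_from_date(d)
end_dt = end_datetime_from_date(d)
same_day_end = effective_end_date(d, None)
next_day_end = effective_end_date(d, None, pad_days=1)
```

Whichever defaults are actually intended, stating them in one place would let ETL authors and cohort logic agree on them.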

Thanks for listening to random thoughts.