Date, time and datetime problem and the world of hours and <1day

Vojtech_Huser · February 11, 2015, 12:48pm

The world of SQL dialects is messy and we chose a solution to separate dates and times to avoid some issues with SQL translation. And for 80% of analysis this will be fine.

Currently, the SqlRender package does not provide any translation for hours difference.
But for analyses that need this granularity, this is a future adoption problem.

To evaluate pleural effusion in your CDM based study that uses measurement table (CDM v5) - we need plural protein within 0.5 hours of serum protein. So we happen to hit the remaining 20% world

I would like to propose adding into SqlRender transation of

 tsql:datediff(hour,@start,@end)  into redshift:???
 tsql:datediff(hour,@start,@end)  into oracle:(@end - @start)
 tsql:datediff(hour,@start,@end)  into netezza:(@end - @start)

we would also have to handle dateadd(hour,....)

And also because things can happen around midnight and we may have to combine the date and time into datetime. (do what the CDM developers try to avoid, but it comes back… )
(or write separate SQL to first compare the date and then the time (plus around midnight logic)… (oh boy…)

I would like to get community opinion on this soft spot of "the temporal world of hours"
(It is most applicable to “EHR-ish CDM sites” rathe than claim-ish sites.)

Most likely in our final study, we will create a temp table in the parametized SQL that will have a datetime column something like "DATEFROMPARTS(measurement_date, measurement_time) as zzz_event_dt"
(and hope that all our data partners use the same dialect or are good at tweaking datetime SQL code)

I would like propose (mainly to SqlRender developers) to add some “limited support” for hours in the SqlRender - knowing that things can break (something like current RedShift support)

Patrick_Ryan · March 28, 2015, 3:43am

Just to rekindle this discussion:

George had a good idea to consider for a future CDM instance: for those
optional _TIME fields, why not make them formally datetime types, and
redundantly store the date with the time. That way, for those data
partners who have it, we could have a date type field and a datetime-typed
field, and we could support operations without issue. At present, the
_TIME fields are varchar and we cannot consistently enforce any operations
across database platforms.

I don’t have this type of data in any of my source data, so I’d prefer to
defer to the rest of the community who has an analytical use case for this
kind of data.

Cheers,

Patrick

hripcsa · March 28, 2015, 4:22am

Timestamps (date+time) become important for us when we are studying ICUs, for example. I would rarely use time by itself, but would instead have to always append it to date. Having both date and timestamp is redundant, I guess, but we also have era tables. Most of our source databases have timestamps (actual visit time, with diagnoses assigned the corresponding visit time), so separating them only to keep putting them back together is not helpful.

George

bailey · March 28, 2015, 4:24am

We’ve adopted this approach for use in PEDSnet, since we needed to retain the time semantics to order events, and we don’t have time-only data types in all DBMSs. There’s also not really much advantage for us in an isolated time; we need to reconnect it to the date to be able to measure intervals in any case.

Vojtech_Huser · April 1, 2015, 9:59pm

I like that there is new discussion about this.

I would like to second proposal from George and Charles. To change the current ‘time’ datatype to ‘datetime’.

I put a note about this into the measurement table.
(This is at the same time an attempt how commenting the CDM v5 could be done.) (wiki-based evolution of the specs)

(I put an additional sentence and a link into the time column in the specs)
http://www.ohdsi.org/web/wiki/doku.php?id=documentation:cdm:measurement

The same would apply to several other tables.

Christian_Reich · April 1, 2015, 11:51pm

@Vojtech_Huser and friends:

Would it work if we changed all date fields to date/date-time? So, if you have time information and the right database technology you make it date-time, if not you just keep the date. As long as there is a way to subtract field 1 from field 2 and get back a number of days (that is the way all methods expect it right now) it would work. So, if field 1 and field 2 is the same day but in different time points a subtraction would have to result in fractions of a day. 1 hour would be 0.0417 days. Would that fly?

Patrick_Ryan · April 2, 2015, 12:16am

No, we can’t do that. Analyses expect dates only, and the various RDBMSs
do not consistently handle date arithmatic when theres also a time with the
date. That is why we need to keep them separate. I support making the
existing _TIME fields be of type ‘datetype’ and redundantly store the date
portion. That way, new methods can properly handle these extra optional
fields whenever this information is available, but not break all the
existing work that’s based on the required _DATE fields.

wstephens · April 2, 2015, 1:28pm

Agreed. We’ve already started this transformation on our local versions of v5. datetime also allows us to more easily generate timeline style graphs of measurement data.

Bill