OHDSI Home | Forums | Wiki | Github

Should _datetime fields be required?

During implementation of the Oracle OMOP CDM we ran into the issue of many of the _datetime fields being set to NOT NULL. I checked several of the other database flavors and they too had them created as not null. The standard in git has these fields defined as not required. My question is if the DDL needs to be modified to change these to optional or if the standard needs to be modified to reflect that these fields are required?

The first two for observation_period may be correct. The main issue with them is that they are not even included in the documentation on git. I believe they were added in 5.2. Perhaps the documentation is behind?

Here are the fields in question.

observation_period.observation_period_start_datetime
observation_period.observation_period_end_datetime
procedure_occurrence.procedure_datetime
drug_exposure.drug_exposure_start_datetime
device_exposure.device_exposure_start_datetime
condition_occurrence.condition_start_datetime

All other _datetime fields are consistent with the documentation. It appears that when the associated _date field was marked as required, the _datetime field was created as required in the DDL.

Thanks,
Jeff

We had exactly the same issue with a recent ETL. We ended up populating the _datetime fields by inserting the date as a timestamp. This is not ideal, as this just duplicates data. Very interested to hear the reasoning behind making this field required.

I can comment on the intent of _datetime fields, but not the current state. The fields were intended to start as optional to allow an orderly migration from _date fields to _datetime; that is, current implementations would not have to do anything but add the empty columns. With a future major CDM revision, we would actually remove the _date fields, make the _datetime fields required, and change all the associated software. Whether there needs to be an intermediate version with required _date and required _datetime is unclear.

Cool. So I’m not the only one and there is now documented history with respect to the columns. :smile:
Thanks @hripcsa for the history and intent. If the intent is to ultimately remove the _date fields it would mean that if the _date is required then the _datetime is required. The question now becomes what to do when you only have a date, or a date and time with no timestamp.

What would be helpful is to provide this level of intent in the description for the fields or in the conventions section below the table. I think there is already discussion in places about the defaults to use if you only have a year, or a year with a month. I was surprised to see that it was the last day of the month instead of the first day of the month. I think it would be use specific and could be different for start versus end dates. For times, Oracle will always default to “00:00:00”. We should never rely on defaulting by the underlying database because it may be different from one implementation to another. I personally will probably default according to use in my implementation. If no time on a start date then I will default to “00:00:00”. For an end date it will be “23:59:59”. That alone will meet at least 80% of the scenarios.

I would love to have someone’s opinion on what to do when the end_datetime is required and there is no end date or time? Prime example is Visit_occurrence.visit_end_date (soon to be visit_end_datetime), when the patient has not yet been discharged. Do you just exclude all active encounters or do you default the date to a dummy future date as is done in the Concept table for active concepts? For concepts the valid_end_date is set to the date “2099/12/31 00:00:00”. Well, it may just be 20991231 and Oracle added the “00:00:00”.

Thanks again,
Jeff

We are working on it in THEMIS.

That’s exactly the idea. Come on in and help! :smile:

We also have on the table a proposal for a granularity field for time. But need to better understand the processing implications. Other choices are beginning, end, use-specific, and random (doubt this one, but some nice statistical properties).

Hi @Christian_Reich
I have a question on populating _datetime fields. If in the source data there is no information about the time of birth or time of clinical events, can we populate _datetime fields with the datetime as it appears in the source data or we should leave it empty?

@TBanokina

So, there is no information, or it “appears in the data”?

@Christian_Reich
There is no time information. So, can we populate _datetime fields just by duplicating date info?
For example: date of birth 2015-01-01 and we will populate birth_datetime 2015-01-01:00:00:00

@TBanokina

Yes.

There is a THEMIS item on this, whether it should be 0:00, 0:01, 12:00 or 23:59. But do 0:00 for now.

Thank you @Christian_Reich

For those who are interested, the thread on what to do with DATETIME was posted here. Please @TBanokina or @jsjacobs feel free to comment!

t