
How to store events with empty dates


(Dmytry Dymshyts) #1

We noticed empty dates in the source diagnosis table.
I assume that the usual solution in this case is simply not to put these events into the CDM tables.

On the other hand we want to keep as much source data as possible.
So there’s an idea to represent these concepts as
OBSERVATION.observation_concept_id = (‘History of clinical finding’ or ‘History of procedure’ depending on domain the source concept is mapped to)
OBSERVATION.value_as_concept_id = (concept the source concept is mapped to)
OBSERVATION.observation_datetime = OBSERVATION_PERIOD.observation_period_END_date + time=00:00:00
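
To make the proposed rule concrete, here is a minimal Python sketch of it. The two concept_id constants are hypothetical placeholders (look up the real IDs for ‘History of clinical finding’ and ‘History of procedure’ in your vocabulary), and the `ObservationRow` class and function name are assumptions made only for illustration, not part of any OHDSI tooling.

```python
from dataclasses import dataclass
from datetime import date, datetime, time

# Hypothetical placeholders, NOT real concept_ids: replace with the IDs of
# 'History of clinical finding' and 'History of procedure' from your vocabulary.
HISTORY_OF_CLINICAL_FINDING_ID = -1
HISTORY_OF_PROCEDURE_ID = -2


@dataclass
class ObservationRow:
    person_id: int
    observation_concept_id: int
    value_as_concept_id: int
    observation_datetime: datetime


def undated_event_to_observation(
    person_id: int,
    mapped_concept_id: int,
    mapped_domain: str,
    observation_period_end_date: date,
) -> ObservationRow:
    """Build an OBSERVATION row for a source event that has no date."""
    # Pick the 'History of ...' concept from the domain of the mapped concept.
    if mapped_domain == "Procedure":
        history_concept_id = HISTORY_OF_PROCEDURE_ID
    else:
        history_concept_id = HISTORY_OF_CLINICAL_FINDING_ID
    return ObservationRow(
        person_id=person_id,
        observation_concept_id=history_concept_id,
        value_as_concept_id=mapped_concept_id,
        # observation_period_END_date + time=00:00:00
        observation_datetime=datetime.combine(observation_period_end_date, time.min),
    )
```

Treating every non-Procedure domain as a clinical finding is a simplification of this sketch; a real ETL would probably handle other domains explicitly.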

I like this solution.
But there’s a probability that the event actually happened after the end of the OBSERVATION_PERIOD.

So, friends, I want to hear from you.
tagging
@ericaVoss @nzvyagina @Christian_Reich @MPhilofsky @Maria_ya @DTorok @aostropolets


(Alexandra Orlova) #2

I’m not in the tagging list, but I would like to share my experience :slight_smile:

  • If the diagnosis has a link to visit, then you may try to obtain an approximate date of diagnosis (start date of visit)
  • If the diagnosis has a link to another medical event (e.g. some drugs were prescribed based on this diagnosis), you may take a date of this event as an approximate date of diagnosis
  • Maybe you have some flags in the diagnosis table (e.g. “ongoing”, “stopped”, etc.)?
  • Is it possible to discuss imputation rules with data owners? Maybe they have useful info.
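
The first two suggestions above amount to a fallback chain, which could be sketched roughly like this. The function and parameter names are hypothetical, and the order of preference (visit first, then linked event) is my assumption.

```python
from datetime import date
from typing import Optional


def approximate_diagnosis_date(
    visit_start_date: Optional[date],
    linked_event_date: Optional[date],
) -> Optional[date]:
    """Return an approximate diagnosis date, or None if nothing is linked."""
    # 1. Prefer the start date of the visit the diagnosis is linked to.
    if visit_start_date is not None:
        return visit_start_date
    # 2. Otherwise use the date of another linked event (e.g. a prescription).
    if linked_event_date is not None:
        return linked_event_date
    # 3. Nothing to impute from: leave the decision to the ETL rules.
    return None
```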

(Melanie Philofsky) #3

@Dymshyts,

Is there any date associated with the data? Maybe not a diagnosis date, but a date the diagnosis was recorded? If there is a recorded date, route the record to the Observation table and use that date. Then use the appropriate concept_id as you state:

If there isn’t any date, route it to the Observation table as per above and then use the dataset extract date as the observation_date. Don’t use

because you don’t know if the condition was present at the start of the observation period. But you do know (as much as one can know with incomplete data) the condition was present as of the data extract date.
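
As a rough sketch of the date choice described above (recorded date if available, otherwise the dataset extract date), with hypothetical function and parameter names:

```python
from datetime import date
from typing import Optional


def observation_date_for_undated_diagnosis(
    recorded_date: Optional[date],
    extract_date: date,
) -> date:
    """Pick the observation_date for a diagnosis that has no diagnosis date."""
    # Use the date the diagnosis was recorded when the source provides one;
    # otherwise fall back to the date the dataset was extracted.
    return recorded_date if recorded_date is not None else extract_date
```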

Does that make sense? I rewrote it a couple times to try to be as clear as possible :slight_smile:


(Christian Reich) #4

So, whatever you guys do (these things are all the prerogative of the ETLer), don’t forget that “not losing any data” needs to be balanced against data quality. If we are not confident about the imputed date, toss that thing. I wouldn’t rely on indicators as imprecise as Observation Periods, because timing is important for all causal inference use cases: the thing that causes something comes first. If we invent dates wildly, we will screw this up.

Generally: Dropping data is not as bad as many folks think it is. Bad data can be worse than missing data.


(Dmytry Dymshyts) #5

Thanks for your replies.

@Alexandra_Orlova unfortunately there are no connections to any other medical events where I can retrieve the date from.

@MPhilofsky

Oh, I made a typo. I meant to say:
OBSERVATION.observation_datetime = OBSERVATION_PERIOD.observation_period_END_date + time=00:00:00 (I changed it in the original post)
So, my assumption is that the event happened before the end of the OBSERVATION_PERIOD. What do you think of this?

@Christian_Reich, I’ve got your point. Let’s hear what you think now that I have changed it to OBSERVATION_PERIOD.observation_period_END_date

Again, sorry for the confusion


(Dmytry Dymshyts) #6

Actually it happened to be an interesting investigation:
As @MPhilofsky said, “History of something” concepts can be used as an exclusion criterion, like no other cancers before the cancer of interest event.
Then I excluded patients having the same or a similar diagnosis with an actual date. I searched for diagnoses with the same first 3 characters in the source code (in ICD10CM and ICD9CM the first 3 characters indicate the organ where the cancer is present).
And still there are diagnoses like
C79.31 Secondary malignant neoplasm of brain
C34.90 Malignant neoplasm of unspecified part of unspecified bronchus or lung
and other cancer diagnoses.
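
The prefix check described above could be sketched roughly as follows. This is one possible reading of it (drop an undated diagnosis when the same person has a dated diagnosis sharing the first 3 characters of the source code); the function name and the tuple layout are assumptions for illustration.

```python
from typing import Iterable, List, Tuple

Diagnosis = Tuple[int, str]  # (person_id, source_code), hypothetical layout


def undated_without_dated_match(
    undated: Iterable[Diagnosis],
    dated: Iterable[Diagnosis],
) -> List[Diagnosis]:
    """Keep undated diagnoses that have no dated diagnosis with the same
    3-character code prefix for the same person."""
    dated_prefixes = {(person_id, code[:3]) for person_id, code in dated}
    return [
        (person_id, code)
        for person_id, code in undated
        if (person_id, code[:3]) not in dated_prefixes
    ]
```

For example, an undated C34.90 for a person who also has a dated C34.10 would be dropped, while an undated C79.31 with no dated C79 code would remain.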
So the decision was made:
OBSERVATION.observation_concept_id = (‘History of clinical finding’ or ‘History of procedure’ depending on domain the source concept is mapped to)
OBSERVATION.value_as_concept_id = (concept the source concept is mapped to)
OBSERVATION.observation_datetime = OBSERVATION_PERIOD.observation_period_END_date + time=00:00:00

Should we make this somehow an official way of dealing with diagnoses with empty dates?


(JD Liddil) #7

We have an ETL rule that sends the data back for review if the date is blank. If there is still no date, the expectation is that they enter “unknown”. If (big if) the data are clean coming in, then these issues go away. In the Pharma world this is solved by using EDC and having queries fire on the front end, which prevents this to a large degree.
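
A minimal sketch of that kind of rule, assuming each record is a dict with a hypothetical `diagnosis_date` key (an explicit “unknown” entry counts as non-blank here and passes through):

```python
from typing import Dict, List, Tuple


def split_for_review(records: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Separate records with a blank date from those that can be loaded."""
    accepted, needs_review = [], []
    for rec in records:
        raw = rec.get("diagnosis_date")
        # Missing or whitespace-only values go back to the data owner.
        if raw is None or str(raw).strip() == "":
            needs_review.append(rec)
        else:
            accepted.append(rec)
    return accepted, needs_review
```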

