How to store events with empty dates

Dymshyts · July 18, 2019, 1:46pm

We noticed empty dates in the source diagnosis table.
I assume that the usual solution in this case is simply not to load these events into the CDM table.

On the other hand we want to keep as much source data as possible.
So there’s an idea to represent these concepts as
OBSERVATION.observation_concept_id = (‘History of clinical finding’ or ‘History of procedure’ depending on domain the source concept is mapped to)
OBSERVATION.value_as_concept_id = (concept the source concept is mapped to)
OBSERVATION.observation_datetime = OBSERVATION_PERIOD.observation_period_END_date + time=00:00:00
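
To make the proposal concrete, here is a minimal Python sketch of how one such row could be built. The concept IDs are hypothetical placeholders (look up the real ones in your vocabulary), and the function name and dict layout are illustrative assumptions, not part of any OHDSI tooling.

```python
from datetime import date, datetime, time

# Hypothetical placeholder IDs, NOT real concept IDs: replace them with the
# concepts you actually use for 'History of clinical finding' / 'History of procedure'.
HISTORY_CONCEPT_BY_DOMAIN = {
    "Condition": 1001,  # stands in for 'History of clinical finding'
    "Procedure": 1002,  # stands in for 'History of procedure'
}


def build_history_observation(person_id, mapped_concept_id, domain, observation_period_end_date):
    """Build an OBSERVATION row (as a dict) for a source event that has no date."""
    return {
        "person_id": person_id,
        # Chosen by the domain the source concept is mapped to
        "observation_concept_id": HISTORY_CONCEPT_BY_DOMAIN[domain],
        # The concept the source concept is mapped to
        "value_as_concept_id": mapped_concept_id,
        "observation_date": observation_period_end_date,
        # observation_period_end_date + time 00:00:00
        "observation_datetime": datetime.combine(observation_period_end_date, time(0, 0, 0)),
    }
```

For example, build_history_observation(1, 555, "Condition", date(2019, 7, 18)) gives a row dated 2019-07-18 00:00:00, where 555 is again just a made-up mapped concept.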

I like this solution.
But there’s a probability that the event actually happened after the end of the OBSERVATION_PERIOD.

So, friends, I want to hear from you.
tagging
@ericaVoss @nzvyagina @Christian_Reich @MPhilofsky @Maria_ya @DTorok @aostropolets

Alexandra_Orlova · July 18, 2019, 3:10pm

I’m not in the tagging list, but I would like to share my experience.

If the diagnosis has a link to a visit, you may try to obtain an approximate date of diagnosis (the start date of the visit).
If the diagnosis has a link to another medical event (e.g. some drugs were prescribed based on this diagnosis), you may take the date of this event as an approximate date of diagnosis.
Maybe you have some flags in the diagnosis table (e.g. “ongoing”, “stopped”, etc.)?
Is it possible to discuss imputation rules with data owners? Maybe they have useful info.
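
The fallback order above could be sketched roughly like this. All field names (diagnosis_date, visit_start_date, linked_event_date) are hypothetical and depend on your source schema; the second element of the result records where the date came from, so that imputed dates stay distinguishable later.

```python
from datetime import date
from typing import Optional, Tuple


def approximate_diagnosis_date(diagnosis: dict) -> Optional[Tuple[date, str]]:
    """Return (date, origin) for a diagnosis row, or None if nothing usable exists.

    The keys used here are illustrative assumptions about the source data.
    """
    if diagnosis.get("diagnosis_date") is not None:
        return diagnosis["diagnosis_date"], "recorded"
    if diagnosis.get("visit_start_date") is not None:
        # Approximation: start date of the linked visit
        return diagnosis["visit_start_date"], "visit_start"
    if diagnosis.get("linked_event_date") is not None:
        # Approximation: date of a linked event, e.g. a prescription
        return diagnosis["linked_event_date"], "linked_event"
    return None
```

Whether such approximations are acceptable is still something to agree on with the data owners.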

MPhilofsky · July 18, 2019, 5:26pm

@Dymshyts,

Is there any date associated with the data? Maybe not a diagnosis date, but a date the diagnosis was recorded? If there is a date the diagnosis was recorded, route the record to the Observation table and use that date. Then use the appropriate concept_id as you state:

If there isn’t any date, route it to the Observation table as per above and then use the dataset extract date as the observation_date. Don’t use

because you don’t know if the condition was present at the start of the observation period. But you do know (as much as one can know with incomplete data) the condition was present as of the data extract date.
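
As a rough sketch of this rule (the function and parameter names are made up for illustration): use the recorded date when there is one, otherwise fall back to the dataset extract date.

```python
from datetime import date
from typing import Optional


def choose_observation_date(recorded_date: Optional[date], extract_date: date) -> date:
    """Pick the observation_date for a diagnosis that has no diagnosis date."""
    # Prefer the date the diagnosis was recorded; otherwise use the extract date,
    # since the condition is known to be on record as of the extract.
    return recorded_date if recorded_date is not None else extract_date
```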

Does that make sense? I rewrote it a couple of times to try to be as clear as possible.

Christian_Reich · July 18, 2019, 7:49pm

So, whatever you guys do (these things are all the prerogative of the ETLer), don’t forget that “not losing any data” needs to be balanced against data quality. If we are not confident about the imputed date, toss that thing. I wouldn’t take indicators as imprecise as Observation Periods, because the timing is important for all causal inference use cases: the thing that causes something comes first. If we do wild date inventions, we will screw this up.

Generally: Dropping data is not as bad as many folks think it is. Bad data can be worse than missing data.

Dymshyts · July 19, 2019, 11:09am

Thanks for your replies.

@Alexandra_Orlova, unfortunately there are no links to any other medical events from which I could retrieve the date.

@MPhilofsky

Oh, I made a typo. I wanted to say:
OBSERVATION.observation_datetime = OBSERVATION_PERIOD.observation_period_END_date + time=00:00:00 (I changed it in the original post)
So, my assumption is that this happened before the end of the OBSERVATION_PERIOD. What do you think of this?

@Christian_Reich, I got your point. Let’s hear what you think now that I’ve changed it to OBSERVATION_PERIOD.observation_period_END_date.

Again, sorry for the confusion.

Dymshyts · July 24, 2019, 4:12pm

Actually, this turned out to be an interesting investigation.
As @MPhilofsky said, “History of something” concepts can be used as exclusion criteria, e.g. no other cancers before the cancer-of-interest event.
Then I excluded patients who have the same or a similar diagnosis with an actual date. I searched for diagnoses with the same first 3 characters in the source code (in ICD10CM and ICD9CM, the first 3 characters indicate the organ where the cancer is present).
And still there are diagnoses like
C79.31 Secondary malignant neoplasm of brain
C34.90 Malignant neoplasm of unspecified part of unspecified bronchus or lung
and other cancer diagnoses.
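
For reference, the prefix check described above could look roughly like this in Python. The row layout (person_id, source_code, diagnosis_date) is an assumption for illustration; the function returns the dateless diagnoses for which the same person has no dated diagnosis sharing the first 3 characters of the source code.

```python
def dateless_without_dated_match(diagnoses):
    """Return dateless diagnoses with no dated diagnosis of the same 3-character prefix.

    diagnoses: list of dicts with hypothetical keys
    'person_id', 'source_code' and 'diagnosis_date' (None when empty).
    """
    # (person, 3-character prefix) pairs that do have an actual date
    dated_prefixes = {
        (d["person_id"], d["source_code"][:3])
        for d in diagnoses
        if d["diagnosis_date"] is not None
    }
    return [
        d
        for d in diagnoses
        if d["diagnosis_date"] is None
        and (d["person_id"], d["source_code"][:3]) not in dated_prefixes
    ]
```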
So the decision was made:
OBSERVATION.observation_concept_id = (‘History of clinical finding’ or ‘History of procedure’ depending on domain the source concept is mapped to)
OBSERVATION.value_as_concept_id = (concept the source concept is mapped to)
OBSERVATION.observation_datetime = OBSERVATION_PERIOD.observation_period_END_date + time=00:00:00

Should we make this somehow an official way of dealing with diagnoses with empty dates?

jliddil1 · July 24, 2019, 4:21pm

We have an ETL rule that sends the data back for review if the date is blank. Then, if there is no date, the expectation is that they enter “unknown”. If (big if) the data are clean coming in, then these issues go away. In the pharma world this is largely solved by using EDC and having queries fire on the front end that prevent this.
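
A very rough sketch of that kind of rule, with made-up names: rows with a blank date go to a review queue, everything else (including an explicit “unknown”) passes this check and is handled downstream.

```python
def split_for_review(rows):
    """Split source rows into (ready, needs_review) based on a blank date.

    'diagnosis_date' is a hypothetical field name; None and empty or
    whitespace-only strings count as blank.
    """
    ready, needs_review = [], []
    for row in rows:
        value = row.get("diagnosis_date")
        if value is None or str(value).strip() == "":
            needs_review.append(row)  # send back to the data owner
        else:
            ready.append(row)  # real date or an explicit "unknown"
    return ready, needs_review
```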

opatterson · April 21, 2023, 5:06pm

The “History of clinical finding in subject” concept (Athena) is not standard, and I was not able to find an equivalent standard concept. Do you have any other suggestions for a more appropriate standard concept?

Dymshyts · April 25, 2023, 2:46pm

Hi @opatterson
The guidelines on how to store History have changed.
Please see here
We continued (after the v20220510 release) a construction of axis of historical concepts. The “History of clinical finding in subject”, “Past history of procedure”, “History of” and many other SNOMED concepts (including their descendants) were deStandardised…