History of condition with age

(Maxim Moinat) #1

In the UKB working group today, we discussed how to represent a history of heart attack together with the age at which this was diagnosed. An example of the input data:

We came up with the following four options:

Please comment on which options are definitely incorrect, which are okay, and whether there are other options. E.g., during the meeting someone suggested using the operator_concept_id, but I am not sure how. Tagging @Christian_Reich @Alexandra_Orlova @Alexdavv @Dave.Barman

Screenshots are from this sheet: https://docs.google.com/spreadsheets/d/13VItlyhMJpbClhvnkFWWmPYOz8TeCdhZxV-4U4ctlBk/edit#gid=0

(Vojtech Huser) #2

For the sake of argument, let’s assume that I ask about history of giving live birth.

Mothers remember the exact birthdays of their children very well (and even the hour).
I like to trust the patient sometimes. Claims data and EHR can be wrong at times too.

(Seng Chan You) #3

In terms of the analytic process for an OHDSI study, my personal preference would be option 1: populating the condition concept ID and the date by using YOB and Age.

(Christian Reich) #4


None are correct, actually.

  1. 2015-1-1 is not a date. It is a crude estimation with an error of a year. We need a date. Some people even want datetime. Because we will use it to distinguish cause from effect, or intervention from outcome.
  2. Age is not a value. It’s a derived variable from today() and DOB, and it changes. Plus, whatever is observed should be the thing observed that day, which is hopefully not a heart attack while filling out the survey.
  3. Is the closest to legitimate OMOP, but again, age is not a number, and nobody and no ATLAS will have the slightest clue that the “65” value means age. We could have a convention like that, but we don’t today.
  4. Same as 3., plus there is no such thing as value_as_datetime, and it wouldn’t help with an imprecise date.

The problem is this: What do we do with a fact for which we don’t know the exact timing? Because, as discussed above, observational research needs both the fact and the timing. It’s really useless if we don’t have both, except in those situations where we want it as an additional condition or criterion. Since we cannot look into the future, these are then facts in the past, distant enough that we wouldn’t want to put a trigger (index date) onto them, but important enough to not throw them away.

How should we do that?

  • Using the “History of” mechanism. The only problem is we don’t have a date at all, which then really means “sometime before any of the properly timed events started being recorded”. People find this too vague; they want this “2015” or “early 1980s”. I am not sure. The other problem is that ATLAS doesn’t really bend over backwards to help folks utilize this.
  • If we allowed the data to be imprecise. Like we do in DOB, where we cut it into day, month and year, and only the year is mandatory. It’s computationally very expensive if we did that for all dates, and we never use the DOB as an index date.
  • With some extra field which keeps the proper date or datetime field clean.

Is the “History of” thing really not sufficient?

(Oleg Zhuk) #5

And one more related theme from that call with a long-perspective idea: MAPPING table (in vocabulary) (problems with relationship)

We also had some ideas along the lines of ‘this infarction start_date is at least earlier than date_x’.

So what if:

observation_date - calculated as YOB + Age + 1 year
observation_concept_id - 4214956 (History of clinical finding in subject)
value_as_concept_id - 4329847 (Myocardial infarction)

I believe that was @Alexdavv’s proposal; seems legit to me.
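
A minimal sketch of the arithmetic in this proposal, not an agreed OHDSI convention. The function name and the dict layout are hypothetical; the two concept IDs are the ones quoted above. The post only gives the year arithmetic (YOB + Age + 1 year), so placing the date on December 31 of that year is an assumption made here: a person born on December 31 stays at a given age until December 30 of year YOB + Age + 1, so the last day of that year is a safe upper bound for the event.

```python
from datetime import date

HISTORY_OF_CLINICAL_FINDING = 4214956  # concept ID quoted in the post
MYOCARDIAL_INFARCTION = 4329847        # concept ID quoted in the post


def history_observation(person_id: int, year_of_birth: int, age_at_event: int) -> dict:
    """Build an observation row dated at an upper bound for the event (hypothetical helper)."""
    # YOB + Age + 1 year, as proposed in the post.
    bound_year = year_of_birth + age_at_event + 1
    # Assumption: use December 31 so the date is never earlier than the event.
    return {
        "person_id": person_id,
        "observation_concept_id": HISTORY_OF_CLINICAL_FINDING,
        "value_as_concept_id": MYOCARDIAL_INFARCTION,
        "observation_date": date(bound_year, 12, 31),
    }
```

For a person born in 1950 who reports the event at age 65, this yields an observation dated 2016-12-31.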

(Christian Reich) #6

Correct, we discussed this. Problem is, what’s this X? You make it one year. In another discussion we discussed 10 years. What is the threshold for data to be far back enough from the time of observation that they become “history”? Depends on the question. If you are looking for carcinogenic events it’s at least a year. In Covid, everything before March is already history. Now what?

Therefore, “History of” is safe. Nobody gets tempted to use this artificial, cobbled-together date as the real thing. That’s where we want it.

(Maxim Moinat) #7

The value_as_datetime field was added in OMOP CDM v6.0. And I would prefer this option actually. We will never use value_as_datetime as index date, so it does not have to be precise. And we can use the convention that if only year is known, we set it to year-01-01.
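
A minimal sketch of the year-only convention described in this post, assuming the survey date goes into observation_date and the approximate event time into value_as_datetime. The function name and dict layout are hypothetical, and the rule is a proposal from this thread rather than an established convention.

```python
from datetime import date, datetime


def survey_history_row(survey_date: date, year_of_birth: int, age_at_event: int) -> dict:
    """Keep the real record date clean and store the approximate event time separately."""
    event_year = year_of_birth + age_at_event
    return {
        # Date the answer was actually recorded; this is what the tools would use.
        "observation_date": survey_date,
        # Proposed convention: only the year is known, so store year-01-01.
        "value_as_datetime": datetime(event_year, 1, 1),
    }
```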

(Chris Knoll) #8

Please consider the current behavior of our tools: you’re now introducing another date field which you may be using within a time window which leads to confusion from the perspective of the tools: if you say ‘had X between 365 days before and 0 days before index’, you now need to know that X refers to value_as_datetime instead of the actual date record of the observation (that all the tools use for the ‘date of the observation’). This doesn’t just impact cohort definitions, if you create a cohort feature (for characterization or patient level prediction) you run into the same problem where the record has some date but then there’s another ‘embedded date’ in the record that you should really use for the actual time of the event.

I would like to propose that we think of all observational data as ‘history of’ since anything that happened before a given date is considered ‘history of’ in the patient record. If you don’t have an actual date of the fact, then you don’t really have the fact. If I said ‘Yes, I received this treatment in my teens’, you can’t really use this information in any meaningful way. I’d suggest making your best guess as to an actual date something occurred, and possibly use condition_type_concept to specify that this record has less confidence than other information by actual clinical capture (a physician diagnosis, for example). Then you can decide if you want to use this type of fact in your analysis.

(Maxim Moinat) #9

That the tools do not use value_as_datetime for e.g. index events is actually the reason to use it. Following the argumentation from Christian, the given dates are approximations which we should not use in cohort definitions. By using the value_as_datetime field, we can store the information without causing issues with the current tooling.

This particular data source (UK BioBank) is a survey, so this means we have to throw away half of the data. We want to capture these histories. Using a type_concept to indicate that we have low confidence in the date is a good idea, provided that the analytical tooling will handle that correctly (or at least make OHDSI researchers very aware of the different types).

(Chris Knoll) #10

I’ve taken some time to consider your points. One thing I’d like to clarify about my position on observational data is that the dates of the facts aren’t always when something actually happened, but rather the date that we were informed of the fact. Most of the time, these dates are very close: the date of a measurement is probably the same date that we were informed of this fact; the date of a diagnosis is usually the same date that the system is informed.

Consider the case where someone comes to a doctor after suffering for a week with some sort of painful swelling. The doctor concludes that there’s an infection. Does the doctor put down that the infection started a week ago? The date of the diagnosis of ‘infection’ will be the date the doctor found out about it, not back-filled to some prior date.

But when we get to survey and ‘history of’ type of facts, I believe this principle holds: when a survey is collected, you know the date the information was collected (ie: made known to the system) but the information within the survey could relate to “current state of patient” or “something that occurred in the past”. If it’s current state, then it is an observation on the date of the survey. If it is an observation about the past, then it is a statement about history, but as of the date of the survey. I don’t think we should throw this away, but I also don’t think we should inject ‘facts’ into history since clinical actions taken on a patient didn’t actually have that information at the time.

So, getting to your example of survey data: imagine a patient timeline with clinical observations, and in the middle of that timeline the patient answers the survey. Using any index date before the survey would lead you to clinical conclusions/statements about the characteristics of the patient that are very different from those at an index date after the survey. So, don’t throw it away, just treat the information as if it was known as of the survey date, using direct concepts as observations if it is current state, and wrap it into a ‘history_of + value_as_concept’ for those historical cases. In this way, we don’t actually need a date of when the actual thing happened in history.
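
The “known as of the survey date” idea can be sketched as a simple filter. This is an illustration under stated assumptions, not how ATLAS or any OHDSI tool is implemented: the function name, the list-of-dicts layout, and the example dates are hypothetical.

```python
from datetime import date


def known_at_index(observations: list[dict], index_date: date) -> list[dict]:
    """Return only the facts that had been recorded on or before the index date."""
    return [o for o in observations if o["observation_date"] <= index_date]


# Hypothetical timeline: the survey answer 'history of heart attack' is recorded
# on the survey date, not back-dated to when the heart attack happened.
timeline = [
    {"observation_date": date(2010, 6, 1), "fact": "history of heart attack (survey)"},
]
```

With an index date of 2009-01-01 the survey fact is not yet visible, while an index date of 2011-01-01 includes it, which is the difference in patient characteristics described above.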

But, if ‘scope of history’ is important (as in: recent history, distant history), I think it would make sense to introduce a hierarchy of History Of concepts that allow you to capture the different history contexts, something like:

            -> within 5y -> within 1y -> within 6m -> within 3m -> within 1m
History Of -|
            -> after 1m -> after 3m -> after 6m -> after 1y -> after 5y

The idea of this hierarchy is to allow you to say things like ‘anything within 5y’ and things coded to 1y, 6m, 3m or 1m are included in the descendants. Same thing with after 1m…If you don’t care, just pick all descendants of ‘history of’.
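
A minimal sketch of how such a hierarchy resolves to descendants. The labels below are placeholders taken from the diagram; no such concepts or concept IDs exist in the vocabulary today, and the dictionary layout and function name are hypothetical.

```python
# Parent -> children, following the two branches in the diagram above.
CHILDREN = {
    "History of": ["within 5y", "after 1m"],
    "within 5y": ["within 1y"],
    "within 1y": ["within 6m"],
    "within 6m": ["within 3m"],
    "within 3m": ["within 1m"],
    "after 1m": ["after 3m"],
    "after 3m": ["after 6m"],
    "after 6m": ["after 1y"],
    "after 1y": ["after 5y"],
}


def descendants(concept: str) -> set[str]:
    """Return the concept itself plus everything below it in the hierarchy."""
    found = {concept}
    stack = [concept]
    while stack:
        for child in CHILDREN.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found
```

Selecting `descendants("within 5y")` picks up records coded to 5y, 1y, 6m, 3m or 1m, and `descendants("History of")` picks up everything.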

I’d translate things like ‘at age 65’ to one of these buckets, otherwise you need to do a multi-step translation of what the age was relative to the date in order to convert that date into something usable for comparison.
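
One way the translation from ‘at age 65’ to a bucket could look, as a sketch only. The day thresholds (30, 91, 182, 365, 1826) and the fallback to plain ‘History of’ are assumptions, as is the conservative choice of measuring from the earliest date the person could have had that age (January 1 of YOB + age); the function name is hypothetical.

```python
from datetime import date

# Assumed day limits for the 'within' buckets, narrowest first.
WITHIN_BUCKETS = [
    (30, "within 1m"),
    (91, "within 3m"),
    (182, "within 6m"),
    (365, "within 1y"),
    (1826, "within 5y"),
]


def within_bucket(year_of_birth: int, age_at_event: int, survey_date: date) -> str:
    """Pick the narrowest 'within' bucket that certainly contains the event."""
    # Earliest possible event date: the person turned that age on January 1.
    earliest = date(year_of_birth + age_at_event, 1, 1)
    max_days_ago = (survey_date - earliest).days
    if max_days_ago < 0:
        raise ValueError("reported age lies after the survey date")
    for limit, label in WITHIN_BUCKETS:
        if max_days_ago <= limit:
            return label
    # Older than the widest bucket: fall back to the unqualified concept.
    return "History of"
```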

And, what about ‘family history’ cases? I think you’d solve this the same way: the date of the observation is when it was known, not ‘the beginning of time’ as some people suggest. Clinical decision making is based on the family history based on when this information is known, not retroactively back to the beginning of their patient record.

I think I’ve seen other proposals where ‘history of’ observations are placed outside of observation periods or at the start of observation periods, but I really don’t think this makes sense: If you have a history of smoking, do you really put that record back at your year of birth (if your observation starts at birth)? Pretty sure they do not allow smoking infants in the OR. Instead, all of these facts could just be captured at the day they are known in an observation record, and clinical analysis can be performed based on what is known at the time of the index, not what we ‘think’ we might have known based on some date-offset from a survey response (or patient reported history).

(Maxim Moinat) #11

Thanks Chris for this elaborate explanation. I personally like the idea of having a hierarchy of history concepts. We will discuss this further in the working group.

(Maxim Moinat) #12

@Chris_Knoll We discussed your proposal in the UKB working group and overall it was received well. It is a smart way to allow capturing the uncertainty in the timing of the event.

One open item is how to choose between ‘within’ and ‘after’. Let’s say the data captures that some event occurred 3 years ago. Would this map to within 5y or after 1y? What makes most sense from an analytical point of view?

Do we want to make combinations of both after and within? In this example after 1y and within 5y. I can imagine this then would be a child of both the individual concepts.

(Chris Knoll) #13


That’s a good question. My hierarchy idea was specifically for uncertain facts when all you can say about the fact is ‘well, it was within 5 years’ or ‘it was longer than a year ago’. The case you’re talking about is that you have some relatively concrete information, so maybe that’s simply a history_of + value_as_concept + value_as_number (in days)…

When I think of the concepts and concept hierarchy, I’m thinking categorical data, so I would say you just put the historical fact into a category. The problem I am seeing in your example is that the value in question could fall into 2 categories: within 5y and later than 1y. My first reaction is to prefer ‘within’ time periods and not ‘after’… Or perhaps the solution is to drop 2 observation records, one for each bounding window of the hierarchy, so if you want to find things between 1 and 5 years you say ‘must find an observation of within 5 years and must find an observation of after 1 year’.
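
The two-record idea can be sketched as follows. The `history_label` key, the label strings, and both function names are hypothetical stand-ins for concepts that do not exist yet, and the match is on exact labels only (descendants in the hierarchy are ignored to keep the example short).

```python
def build_bounding_records(person_id: int, value_concept_id: int,
                           within_label: str, after_label: str) -> list[dict]:
    """Write one record per bounding window, e.g. 'within 5y' and 'after 1y'."""
    return [
        {"person_id": person_id, "history_label": label,
         "value_as_concept_id": value_concept_id}
        for label in (within_label, after_label)
    ]


def has_history_between(records: list[dict], value_concept_id: int,
                        within_label: str, after_label: str) -> bool:
    """True only if both bounding records exist for the same event concept."""
    labels = {r["history_label"] for r in records
              if r["value_as_concept_id"] == value_concept_id}
    return within_label in labels and after_label in labels
```

An event reported as ‘3 years ago’ would get the pair (‘within 5y’, ‘after 1y’), and a cohort criterion for ‘between 1 and 5 years ago’ would require both records.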

I totally understand the attraction of having those derived ‘range of history’ since it does seem to make sense…

Maybe the alternative is to have a dedicated ‘history’ table that acts like observation, but it has min_days and max_days that let you specify a range of where this event occurred relative to the ‘fact date’.
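
A sketch of what a row in such a dedicated ‘history’ table might look like. No such table exists in the OMOP CDM; apart from min_days and max_days, which come from the post, every field name here is an assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class HistoryRecord:
    person_id: int
    concept_id: int   # what happened, e.g. a condition concept
    fact_date: date   # when the fact was reported (e.g. the survey date)
    min_days: int     # event occurred at least this many days before fact_date
    max_days: int     # event occurred at most this many days before fact_date

    def event_window(self) -> tuple[date, date]:
        """Earliest and latest possible date of the event itself."""
        return (
            self.fact_date - timedelta(days=self.max_days),
            self.fact_date - timedelta(days=self.min_days),
        )
```

For example, a record with `min_days=365` and `max_days=1826` expresses ‘between 1 and 5 years before the fact date’ in a single row, without needing a concept for every combination of bounds.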

(Christian Reich) #14

The problem with the range is that we will have to create a lot of permutations. I think the “longer than ago” is totally sufficient from a clinical perspective.

(Alexander Davydov) #15

So you suggest only one type of enriched ‘History of…’ concepts?
Most of the surveys sound like ‘Have you… within last X weeks/months?’. And ‘longer than ago’ doesn’t work here.

I did a review of the questionnaires among the different vocabularies and noticed that in terms of data collection 1, 2, 3 weeks as well as 2, 3 months are also usually used. They’re likely done according to the clinical criteria so I’d not reduce the number of options since we may need to reproduce it in cohorts.

Within each ‘History of…’ concept to be enriched (I’ve identified only 4), two independent hierarchical branches can be created:

Also we don’t enrich No history of procedure, No family history of, No history of clinical finding in subject, right?

Sounds really good and doable in cohort building.

(Chris Knoll) #16

The only thing I’d adjust in the above list is that I’d put the time contexts as children under ‘history of clinical finding’, while family history would be its own branch directly under ‘history of’ (or maybe a parent of its own), because 1) wouldn’t that otherwise lead to a lot of ‘time-window’ children (20 children for history, 20 children for family history), and 2) does time context matter from a family history perspective? The family member is older than the person, so if you store a family history of something that happened within the last year, the timing doesn’t mean as much when the family member is 40 years older than you.

So, the hierarchy I was thinking of was along the lines of:

              | -> Family History
History Of -> |
              | -> Personal History |
                                    | - Within ....
                                    | - longer than....

You want any history, use descendants of History Of…if you want family history, use descendants of Family History…you want any personal history, use descendants of personal history, if you want within 3 months personal history, use the descendants of within 3 months.

This is just a suggestion for simplification. If these surveys or if medical history is captured about the timing of the event for the family member, then I guess you have to do it, but unless there’s a specific need, I’d just keep it simple.

(Christian Reich) #17

@Chris_Knoll: We can probably simplify it. The family history is distinct from the history, not a child. And there is no timing in the family history. Family history automatically means that some ancestor (or uncle/aunt) had it, and for obvious reasons it is a long but unspecified time before now.