
What to do with demographic data without date information?

We have demographic data, i.e., marital status, primary language, and religion. But they are in the patient table, which does not have a date associated with the information… My question is: do we leave them out of the CDM since there is no date information?

The [person table][1] represents each patient as a single row and does not support temporal data.

Each person record has associated demographic attributes which are assumed to be constant for the patient throughout the course of their periods of observation. For example, the location or gender is expected to have a unique value per person, even though in life these data may change over time.

If you have longitudinal demographic data, it’s up to the institution to determine the best fit for the ‘constant’ value. Most commonly occurring and most recent value are popular choices.
[1]: https://github.com/OHDSI/CommonDataModel/wiki/PERSON
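As a rough illustration of the two popular choices mentioned above, here is a minimal Python sketch. The `history` records and both helper functions are hypothetical and not part of any OHDSI tooling; they only show how "most recent" and "most commonly occurring" can give different answers for the same patient.

```python
from collections import Counter
from datetime import date

# Hypothetical longitudinal records for one patient: (recorded_on, value).
history = [
    (date(2015, 3, 1), "Single"),
    (date(2017, 6, 10), "Married"),
    (date(2019, 1, 5), "Married"),
    (date(2021, 9, 20), "Divorced"),
]


def most_recent(records):
    """Return the value attached to the latest recording date."""
    return max(records, key=lambda r: r[0])[1]


def most_common(records):
    """Return the most frequent value; ties go to the value seen first."""
    return Counter(value for _, value in records).most_common(1)[0][0]


recent_value = most_recent(history)
common_value = most_common(history)
```

Whichever rule an institution picks, applying it consistently to every person is probably more important than the rule itself.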

For data points that don’t fit neatly into the person table (e.g. religion, primary language), couldn’t you just map them to the appropriate table (observation, I’m guessing), and put the date of registration (or the date of first encounter) as the pertinent date?

@esholle I thought about the same thing, but I was wondering whether it complies with OMOP rules. And yes, if they have a date associated with them, they should go to the Observation table. There are actually concepts for such demographic information that belong to the Observation domain:

4053609 (Marital status)

4267143 (Language)

I would define the date as (OBSERVATION_PERIOD.observation_period_start_date + OBSERVATION_PERIOD.observation_period_end_date) / 2, i.e., the middle of the observation period.

And there’s a chance that the patient changed their marital status, but we can’t know this anyway.

Thank you, @rtmill, @esholle, and @Dymshyts. Your input is very helpful. To summarize:

@esholle proposes to use the OBSERVATION_PERIOD.observation_period_start_date
@Dymshyts proposes to use (OBSERVATION_PERIOD.observation_period_start_date + OBSERVATION_PERIOD.observation_period_end_date) / 2

But I think OBSERVATION_PERIOD.observation_period_end_date is more reasonable, because I believe the source database updates the demographic information every time the patient visits, so the data reflects the patient’s status at their last encounter.
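To make the three candidate dates concrete, here is a small Python sketch. The function name and the sample dates are made up for illustration. Note that two dates cannot be added directly, so the midpoint is computed as the start date plus half of the interval, rounded down to a whole day.

```python
from datetime import date, timedelta


def candidate_observation_dates(start: date, end: date) -> dict:
    """Return the three proposed dates for an undated demographic fact.

    `start` and `end` stand in for observation_period_start_date and
    observation_period_end_date of a single observation period.
    """
    half_interval = timedelta(days=(end - start).days // 2)
    return {
        "start": start,                     # date of first observation
        "midpoint": start + half_interval,  # middle of the observation period
        "end": end,                         # date of last observation
    }


# Hypothetical observation period, for illustration only.
dates = candidate_observation_dates(date(2020, 1, 1), date(2020, 1, 11))
```

With this sample period, the midpoint falls on 2020-01-06.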

I am not sure whether other developers have had a similar issue and how they dealt with it. Also, do we want to raise it as a Themis topic?

Friends:

If the information is more or less static (like primary language), the date really doesn’t matter. If it is dynamic (like marital status for some of us), it should be recorded at the time the observation was made. How to determine that depends on the source data. I don’t think either case has anything to do with the Observation Period.
