How to store observations that do not need a start date?

mccullen_j · March 14, 2023, 4:23pm

I’m somewhat new to the OHDSI community and am working on performing an ETL using fake data I generated from Synthea. There is an SSN field in the patient table, and I found that its concept ID would be 4162224 under the Observation domain. However, a start date is required. What would you put for that? The birthdate? You could have other patient attributes like marital status or religion which do not necessarily have start dates that you can identify. What do you do in these cases?

Thanks for your help.

Mark · March 14, 2023, 7:24pm

In theory, one should find this out from the patient.
Practically (choose one):

the first encounter of said patient.
the encounter of said patient closest to the observation’s creation timestamp.
a magic number that is outside the range of the data. Commonly, this is ‘1900-01-01’ for start dates, as this is a SQL Server standard, even though it interfered with real dates when it was adopted. ‘2999-12-31’ is what I use as a default end date. Many use ‘2099-12-31’ as a default end date, but choosing a date that close sets up a new ‘Y2K’ problem.

As long as you are consistent, everything should be fine. I prefer the magic number myself, since SQL can be written to recognize it as invalid.
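
To illustrate the magic-number option, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes a heavily simplified OBSERVATION table (only four columns), made-up placeholder concept IDs, and ‘1900-01-01’ as the agreed sentinel. The helper name `dated_observations` is hypothetical.

```python
import sqlite3

# Assumption: the ETL writes this sentinel whenever the real date is unknown.
SENTINEL_START = "1900-01-01"

conn = sqlite3.connect(":memory:")
# Simplified subset of the OBSERVATION table, for illustration only.
conn.execute(
    "CREATE TABLE observation ("
    " observation_id INTEGER PRIMARY KEY,"
    " person_id INTEGER NOT NULL,"
    " observation_concept_id INTEGER NOT NULL,"
    " observation_date TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO observation VALUES (?, ?, ?, ?)",
    [
        (1, 101, 111, SENTINEL_START),  # 111: placeholder concept ID, date unknown
        (2, 101, 222, "2021-05-04"),    # 222: placeholder concept ID, real date
    ],
)

def dated_observations(connection):
    """Return the IDs of observations whose date is real, not the sentinel."""
    cursor = connection.execute(
        "SELECT observation_id FROM observation"
        " WHERE observation_date <> ?"
        " ORDER BY observation_id",
        (SENTINEL_START,),
    )
    return [row[0] for row in cursor]

print(dated_observations(conn))
```

Any query that does date arithmetic can apply the same filter, which is the practical benefit of a value that never occurs as a real date in the data.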

fabkury · March 14, 2023, 8:09pm

Other dates that can be considered:

dataset start date (earliest date possible across the entire dataset)
start date of each person’s first observation period

I am unaware of any “standard” way to make that choice, so I agree with Mark – as long as you’re consistent, it should be fine.
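
A rough sketch of the second option, assuming each person's observation periods are available as `(start_date, end_date)` tuples. The function name and the fallback to a ‘1900-01-01’ sentinel for people without any observation period are assumptions for illustration, not a standard.

```python
from datetime import date

# Assumption: sentinel used only when a person has no observation period at all.
FALLBACK_DATE = date(1900, 1, 1)

def default_observation_date(observation_periods, fallback=FALLBACK_DATE):
    """Pick the start of the person's first observation period.

    observation_periods: list of (start_date, end_date) tuples for one person.
    Returns the fallback date if the list is empty.
    """
    starts = [start for start, _end in observation_periods]
    return min(starts) if starts else fallback

periods = [
    (date(2015, 3, 1), date(2016, 1, 1)),
    (date(2012, 6, 15), date(2013, 6, 15)),
]
print(default_observation_date(periods))
```

Whichever rule is chosen, applying it through one function like this keeps the ETL consistent across all time-independent attributes.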

Mark · March 14, 2023, 8:24pm

This is a more precise way of thinking about my use of encounter, since an encounter may not have an observation and would therefore not appear in the OMOP data set.

Christian_Reich · March 15, 2023, 9:02am

@mccullen_j:

Two questions:

Why do you need the SSN in OMOP data? We are not treating individual patients, but deriving statistical insights from populations. I would not put that in.
You are right that there are attributes (observations) of patients that are time-independent, like germline genomic variants. For those, the date is irrelevant. Do you want to record the date the observation was made?

mccullen_j · March 15, 2023, 1:33pm

Thanks for all the helpful responses.

mccullen_j · March 15, 2023, 1:34pm

My question was about time-independent variables generally, not SSNs in particular. I just thought about it when I was doing the mapping on that for my fake data.

That said, is standardizing your data just for population-level insights? Suppose you want to look up a particular patient and you do not know their ID but you know their SSN.

Christian_Reich · March 15, 2023, 4:13pm

Got it. So, the observation is time-stamped with the date the observation was made. In those cases that date is irrelevant. Doesn’t hurt, though.

Why do you want to look up a specific patient for clinical research? Can’t you use another, meaningless identifier?

mccullen_j · March 15, 2023, 4:15pm

We are in the very early stages of standardizing our data and mostly learning and discussing things at the moment.

I’m not sure how useful it would be for research, just for finding information about a particular patient. I could also see it being useful for data exchange. If you are getting data from another OMOP source, how would you check whether its patients are already in your system without some sort of globally unique identifier like an SSN?

Mark · March 15, 2023, 4:44pm

You can always build a lookup table, outside of OMOP, to exchange data.

As for SSN, patients are not required to provide it, nor can they be. Do a search on your data; I will wager that you will find many invalid identifiers. Speak to your exchange to see what identifiers you should be tracking.

To be clear, I am not providing legal advice; I am repeating standards that I have been taught.

Christian_Reich · March 16, 2023, 3:51pm

What @Mark said.

Are you in the business of having to combine several data sources? If so, you either have a common identifier (which goes into the person_id or the person_source_value of the PERSON table), or you don’t and need to match patients based on identifiable information (name, address, telephone number, SSN and the like). In the latter case @Mark is right. It’s called “database linking”, and it is usually done outside the OMOP CDM, and often even outside your institution, to protect the anonymity of the data.

I would leave the OBSERVATION table alone for this. It is for clinical observations of a patient, not for identification. I cannot think of any clinical research use case that would need SSNs of patients.
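
For what it's worth, the lookup table kept outside OMOP that is mentioned above could look roughly like the following. This is only a sketch: the `person_crosswalk` table, its columns, the `find_person_id` helper, and the example identifiers are all hypothetical, and the in-memory SQLite database merely stands in for a store that lives separately from the CDM.

```python
import sqlite3

# Stand-in for a database kept separately from the OMOP CDM.
link_db = sqlite3.connect(":memory:")
link_db.execute(
    "CREATE TABLE person_crosswalk ("  # hypothetical table name
    " external_identifier TEXT PRIMARY KEY,"
    " person_id INTEGER NOT NULL)"
)
link_db.executemany(
    "INSERT INTO person_crosswalk VALUES (?, ?)",
    [("EXT-0001", 101), ("EXT-0002", 102)],  # made-up example rows
)

def find_person_id(external_identifier):
    """Return the OMOP person_id for an external identifier, or None if absent."""
    row = link_db.execute(
        "SELECT person_id FROM person_crosswalk WHERE external_identifier = ?",
        (external_identifier,),
    ).fetchone()
    return row[0] if row else None

print(find_person_id("EXT-0001"))
print(find_person_id("EXT-9999"))
```

Only the meaningless person_id crosses into the OMOP tables, so the identifying information stays in the separate store.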