Map to non-standard concept_id or to 0?


This is a very basic question, but I’m afraid I haven’t been able to find a clear answer.

When source data cannot be mapped to a standard OMOP concept_id, is it preferable to map it to a non-standard concept_id (if one exists) or to 0?

For example, if my source data for sex contains, besides “sex is male” and “sex is female”, values of “sex unknown”, would it be preferable to map these to one of the non-standard concept_ids, like “Gender unknown” or “UNKNOWN”, or to map them to 0?

What would the implications, advantages and disadvantages of either approach be?

Bert Overduin


Short answer: 0.

Longer answer: These are all so-called “flavors of null”, which means we don’t have the information. Whether it wasn’t asked, or it is “unspecified”, “unknown”, “ambiguous” or “other” is irrelevant to the use cases. In OMOP, all of these are encoded with concept_id=0.

But in your list there are also transgender, female-to-male, male-to-female and non-binary, which you may want to map and record in the OBSERVATION table.
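
To make the “flavors of null” rule concrete, here is a minimal Python sketch of such a mapping. It assumes the source strings from the question, and that 8507 (MALE) and 8532 (FEMALE) are the standard gender concepts in your vocabulary release, which is worth verifying. The function name map_gender and the lookup table are hypothetical, not part of any OHDSI tool.

```python
# Standard OMOP gender concepts (assumption: verify against your vocabulary release).
MALE_CONCEPT_ID = 8507
FEMALE_CONCEPT_ID = 8532
# Every "flavor of null" collapses to 0.
NO_MATCHING_CONCEPT = 0

# Hypothetical source strings, taken from the example in the question.
SOURCE_TO_STANDARD = {
    "sex is male": MALE_CONCEPT_ID,
    "sex is female": FEMALE_CONCEPT_ID,
}


def map_gender(source_value):
    """Return the PERSON gender fields for one raw source value."""
    # Normalize only for the lookup; the raw value is stored unchanged.
    key = (source_value or "").strip().lower()
    return {
        "gender_concept_id": SOURCE_TO_STANDARD.get(key, NO_MATCHING_CONCEPT),
        "gender_source_value": source_value,
    }


for raw in ["sex is male", "sex is female", "sex unknown", None]:
    print(raw, "->", map_gender(raw)["gender_concept_id"])
```

Anything not in the lookup, including “sex unknown” and missing values, ends up as 0, while the original string is kept in gender_source_value so nothing from the source is lost.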

