The term “gender” is used in the CDM as a synonym for sex assigned at birth. This usage is confusing, because gender is a social construct. Moreover, a more useful meaning is genetic sex, whereby the value “intersex” should also be valid. This proposal outlines a 4-phase change to migrate to “genetic sex”.
In the 5.X series, sex_concept_id would be added, along with a Sex domain with values (male, female, intersex).
Applications could start using “Sex” for screens, making sure that Intersex is a valid option besides Male and Female.
In the 6.X series, gender_concept_id and related columns could be removed.
Gender can then become an observation with possible values that more closely match social constructs.
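As a rough illustration of the first phase, here is a minimal sketch using Python's built-in sqlite3 module against a toy PERSON table. The concept IDs 8507 (MALE) and 8532 (FEMALE) are the existing OMOP gender concepts; the intersex concept ID is a hypothetical placeholder, since no such ID is given in this thread, and the function names are made up for illustration.

```python
import sqlite3

MALE_CONCEPT_ID = 8507      # existing OMOP gender concept: MALE
FEMALE_CONCEPT_ID = 8532    # existing OMOP gender concept: FEMALE
INTERSEX_CONCEPT_ID = -1    # hypothetical placeholder, not a real concept ID


def add_sex_column(conn):
    """Add sex_concept_id next to gender_concept_id and backfill it."""
    conn.execute("ALTER TABLE person ADD COLUMN sex_concept_id INTEGER")
    # Start from the legacy values so existing queries stay consistent.
    conn.execute("UPDATE person SET sex_concept_id = gender_concept_id")


def demo():
    """Build a toy PERSON table, migrate it, and return all rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person ("
        "person_id INTEGER PRIMARY KEY, gender_concept_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO person VALUES (?, ?)",
        [(1, MALE_CONCEPT_ID), (2, FEMALE_CONCEPT_ID), (3, MALE_CONCEPT_ID)],
    )
    add_sex_column(conn)
    # An ETL with better source data can now record intersex in the new column
    # without touching the legacy one.
    conn.execute(
        "UPDATE person SET sex_concept_id = ? WHERE person_id = 3",
        (INTERSEX_CONCEPT_ID,),
    )
    rows = conn.execute(
        "SELECT person_id, gender_concept_id, sex_concept_id "
        "FROM person ORDER BY person_id"
    ).fetchall()
    conn.close()
    return rows
```

Removing gender_concept_id in a later major version would be a separate, breaking step and is deliberately left out of this sketch.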
This is a long-standing problem with the CDM. It’s time it gets addressed.
Update: Perhaps it would be better to just add a sex column and keep the gender column on the person table as well. This would let those doing ETL do the best they can to populate data from relevant sources, and let those doing queries choose the attribute that best fits the analysis.
Gender has been a social construct since antiquity. You can find details of cross-dressing, gender ambiguous deities, and other aspects well before modern western civilization. In medicine, genetic sex has been clearly differentiated from gender for several decades now. The word gender is often used as a synonym because people are simply uncomfortable with the word sex.
The lovely thing about this proposal is that gender is then free to be used in observations. It can change over a person’s life, for example. Critically, it’s no longer a field on the PERSON table, whose contents one may hope are less subject to change.
Presently, “gender assigned at birth” lacks a way to indicate that someone is intersex. This is a defect in the current model that we could address. Intersex people are neither Male nor Female. Being intersex is also not a social construct; it’s due to genetic differences.
This appears to me to be an ideological argument. How can we maintain standards of subjective truths? It seems to me that the “Observational” part of OMOP demands objective truths to be useful at all.
Well, if you only want to put it in the Observation table, why not create a Gender_Expression domain and store that as an observation? That way, the whole CDM doesn’t have to be modified with breaking changes that will affect everyone’s models, disrupt historic continuity, and break every OMOP application from Atlas to Usagi.
It’s a lot of effort for very little lift.
The same thing was said around 10 years ago when I noted the issue. At the time, I was working on the Simons Foundation Autism Research Initiative (SFARI) and brought up the issue in this community. This is a fundamental modeling problem. Yes, a migration would need to be planned, but it can be done gradually: for several years, ETLs can keep the 2 fields synchronized without much effort.
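To make the “keep 2 fields synchronized” idea concrete, here is a small sketch of an ETL mapping step that writes both columns from one source value. The source codes ("M", "F", "I"), the function name, and the intersex concept ID are assumptions for illustration; 8507 and 8532 are the existing OMOP gender concepts, and 0 is the usual OMOP “no matching concept” value.

```python
MALE_CONCEPT_ID = 8507       # existing OMOP gender concept: MALE
FEMALE_CONCEPT_ID = 8532     # existing OMOP gender concept: FEMALE
NO_MATCHING_CONCEPT = 0      # OMOP convention for unmapped values
INTERSEX_CONCEPT_ID = -1     # hypothetical placeholder, not a real concept ID

# Hypothetical source codes; a real ETL would use its own source vocabulary.
SOURCE_TO_SEX = {
    "M": MALE_CONCEPT_ID,
    "F": FEMALE_CONCEPT_ID,
    "I": INTERSEX_CONCEPT_ID,
}


def map_person_sex(source_value):
    """Return both the legacy and the new column value for one source code."""
    key = (source_value or "").strip().upper()
    sex = SOURCE_TO_SEX.get(key, NO_MATCHING_CONCEPT)
    # The legacy column only knows male and female, so anything else
    # falls back to "no matching concept" there.
    if sex in (MALE_CONCEPT_ID, FEMALE_CONCEPT_ID):
        gender = sex
    else:
        gender = NO_MATCHING_CONCEPT
    return {"gender_concept_id": gender, "sex_concept_id": sex}
```

With a mapping like this, older tools that read gender_concept_id keep working during the transition, while newer queries can read sex_concept_id.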
If you think OHDSI is large now, just wait. It’ll be much larger in 5 more years. This problem will be much harder to address in the future.
It’s not about things being offensive. It’s about being incorrect. Why should it be an error that a Male-gendered patient has a Pap smear? It’s not an error; in some clinics it’s a common phenomenon. There are also drug studies that may be incorrect, or genetic analyses that should be excluding intersex patients but don’t. It’s a health equity issue.
Don’t think of it as a burden, think of it as an opportunity. Surely there are health equity grant opportunities to make OHDSI more friendly towards those doing genetic analysis and care of patients whose gender is not the same as their biological sex. Such a grant could clean up more than just this problem and make OHDSI more welcoming and useful for a broader range of applications.
I am already having the issue of changes happening faster than I can code around them. Remember, we small to medium-sized institutions do not have the resources to chase every small change everyone wants. On top of that, there is already too much ‘judgement’ that has to be used in the ETL process; I wonder how useful much of the data in the CDM really is anyway.
We probably are not that terribly far away from each other:
Gender is social, sex is biological at birth. Clark is right in that the current field name in the PERSON table is wrong. Roger is right that this wasn’t always so tightly defined. Certainly not when the OMOP CDM was initiated. Back then, you would use the terms synonymously, and you would avoid the word “sex” because of its connotation.
From a use case perspective, everything is fine as it is: If you ignore the wrong name, what’s in the PERSON table (gender_concept_id) is the biological sex, and the real gender is in OBSERVATION. We can run any study we want including these two.
The solution is to rename gender_concept_id to sex_concept_id. Very simple. The problem is that this is a gigantic, breaking, non-backwards-compatible change. I would say every single method, study package or software tool would have to be changed. So, that is definitely a major release. We could smooth it by having both fields for a transition period. But that is also ugly.
So, the question is when do we want to do that? Roger says “when the cows come home”, Clark says “asap, it is already embarrassing”.
It cannot be biological sex since it doesn’t have an option for intersex subjects. Instead, it’s closest to “gender at birth”, which is the most politically inconsistent of all 3 options.
That’s a vocabulary problem and easy to fix. Plus it is a data problem: My hunch is that those cases are not well captured. Remember, for the vast majority of patients this gets entered by the front desk staff of an office or a registration office of a hospital. The gender information is probably more useful for those cases. And in international databases we may be even further away from the correct detail.
…or from a data transfer from one EHR to another. Also, many patients’ demographics may come from old paper charts if the patient has been in the practice longer than the EHR has.
Hmm. Perhaps the Gender column is named correctly with the best available data. Perhaps the problem is that we are trying to treat it as biological sex?
Perhaps we shouldn’t remove the Gender field from the Person table, but simply add the Sex column. If both fields are there, it means that those doing ETLs would not be able to ignore the differentiation, and could search for the best available data. Having both fields would mean those doing queries could also be more precise about what exactly they mean: gender or biological sex.
That’s one choice, but it probably doesn’t help us much:
Sex is static. It is established at birth, and you get born only once. So, it should be in PERSON. The field we have (gender_concept_id) is actually used as if it were sex_concept_id. It just has the wrong name.
Gender is dynamic. You can change it. Which means it cannot be in PERSON, but belongs in OBSERVATION with a time stamp. All we need to do is to make sure we have an agreed gender concept convention.
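As a sketch of what “gender in OBSERVATION with a time stamp” could look like at query time, the snippet below picks the most recent gender observation on or before a study date. The field names follow the OBSERVATION table (person_id, observation_concept_id, observation_date, value_as_concept_id); every concept ID here is a hypothetical placeholder, because the agreed convention mentioned above does not exist yet.

```python
from datetime import date

# Hypothetical placeholders: the thread does not define these concept IDs.
GENDER_IDENTITY_CONCEPT_ID = -100
GENDER_VALUE_A = -101
GENDER_VALUE_B = -102

# Toy OBSERVATION rows for one person whose recorded gender changes over time.
EXAMPLE_OBSERVATIONS = [
    {"person_id": 1, "observation_concept_id": GENDER_IDENTITY_CONCEPT_ID,
     "observation_date": date(2010, 3, 1), "value_as_concept_id": GENDER_VALUE_A},
    {"person_id": 1, "observation_concept_id": GENDER_IDENTITY_CONCEPT_ID,
     "observation_date": date(2019, 9, 15), "value_as_concept_id": GENDER_VALUE_B},
]


def gender_as_of(observations, person_id, as_of):
    """Return the latest gender value recorded on or before as_of, else None."""
    candidates = [
        o for o in observations
        if o["person_id"] == person_id
        and o["observation_concept_id"] == GENDER_IDENTITY_CONCEPT_ID
        and o["observation_date"] <= as_of
    ]
    if not candidates:
        return None
    latest = max(candidates, key=lambda o: o["observation_date"])
    return latest["value_as_concept_id"]
```

A study could then combine this time-dependent value with the static sex value from PERSON, which is the split described in the posts above.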
The naming problem can be addressed as soon as we go to the next major version. There are other things we also want to change. If folks are eager - start rolling the drum.
In the meantime, we can add explanatory and apologetic language to the PERSON table documentation, explaining that gender_concept_id is wrongly named and really should be sex_concept_id, but otherwise it is all working.
Again, we don’t have a problem with the content. We have a problem with naming. Functionally, a variable or table field name is a memory address pointer for the processor. It means nothing. So, if we can live with having to apologize while we are working on a new version we are just fine.
Thanks for this update. In fact, gender identity now has much clearer coding options, and sex is still considered to be biological sex (by journals, by the NIH, and others). I’d cast a vote for sex (adding intersex as a code) and gender, although gender could be an observation, as we code it now at our site. Gender identity can be fluid in a small number of people.
As an ETL’er, I find this is not always possible, as both fields may be in the demographics section, which has no timestamps attached, at least not in the EHR that we use.
I see no way to change this that does not cause pain to someone, somewhere.