
Should transgenders be excluded from the CDM

(Thomas Moyer) #1

The Johnson & Johnson Common Data Model (CDM v5.0) ETL Mapping Specification for Truven (CCAE and MDCR), version 13, dated April 19, 2016,
suggests in section 2.2, Table Name: Person,
"Delete the following members: Gender changed over different enrollment period"

My view is that cohorts of transgenders are a legitimate segmentation for statistical studies and other analytics.
Questions can be asked and answered only if the data is available. To censor all data on transgender persons is wrong.

Has the OHDSI community adopted or repudiated the systematic exclusion of medical information on transgender individuals from the Common Data Model?

HL/7 AdministrativeGender missing
(Chris Knoll) #2

Hi, @TPMoyer, glad you’re looking into this.

I think the purpose of that ETL logic is to remove patient level records that could be incorrectly recorded. The rule you cited above is not the only exclusion rule; I believe patients with missing birth years are removed as well.

It may also be that the PERSON table only supports a single gender value, so how would you code the gender?


(Thomas Moyer) #3

Like address, and Date-of-Birth-Year, use the last known record.

(Christian Reich) #4


But that doesn’t help you with your transgenders. It leaves you with your last reported gender.

In fact, if you really want to study this use case, you will definitely need both, because the potential effect you are studying could be due to the genetic makeup or due to the phenotypic constellation (hormone replacement etc.).

(Chris Knoll) #5

I was calling out the case of missing dates of birth. Can’t use the last known record in that case, and so I believe those patients are dropped.

(Thomas Moyer) #6

No suggestion is made that retention of persons with multiple observation_period genders would allow inclusion/exclusion from cohorts based on the person table’s single entry per person. One would, of course, need to refer to procedures received or drug regimens undertaken for cohort selection criteria.

The dropping of persons with multiple reported genders in their observation_period history has the effect of removing all transgender persons from the person table. As the person_id in the person table is the primary reference to the other tables, excluding transgenders from the person table is, in effect, removing them from the CDM.

(Christian Reich) #7


Are you worried about biasing the result of a study by dropping these people, or are you worried about not being able to select them into a cohort and study them?

If the former - their number is probably an order of magnitude smaller than just mistakes in the records. And therefore small enough not to bias anything.

If the latter - you cannot do that with the “single entry per person”, as you call it.

Not sure what you are trying to achieve.

(Thomas Moyer) #8

I am concerned that no one could select transgenders into a cohort, if the J&J censoring suggestion is followed.
If and only if one has an entry in the person table can their medical history be searched for procedures and drug regimens associated with their medical history…
Just as diabetes and hypertension and stent degradation cannot be discerned from the person table, neither can gender change.
Your assertion of “their numbers are probably” cannot be quantified if your database follows the J&J censoring procedure.

(George Hripcsak) #9

I would be interested in how many people change genders within an enrollment period. That might be more indicative of the true rate. People who change genders between enrollment periods (the J&J rule) may be mostly error. You may need to go back to the source systems to figure it out, as you won’t have procedures or other narrative evidence logged for the missing periods between enrollment periods. Although presumably a note would mention it historically, but again that is in the source data.
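
One way to make the within-period versus between-period count concrete is the minimal Python sketch below. It assumes one gender value per enrolled month, and it treats any gap of more than one month as the boundary between enrollment periods. The function name `classify_gender_changes` and the integer month index are hypothetical and not taken from any ETL specification.

```python
from typing import List, Tuple


def classify_gender_changes(monthly: List[Tuple[int, str]]) -> Tuple[int, int]:
    """Count gender changes within and between enrollment periods.

    `monthly` holds (month_index, gender) pairs for one person, sorted by
    month_index, where month_index counts calendar months (hypothetical
    encoding). Consecutive months are assumed to belong to the same
    enrollment period; a gap starts a new period.
    """
    within = 0
    between = 0
    for (m0, g0), (m1, g1) in zip(monthly, monthly[1:]):
        if g0 == g1:
            continue  # no change between these two records
        if m1 - m0 == 1:
            within += 1  # change between adjacent months of one period
        else:
            between += 1  # change across a gap in enrollment
    return within, between


# Example: a change after a seven-month gap counts as "between".
print(classify_gender_changes([(1, "M"), (2, "M"), (10, "F")]))
```

Under these assumptions, a change counted as "within" has continuous claims history around it, whereas a "between" change has no records covering the gap.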


(Chris Knoll) #10

I’ve been able to confirm that transgender people are not excluded from the ETL process for Truven CCAE. I found records in our data for the following Intersex Surgery concepts:
http://www.ohdsi.org/web/atlas/#/concept/4179532, female to male
http://www.ohdsi.org/web/atlas/#/concept/4201284, male to female.

Also, I found references in the literature that state 75% of people who identify as a sex different from their birth sex do not undergo these surgeries. It is likely that people who do not undergo the surgery will keep their birth sex on their medical record, because some medical procedures require anatomical validation prior to approval.

(Patrick Ryan) #11

To clarify, the Truven datasets do not provide ‘transgender’ status as part
of the dataset. Each monthly enrollment record contains a gender, but on
the occasional instances when the gender ‘switches’ between successive
months, that has been determined by the vendor to be a likely data quality
issue, and not a representation of a true phenomenon.

If any dataset does contain source information about ‘transgender’ status,
I would recommend that information be stored in the CDM and I agree it
could potentially be useful for valid research. However, since the
PERSON.GENDER_CONCEPT_ID field is only one value, whose standard values are
MALE, FEMALE, or can be left unknown (concept_id=0) and is not timestamped,
it may be more appropriate to capture observations about sexual and gender
identity in the OBSERVATION table.

(Thomas Moyer) #12

The J&J spec is to delete persons from the person table, if they have more than a single observation period with a changed gender. This is the behavior I feel should be repudiated.

The J&J spec is internally inconsistent, in that it conflicts with another of their person table key conventions:
• If the person’s demographics change during the period of analysis, the last known record is used.
I feel this key convention should be applied to the gender demographic.

I agree with your recommendation about gender identity appropriately being in the observation table. One must note, however, that only if one does not follow the J&J spec will there be a person entry, or any observation table entries, for such an individual.

Eliminating Persons [THEMIS WG3 - TOPIC 2]
(Erica Voss) #13

Sorry I didn’t notice this thread earlier!

Thank you for your note on our documentation. @clairblacketer and I will update it to make it clearer. I’m double-checking the exact rule that is currently implemented; however, I do believe that if gender changes we just eliminate the person.

Originally our CDM_BUILDER just took the last demographic value per ENROLID. However, we then noticed the changing genders; in our current data load of MDCD it happens to 0.17% of patients, 0.00% for CCAE, and 0.00% for MDCR. We spoke to Truven about 3 years ago about this, and they said it is possible for Medicaid IDs to get reused, which does affect their data (I stress that this is an INFREQUENT issue; probably only a percentage of the 0.17% of patients are affected by it). Because changing gender was infrequent within the data we received from Truven, and we know there is a potential for an issue in the actual Medicaid data, we made the decision to just eliminate these patients.

Changing gender happens extremely infrequently in the Truven data, and my assumption is that the majority of these changes are administrative errors rather than transgender changes. While it is possible some of these gender changes could represent transgender persons, we don’t have confidence in which ones are which.

(Clair Blacketer) #14

After looking through our documentation and test cases, here are the rules we use to exclude persons from the CDM based on gender:

  • If a person has two valid genders listed then they are excluded
  • If a person has exclusively an invalid gender listed they are excluded
  • If a person has an invalid gender listed but in the latest enrollment period they have a valid gender we take the latest gender listed
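
The three rules above can be sketched roughly as follows. This is only an illustration and not the actual CDM builder code: the function name `resolve_gender`, the set of valid source codes, and the handling of cases the rules do not mention (no records at all, or a valid gender followed by an invalid latest value) are assumptions.

```python
from typing import Optional, Sequence

# Assumption: these are the source codes treated as valid genders.
VALID_GENDERS = {"M", "F"}


def resolve_gender(genders_by_period: Sequence[Optional[str]]) -> Optional[str]:
    """Return the gender to load into PERSON, or None to exclude the person.

    `genders_by_period` lists the gender recorded in each enrollment
    period, ordered from oldest to latest.
    """
    if not genders_by_period:
        return None  # assumption: no records is treated like "only invalid"

    valid_seen = {g for g in genders_by_period if g in VALID_GENDERS}

    if len(valid_seen) >= 2:
        return None  # rule 1: two valid genders listed, exclude
    if not valid_seen:
        return None  # rule 2: exclusively invalid genders, exclude

    latest = genders_by_period[-1]
    if latest in VALID_GENDERS:
        return latest  # rule 3: take the latest gender when it is valid

    # Assumption: a valid gender followed by an invalid latest value is not
    # covered by the stated rules; this sketch excludes the person.
    return None


print(resolve_gender(["U", "F"]))  # latest valid gender is kept
print(resolve_gender(["M", "F"]))  # two valid genders, person excluded
```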

In terms of finding transgender individuals, I am not sure if their health care enrollment information would accurately reflect their change in gender. Could we perhaps do some type of cohort characterization study starting with individuals in the database with a gender identity disorder diagnosis (SNOMED concept_id 4338512) or some other proxy to understand how we can best define this group in our data?

(Erica Voss) #15

The THEMIS group has thought about this, what about this:

The PERSON.GENDER_CONCEPT_ID should store what is believed to be the biological sex or sex assigned at birth. If the data set does have gender identification information, this should be stored in the OBSERVATION table (using the gender concepts 8532-Female or 8507-Male in OBSERVATION_CONCEPT_ID).
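
As a rough illustration of that proposal, the sketch below builds one PERSON row and an optional OBSERVATION row as plain Python dictionaries. The helper name `build_rows` and its parameters are hypothetical, and only the columns relevant to the convention are shown; a real CDM load has further required columns that are omitted here.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# Concept ids quoted in the proposal above.
FEMALE = 8532
MALE = 8507


def build_rows(
    person_id: int,
    sex_at_birth_concept_id: int,
    gender_identity_concept_id: Optional[int] = None,
    identity_recorded_on: Optional[date] = None,
) -> Tuple[Dict, List[Dict]]:
    """Split sex at birth (PERSON) from gender identity (OBSERVATION)."""
    # PERSON keeps a single, untimestamped value: sex assigned at birth.
    person = {
        "person_id": person_id,
        "gender_concept_id": sex_at_birth_concept_id,
    }
    observations = []
    if gender_identity_concept_id is not None:
        # Gender identity is dated, so it goes into OBSERVATION and can
        # be recorded more than once if it changes over time.
        observations.append(
            {
                "person_id": person_id,
                "observation_concept_id": gender_identity_concept_id,
                "observation_date": identity_recorded_on,
            }
        )
    return person, observations


person, observations = build_rows(1, MALE, FEMALE, date(2016, 4, 19))
print(person)
print(observations)
```

Because the identity is a dated observation, a later change can be added as another row without touching the PERSON record.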

Add this under conversions under PERSON wiki.

(Clark C. Evans) #16

OHDSI might wish to consider adding a new column to the model, PERSON.SEX_CONCEPT_ID, which refers to the biological sex at birth, which could also be intersex. In the conversion to a new version of the CDM, one could duplicate the values from gender to sex. The issue is with the word gender, which is typically used as a social construct representing a person’s identification; it would be beneficial if this were a top-level field with a corresponding concept breakdown. About 20 years ago, I think this mismatch of terms was done regularly to be polite, because people were squeamish about using the word sex in their data models. However, this is an important thing to fix in the CDM model. For many analysis cases, it’s important to decouple social identification (which is very important for care) from biological sex. Relegating gender to an observation isn’t all that great either, but if it must be, then in a further release the GENDER column could be removed after deprecation. I hope this suggestion is welcomed. Naming is hard. This is a tough issue.

(George Hripcsak) #17

Gender_concept_id is OHDSI’s sex column, I am afraid. Real gender is stored in the Observation table because it can change over time. We cannot fit multiple genders in the person table. So we should rename gender_concept_id to sex_concept_id at some point, but we do not need to add a new column. The main cost is changing the code we have written.

(Dee Bowden) #18

I realize that this is an old thread. It is of particular interest to me as a member of this extremely under-analyzed demographic.

There are many comments that I would love to address here, but the original question is the focus of this post.
We at Trio Health are using the OMOP CDM and Atlas tool-set for the study of multiple diseases. Where those diseases are not apparently related to persons who have multiple recorded genders, or whose gender does not map to ‘M’ or ‘F’, we exclude those persons.

I have not had the chance to design a study related to conditions arising from hormone replacement therapy and/or sexual reassignment surgery although I have thought extensively about how to approach this within the OMOP & OHDSI ecosystem.
On a personal note: I would love to design a study on mental health-related condition improvements after transgender indicated procedures such as Facial Feminization Surgery, Mastectomy, Sexual Reassignment Surgery, and Laryngeal Surgical Treatment.
For a study such as this, I would follow an approach similar to that recommended in the comments by @ericaVoss and @hripcsa, although I am not thrilled with the THEMIS verbiage and give extra points to @hripcsa for the use of the term real gender.
I would use a dedicated CDM and, probably inadvisably, expand the OHDSI Gender concept to include assigned-at-birth designations, with a combination of F-AMAB or M-AFAB (at a minimum), for those who changed gender prior to the observation_period.
For those that changed gender during the study, I would record this as an observation.