Mostly
You forgot to mention my solution allows for provenance of the records via the observation_type_concept_id field.
And now is a good time to mention my implementation plan, found on slides 8 - 12.
Before we introduce breaking changes to the CDM and remove the race & ethnicity concept_ids from the Person table, I suggest we adopt a convention that allows and encourages recording these data in the Observation table with observation_concept_id = “Has race/ethnicity”. The data would then co-exist in both tables until the next major (breaking) CDM release. Yes, this will denormalize the CDM; however, it will give us some time to test drive this solution and update cohort definitions before going all in with removal of these data from the Person table. I spoke to @Chris_Knoll at the Symposium and he doesn’t have any concerns about this change for Atlas. Chris suggested I talk to @schuemie, so I pitched it to him. Cohort definitions will have to be updated. Clear and concise documentation on how to ETL the data and how to use the data will be given by Themis & the CDM WG.
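To make the co-existence convention concrete, here is a minimal sketch in Python using an in-memory SQLite database. It is an illustration under assumptions, not the agreed convention: the tables are cut down to a few columns, every concept_id is a made-up placeholder (the real “Has race/ethnicity” and type concepts would come from the vocabulary), and storing the race value in value_as_concept_id is my assumption about how the record would be shaped.

```python
import sqlite3

# Placeholder IDs for illustration only; these are NOT real OMOP concept_ids.
HAS_RACE_ETHNICITY = 1001  # hypothetical "Has race/ethnicity" concept
TYPE_EHR = 2001            # hypothetical provenance: EHR record
TYPE_SURVEY = 2002         # hypothetical provenance: patient survey
RACE_A = 3001              # hypothetical race concept
RACE_B = 3002              # hypothetical race concept


def build_demo_db():
    """Create simplified person and observation tables with one patient."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (
            person_id INTEGER PRIMARY KEY,
            race_concept_id INTEGER
        );
        CREATE TABLE observation (
            observation_id INTEGER PRIMARY KEY,
            person_id INTEGER,
            observation_concept_id INTEGER,
            value_as_concept_id INTEGER,
            observation_type_concept_id INTEGER,
            observation_date TEXT
        );
        """
    )
    # The Person table keeps its single race value during the transition.
    conn.execute("INSERT INTO person VALUES (1, ?)", (RACE_A,))
    # The Observation table can hold several values, each with its own provenance.
    conn.executemany(
        "INSERT INTO observation VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, 1, HAS_RACE_ETHNICITY, RACE_A, TYPE_EHR, "2023-01-15"),
            (2, 1, HAS_RACE_ETHNICITY, RACE_B, TYPE_SURVEY, "2023-06-02"),
        ],
    )
    return conn


def race_records(conn, person_id):
    """Return (value_as_concept_id, observation_type_concept_id) pairs for a person."""
    rows = conn.execute(
        """
        SELECT value_as_concept_id, observation_type_concept_id
        FROM observation
        WHERE person_id = ? AND observation_concept_id = ?
        ORDER BY observation_id
        """,
        (person_id, HAS_RACE_ETHNICITY),
    )
    return rows.fetchall()
```

Calling `race_records(build_demo_db(), 1)` returns both values with their provenance, while `person.race_concept_id` still carries the single legacy value, which is the denormalization the transition period would tolerate.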
Once ETLers have implemented this change, we will need feedback from them: 1. Were the instructions on ETLing these data clear? 2. What are your pre & post change mapping rates? 3. What’s still not mapping? Next, we’re going to need feedback in the same vein from the analysts: 1. Which use cases now work? 2. Which use cases don’t work? 3. What’s missing?
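For the pre & post change mapping rates, one simple way to report a comparable number is the share of records that carry a mapped concept. The sketch below assumes that definition (a concept_id of 0 or a missing value counts as unmapped); the function name and the sample IDs are hypothetical.

```python
def mapping_rate(concept_ids):
    """Fraction of records whose concept_id is mapped (neither 0 nor missing)."""
    ids = list(concept_ids)
    if not ids:
        return 0.0
    mapped = sum(1 for concept_id in ids if concept_id not in (None, 0))
    return mapped / len(ids)


# Hypothetical example: placeholder concept_ids, not real vocabulary values.
pre_change = [3001, 0, 0, 3002]        # e.g. person.race_concept_id
post_change = [3001, 3003, 0, 3002]    # e.g. observation.value_as_concept_id
improvement = mapping_rate(post_change) - mapping_rate(pre_change)
```

Reporting both rates alongside a list of the source values that still map to 0 would answer questions 2 and 3 together.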
I am coming at this from the Themis point of view with a strong Health System Interest Group influence. I’d like all of us to keep Themis’ mission statement in mind as we discuss this topic: “Themis makes decisions for the good of the whole community. We must compromise. We can always revisit and modify the convention. Don’t let perfect be the enemy of great. And interoperability between different OMOP CDMs is great!” I’ll admit, it’s a little cheesy, but we really need the community to follow the standards. You can always add additional fields to your CDM, but you need to populate the CDM as expected or we can’t do federated research. And we must compromise, agree to disagree, and move forward. The race topic has been going in circles and infinite loops for years.
With this in mind, I propose we defer the flavors of NULL (unknown, not answered, etc.), hierarchies, and negative values to a future iteration unless there is a strong use case. These items will be easy to add in later, if needed. Let’s use the data with the new concept_ids, run it through some use cases and research, identify areas needing improvement, regroup after running it through the rounds, and then make a plan. Let’s keep it simple and pragmatic for our first implementation.
To echo what @aostropolets said, regardless of which proposal or combination of proposals the community adopts, we need to broadcast it to everyone affected, including OHDSI chapter leads and WGs, those about to ETL their data, and those using the CDM, including secondary research groups such as N3C and All of Us.
Since this is such a huge change and will affect many in the OHDSI community from the ETL through the pipeline to the researchers and the tools used, once a decision has been made, I suggest we form a sub-working group to document and implement the change requested by the community.