Doubt on unit_concept_id for adimensional measurement

a.alloni · February 26, 2020, 12:45pm

Hi everybody,
I’m trying to map the concept of DEXA result, which in my source form is coded as a dropdown menu containing the possible outcomes of the densitometry. For our purposes it is enough to know if the outcome was positive or negative, so I mapped this as a measurement with value_as_concept_id as 9189/9191.
I’m asking if there’s an appropriate way of filling in the unit_concept_id to reflect the fact that there is no unit to this measurement, since I could not find a concept for “adimensional” or “no unit”.
I’m asking this because when the quality control is run through the DataQualityDashboard, some “fail” result appear.
Thanks for your help
Best

Anna

Christian_Reich · February 26, 2020, 1:03pm

@a.alloni: If you have no unit the unit_concept_id is NULL. You don’t need a special concept insisting it really is NULL, and it is meant to be NULL, and nobody should bother you because it is NULL.

roger.carlson · February 27, 2020, 1:45pm

For clarity, this is only true for Unit_Concept_ID in the measurement table, correct? In all other concept fields, it should be zero?

a.alloni · February 27, 2020, 2:18pm

Thank you very much for the quick reply!

MPhilofsky · February 27, 2020, 3:54pm

@roger.carlson,

There are a few concept_ids in the CDM that are nullable. The entity_concept_id and entity_type_concept_id fields are never nullable, but the supporting concept_ids can be nullable. value_as_concept_id, operator_concept_id, etc. I would link the relevant specifications here, but GitHub isn’t opening for me this morning.

roger.carlson · February 27, 2020, 4:43pm

I understand that there is an on-going disagreement about this, but according to this CONCEPT_ID (to NULL or not to NULL) [THEMIS WG3], the final determination was that all concept_id fields should be filled with zero EXCEPT:

MEASUREMENT.VALUE_AS_CONCEPT_ID
MEASUREMENT.UNIT_CONCEPT_ID
MEASUREMENT.OPERATOR_CONCEPT_ID
OBSERVATION.VALUE_AS_CONCEPT_ID
OBSERVATION.UNIT_CONCEPT_ID

I don’t really understand why these were specifically exempted when others were not.

Since the original post was about unit_concept_id, Christian’s answer matches my understanding, but I’m concerned it may be construed that all concept_ids can be NULLed.

I keep hearing conflicting information, and I thought the issue was settled.

MPhilofsky · February 27, 2020, 8:18pm

The community has agreed upon this. And the fields you list as nullable are nullable:

concept_id = 0 identifies a source code/value/idea that doesn’t have a matching concept_id.
NULL generally means the data are absent from the source.

The above were exempted because not all data have all these attributes. Regarding required concept_ids, type_concept_ids represent the provenance of your data, your data came from somewhere so it’s required. And if you don’t have a source code/value (entity_concept_id) for a clinical event record, then you don’t have useful data.

Personally, I prefer to leave MEASUREMENT.operator_concept_id NULL when there isn’t an operator present in the source for a record. This identifies the Measurement record as not having an operator at the source. If I were to populate it with concept_id = 0, then I am saying there is an operator available at the source, but there isn’t a matching concept_id in the OMOP vocabulary and it is unmappable. Again, this is my personal preference and an example. This is not an OHDSI convention. And I don’t know if there is a research use case for populating these fields with NULL versus 0.

The specifications inform us which fields are nullable.

Please point us to the conflicting information. There are ongoing efforts to clean up the documentation.

roger.carlson · February 28, 2020, 3:23pm

[quote=“MPhilofsky, post:7, topic:9690”]

MEASUREMENT.VALUE_AS_CONCEPT_ID
MEASUREMENT.UNIT_CONCEPT_ID
MEASUREMENT.OPERATOR_CONCEPT_ID
OBSERVATION.VALUE_AS_CONCEPT_ID
OBSERVATION.UNIT_CONCEPT_ID

concept_id = 0 identifies a source code/value/idea that doesn’t have a matching concept_id.
NULL generally means the data are absent from the source.[/quote]

While NULL may generally mean the data is missing from the source, it actually means UNKNOWN, that is, we don’t know whether or not there is data, let alone what value it might have. NULL cannot ever match anything. I’ve seen the “values of NULL” discussion, and I don’t buy it. You can never say anything positive about NULLs. Once you say something positive about a NULL, it’s no longer NULL. Therefore zero is appropriate because it speaks only to the negative: the field literally does not have a matching concept.

Conflicts

I agree that if you don’t have values for .concept_id or .source_concept_id, you don’t have useful data.

However, all of the “type_concept_id” fields are ARE Required even though the value may not be available in the source.

Condition_occurrence.condition_type_concept_id
Procedure_occurrence.procedure_type_concept_id
Drug_exposure.drug_type_concept_id
Device_exposure/device_type_concept_id
Measurement.measurement_type_concept_id
Note.note_type_concept_id
Observation.observation_type_concept_id
Specimen.specimen_type_concept_id
Visit.visit_type_concept_id

It is perfectly possible to not have a type_concept_id in the Source system. I may not be able to determine whether a procedure is a primary or secondary type, but it is still useful data. But in this case, the field is not NULLable (ie. it is required) and so I have to substitute a zero, even when I don’t know what that value is.

Other concept_ids also ARE Required.

Condition_occurrence.condition_status_concept_id
Drug_exposure.route_concept_id
Procedure_occurrence.modifier_concept_id
Specimen.anatomic_site_concept_id
Specimen.disease_status_concept_id
Observation.obs_event_field_concept_id
Note.note_class_concept_id
Note.encoding_concept_id
Note.language_concept_id

Again, these may not have values stored in the source, yet they are Required. In particular a procedure may not have a modifier at all, so why is the concept required? You may not have information about the anatomic site or disease status of a specimen. These all may have NULLs in the source, but are still required to map to zero.

Then there are the five above which are NOT Required

Measurement.operator_concept_id
Measurement.value_as_concept_id
Measurement.unit_concept_id
Observation.value_as_concept_id
Observation.unit_concept_id

If the reason these five are NULLable is because some don’t have these attributes, that’s actually the opposite of NULL. BMI, for instance, does not have a unit.NULL means ‘unknown’, and we do know whether BMI has a unit. It does not. So putting a zero in the concept_id makes more sense than leaving it NULL. It does not have a matching concept and we know it does not. NULL implies we don’t know.

However, because the unit_concept_id field is NULLable, I can send a measurement value of 100 (for some other measure) without a unit. But is that milliliters or deciliters? From the perspective of someone trying to make sense of the data (ie. a researcher), that particular record is unusable whether there is a zero or a NULL.But if there can be both zeroes and NULLs in that field that means to them the same thing, then it takes extra evaluation.

But then there are some others which are also NOT Required

Observation.qualifier_concept_id
Specimen.unit_concept_id

if modifier_concept_id is required, why not qualifier_concept_id? Or vice versa.

So the decision to make some fields NULLable and some not seems to be exactly backwards to me.

I would argue that ALL concept_id fields should be Required. If NULLs are important, I also suggest that every Concept_id field be accompanied by a Source_Value field (as many of them do already), so if a researcher wants to know the NULL state, he or she can find it in the source value.

Chris_Knoll · February 28, 2020, 8:37pm

Yes and No.
The Yes part is that you could consider NULL to be UNKNOWN such that if I say A AND B if I KNOW A to be true but B is UNKOWN I can not make a statement about the truth or false of A AND B. Therefore,overall result is UNKOWN. This is exactly how SQL traeats NULLS: A AND B where B is null results in NULL.

The No part is that you can actually say something positive about a NULL value. B IS NULL us true. B is not null is false.The other No to your statement is Therefore, zero is appropriate because it speaks only to the negative.... What if I told you that the concept_id for ‘unknown concept’ was ‘12349237’. Would you have a different feeling about assigning the ‘Unknown concept ID’ of ‘12349237’ to a concept_id field? Or do you find a special meaning in 0? IMO: concept_ids are identifiers, and so the fact that 0 is the special number for ‘unknown’ should not be a basis for using it as a concept_id which means unknown.

While you may not be able to determine the _type_concept_id, we’re defining that all observational data must have a type. This is why the field is required and is not null. If you don’t know what it is, put ‘Unknown Concept’. I think this is a reasonable use of the ‘Unknown Concept’.

You list the elements of condition, drug exposure, etc. For example, Drug_exposure’s route_concept_id. It’s reasonable that a drug exposure was administered to a person via SOME route. Therefore, while you may not know what that route is, you have to accept that all drug exposures have a route. Therefore, it can’t be ‘absent’ but it can be an unknown value. So, put the ‘Unkown Concept’ here.

Right, those concepts may not exist for certain types of measurements/observations. This also makes sense:

Consider a strength test where the strength is recorded as ‘Weak, normal, strong’. You’d store the measured value in the ‘value as concept_id’…and the unit_concept_id? What unit is ‘Strong’? (Answer: there isn’t one)… So, NULL for the unit of this measure. Consider another measure of strength of ‘Max Bench’ which is measured in some numeric value with associated unit. Let’s say 350 is the value as number and units is ‘pounds’. This measure doesn’t have a concept, so you put NULL in the value_as_concept_id. Lastly, what if we got the same record, but the source unit was xxxx. Can we map that to anything? No, so you put in Unknown Concept for the unit_concept_id.

I think the value of having a ‘not null’ vs ‘null’ column is to get you to think about how the CDM expects the standardized representation of observation data. If it is NOT NULL, then you need to think about why the CDM requires this information, and how you are going to map your source information over to those fields. I believe the _type_concept fields you mentioned that your source system doesn’t have that information, but I believe those fields are for you to describe the nature of the observation data (is it patient reported, vs. information from a note, vs information from a EHR record, etc). You could leave those as Uknown but you should think about what those values should be/could be, since the CDM is saying that these are required fields.

roger.carlson · February 28, 2020, 10:25pm

@Chris_Knoll

I think we are discussing at cross purposes. I was trying to highlight the confusion around this issue including the meaning of NULL and which fields the specification marks as Required. I think I just made it worse.

So let me say that I agree that the .concept_id, .type_concept_id, and .value_concept_id fields should have a 0 - Unknown Concept value the value is unknown. Even though I may not have the value in my source, conceptually, they all do have values, even if I don’t know what it is.

Likewise, I agree that things like route_concept_id, anatomic_site_concept_id, note_class_concept_id, and the like should also be required and given a value of 0 - Unknown Concept if you don’t know what that value might be (or can’t infer it somehow). A drug has to be given by some route. A specimen must come from some anatomic site. And so forth.

However, where we disagree is with the five concept_id fields which are NOT required.

I can’t see any observational value in putting a NULL in a Unit_concept_id for a measurement which does not by definition have a unit. The unit is not Unknown. We Know that there isn’t one. Someone looking at a strength test measure is not going to look for a unit, because they already know that there isn’t one.

Putting a NULL in that field only confuses the issue. Someone querying all the measurements and the units for a person, MUST use an OUTER JOIN to the Concept table for the Units, but can use either an INNER JOIN or an OUTER JOIN for the measurement_concept_id. If they don’t, none of those non-unit measurements will appear in the resultset. On the other hand, if those concept_ids have the 0 - Unknown Concept, they can reliably use an OUTER or an INNER join for either field.

By the way, there are 3 other fields which also fall into this category, but their ‘nullability’ (for lack of a better term) differs:

Required

Procedure_occurrence.modifier_concept_id

Not Required

Observation.qualifier_concept_id
Specimen.unit_concept_id

So to sum up, I contend that ALL concept_id fields should be required.

Christian_Reich · February 28, 2020, 11:06pm

@roger.carlson:

How about this:

All fields representing the normalized content the record (e.g. drug_concept_id): Not nullable. Because if there is a record, we know there is a value, we just don’t know which.
All fields representing auxiliary facts, which may not exist (e.g. modifier_concept_id): Nullable. Because the existence of the record does not say anything about the existence of the auxiliary fact, and we need to be able to express that there is none.

I understand your problem if you want to report that, and a simple inner join would make the record disappear. Got to use outer join. Is that a problem? If a field is nullable use outer, otherwise inner (or also outer).

Chris_Knoll · February 29, 2020, 1:06am

Maybe a source of confusion is the dual-nature of the Measurement table: it is used to store measured values (with numbers and units) and categorical values with categorical concepts.

What if the categorical type of observational data was put into observation (with value as concept for those cases where the observation can be qualified) and quantifiable observations goes into measurement (where we can then say, because it’s quantified, value_as_number and the unit the number are required and not null.

that would give some clear separation between what each table represents and would put to rest a few of the concerns about Unknown Concepts.

roger.carlson · March 2, 2020, 2:33pm

Yes, I think that might be a problem. It assumes the person pulling the data , 1) knows about the database structure, 2) knows that nullable fields require an outer join, 3) doesn’t make a mistake in the query. My understanding is that one purpose of a CDM is to minimize the amount that has to be known about the data structure.

Another way to look at it is: What are the consequences?

With nullable concept_id fields, the consequence is possible loss of data due to the factors above.,

With non-nullable concept_id fields (that is, having a ‘0 - Unknown Concept’ value) the consequences are…what exactly?

In the case of measurements that by definition do not have a unit (like BMI), no one will expect to see a unit. ‘0 - Unknown Concept’ is what they’d expect. Is there any sort of evaluation where you might compare the units of different measurements, some of which have units and some don’t? It there any situation in which a unit of ‘0 - Unknown Concept’ is different than a value of NULL? (From a research perspective, that is.) I can’t think of one.

Likewise with Value_As_Concept_id. Is there a situation in which there is an important difference, in the context of the data, between a value of ‘0 - Unknown Concept’ and NULL? Is it important to know WHY there is no value_as_concept? Or is it just important to know that there isn’t one?

So my question is, under what circumstances would we need know WHY an auxiliary fact does not exist?

If there is no use case for that, then I think the possibility of data loss is much more important.