I’d disagree with this assessment: null is, by definition, the absence of value. An empty strings is a value of a string, in as much as it has a length, you can compare it to other values, etc. Having an invalid reason of “” doesn’t mean it doesn’t have an invalid reason, it means it does have an invalid reason, and the reason is {silence}. Conversely, a NULL value in invalid_reason means there is no invalid_reason.
On the note of joins, it makes sense that you wouldn’t want null columns to be joined together. The idea of a join is that you are matching a value from one table to a value of another. No value, no join. It also follows that LEFT JOINs that do not have a match return a NULL value and not an empty string.
Concept_Ids are ints, so can’t store a “” value. You might have noticed that we provide a CONCEPT_ID = 0 for ‘unknown concept’ but this value demonstrates the difference between ‘absence of value vs. the presence of a value’ beautifully: In the MEASUREMENT table, we might see a row with ‘value_as_concept’ which sometimes might be 0, other times might be null. 0 means that there was a value in the source system but couldn’t be mapped to a CDM concept. a null means that this measurement has no value for a value_as_concept. Measurements that have non-null value for value_as_concept might be a ‘pass’ or ‘fail’ value for a test, while a null valued row is the type of measurement where the existence of the record means something was identified (such as an influenza virus was identified).
I hope this helps!
-Chris