So I just checked TruvenCCAE (v697) for counts of rows that have 0 as value_as_concept, and here are the top 5:
0 drinks 2584288
Very good 1607292
So we see here, NULLS outpace the next unmapped source value of 'No' by a factor of 200. What query were you using to determine that there were rarely null values from the source? Maybe I'm looking at the wrong DB.
But that's the job of the ETL logic, no?
All I want out of this Themis discussion is that when I look at a row, I can confidently say 'there was no value in this column from the source'. I do not want to have to dig into ETL rules about how many different ways you can represent the absence of values (ie: '', 'NULL', 'NA', 'N/A', 'Not Provided', 'NP', etc. the list goes on, and if there is a case where I want to analyze a data source on their capture of a measurement value that should have been stored as a categorical value (mapped to a concept) I'd just like to look for those that are not null vs null so I can understand the capture vs having to figure out if the value was provided or not through interpreting the value_as_source. And my interpretation on one source will not follow through to another source (unless it's standardized via a Themis statement of standardization).