
Laboratory and vital sign results: numeric and non-numeric

We have laboratory and vital sign results that can be either numeric or non-numeric for the same items. For example, a laboratory result might be intended to be numeric, but for very small values it is reported as “<2”; the symbol makes it a non-numeric result. For vital signs, height might be recorded as 63 or sometimes entered as 5’3’’. Again, the symbols in the latter make the result non-numeric.

We need to carry both the numeric value and the non-numeric source field into OMOP so that the researcher can decide what to do with outlier results that were expected to be numeric but became non-numeric for whatever reason.
Fern

@fitznmn:

Not quite. In V5, you have the operator_concept_id field in the MEASUREMENT table which allows you to put in operators. For example, for “<” you would use concept_id 4171756.
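
For instance, a result recorded in the source as “<2” could be represented roughly like this (a sketch only; the identifiers, date, and measurement concepts below are placeholders, not real values):

-- Sketch: storing a result reported as "<2" using operator_concept_id
INSERT INTO measurement
  (measurement_id, person_id, measurement_concept_id, measurement_date,
   measurement_type_concept_id, operator_concept_id, value_as_number, value_source_value)
VALUES
  (1, 1, 0, DATE '2015-01-01',  -- placeholder identifiers, concept, and date
   0,                           -- placeholder measurement type
   4171756,                     -- "<"
   2,                           -- the numeric part of the result
   '<2');                       -- the original source string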

Correct, but in reality it is 63 inches. There is no good reason not to convert it to that during the ETL.

In general, we are trying to free the researcher from those data manipulation tasks. The researcher should only have to worry about whether patients are above or below a value. 5’3" is 63", which is 160 cm (5 × 12 + 3 = 63 inches; 63 × 2.54 ≈ 160 cm), and a person of that height has that height. There is no analytical value in preserving the original recording transaction.

Christian
Thanks for responding so quickly to our plan.

Daniella asked me to post it when I brought it up at our data harmonization meeting. You are correct that with infinite resources all such problems could be cleaned up. However, at the VA our resources are finite, and at this time we can only load the purely numeric data. For researchers to use the data, they need to plan for how to deal with the reality of the data, given a system that does not have the resources to do a complete translation with programming. That is why we plan to carry the source for both fields. Even if the translation had been done, questions could still arise, since the translated value does not exactly represent the source; assumptions were made. That is a good reason to carry the source as well.

Thank you for the feedback!
Fern

@fitznmn We’re coming from EHR source data as well, so we also have the joy of seeing almost every variation in syntax we can imagine. In situations like you’re describing, our approach has been to put the original value into the appropriate source value field (looks like value_source_value for this case), then devote “reasonable” resources to translating the value into the forms expected in the canonical fields. For the cases that are beyond “reasonable”, we leave nothing in value_as_number. Our thinking here is that we preserve the desired structure for “standard” analytic tools that rely on it, but also preserve the ability for the interested user to delve into the source values to disambiguate, or to translate at some higher level of “reasonable” (and then tell us how, we hope).

We’re of course making two big assumptions here: that a “standard” analytic program will behave more nicely when faced with a missing value than it will when faced with a value it doesn’t understand, and that dropping measurements (due to missing values) in an analysis is no worse than encountering a value the analysis can’t decode. Neither is a given, but both seem to be reasonable, at least so far.

Fern,

Populating base data, loading numeric values
Assuming that you are using the measurement table, I do the following:

  1. Place all raw values (“11”, “120”, “<2”, “5’3"”) from your source system into ‘value_source_value’ as strings
  2. Run SQL to populate the ‘value_as_number’ column if the ‘value_source_value’ is actually numeric (PostgreSQL example)

-- set numeric values to value_as_number
-- (note: PostgreSQL has no built-in isnumeric(); a regular-expression test is used instead)
update measurement
set value_as_number = CAST(M.value_source_value AS numeric)
from (
select measurement_id, value_source_value
from public.measurement
WHERE value_source_value IS NOT NULL
AND value_as_number IS NULL
AND value_source_value ~ '^\s*[+-]?[0-9]+(\.[0-9]+)?\s*$'  -- plain integers and decimals only
) M
where measurement.measurement_id = M.measurement_id;

Mixed Units of Measurement
Concerning height as 5’ 3", the source system is recording multiple measurement units in a single value. For my clients, I prefer to create measurements that use a single unit of measure, leveraging metric units where possible. This is the only way we can conceptually map the measurement units into unit_concept_id.
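
For what it’s worth, here is a rough PostgreSQL sketch of that conversion, assuming the mixed-unit heights arrive as strings like 5’3" in value_source_value (the concept filter is a placeholder, and unit_concept_id would also be set to the centimeter concept from the vocabulary in the same pass):

-- Sketch: convert feet/inches strings such as 5'3" into a single metric value (cm)
update measurement
set value_as_number = round(
      (CAST(substring(value_source_value from '^(\d+)''') AS numeric) * 12
     + CAST(substring(value_source_value from '''(\d+)') AS numeric)) * 2.54, 1)
where value_source_value ~ '^\d+''\d+'
  and measurement_concept_id = 0;   -- placeholder: restrict to your height concept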

Operators
Operators such as <, >, >=, + can be a problem. You can conceptually map the operator into operator_concept_id, but you cannot place it in a separate source field using CDM v5, as there is no operator_source_value.
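
A sketch of what that mapping can look like in PostgreSQL, handling only the “<” case with the concept_id mentioned earlier in the thread (other operators would be handled the same way after looking up their concept_ids in the vocabulary):

-- Sketch: split "<2" into operator_concept_id and value_as_number
update measurement
set operator_concept_id = 4171756,   -- "<"
    value_as_number = CAST(substring(value_source_value from '^<\s*([0-9.]+)$') AS numeric)
where value_source_value ~ '^<\s*[0-9.]+$'
  and value_as_number IS NULL;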

Bill

Just treating them as missing seems very dangerous.
Values below or above measurement limits are usually very important, clinically.

How about using top-coding for values above or below the limit of measurement?
For example, “<2” would be replaced with 1.99, and “>1000” with 1001.
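
A minimal PostgreSQL sketch of that idea, using the offsets from the example above (they are otherwise arbitrary and would have to be chosen per measurement):

-- Sketch: top-/bottom-code values recorded with < or > operators
update measurement
set value_as_number = case
      when value_source_value ~ '^<' then CAST(substring(value_source_value from '[0-9.]+') AS numeric) - 0.01
      when value_source_value ~ '^>' then CAST(substring(value_source_value from '[0-9.]+') AS numeric) + 1
    end
where value_source_value ~ '^[<>]\s*[0-9.]+$'
  and value_as_number IS NULL;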

@rwpark Agree that for specific analyses limit-coding or other out-of-band values can help point an analysis in the right direction – I’ve found figuring out the right value for the general case to be difficult. At a minimum, the presence of a measurement with a measurement_concept_id that “should” have a numeric result, but has no result, might cue the reader to look for an anomaly by consulting the source value.

@rwpark and @bailey:

The problem is the following: my hunch is that no analytical method will “consult the source value”, unless the analyst happens to also be the one who ETLed the data and does some manual work. Instead, your typical query will be something like “SELECT * FROM measurement WHERE measurement_concept_id = <some test concept> AND value_as_number < 2”, ignoring the operator_concept_id. Rae’s solution of putting in a value of 1.99 would work here. But it would fail if the method asked for patients who, say, are not <1.5. In such a situation a value of 1.99 would look like it fits the clause, while the true value, which is known only to be <2, may not fit it at all.

What we should do is to provide a standardized query that takes the operator_concept_id into account correctly.
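
Something along these lines, perhaps (a sketch; the measurement concept is a placeholder, and only the “<” operator from the example above is handled, so a full version would cover the other operator concepts as well):

-- Sketch: find results known to be below 2, taking operator_concept_id into account
select m.*
from measurement m
where m.measurement_concept_id = 0                -- placeholder: your test concept
  and (
        -- plain numeric results (no operator recorded)
        ((m.operator_concept_id IS NULL or m.operator_concept_id = 0) and m.value_as_number < 2)
        -- "<x" results: if the bound x is at most 2, the true value must be below 2
     or (m.operator_concept_id = 4171756 and m.value_as_number <= 2)
      );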
