How to implement a lab test range result in Measurement table?

QI_omop · October 16, 2018, 11:49pm

In our dataset, there is a Urine WBC Count lab test that needs to be loaded into CDM Measurement table. The lab test result is not a number but a range, e.g., 20-50 /HPF. The Measurement table has 2 fields to hold lab test result: value_as_number and value_as_concept_id. I don’t see how this result can go into either column. I do find a concept_id (36309070) in ‘Meas Value’ domain with concept_name as ‘20/50’. But I think 20/50 means 2 numbers as in systolic/diastolic blood pressure, which will not apply to this case. So please advise. Thanks.

aostropolets · October 17, 2018, 1:30am

Good question, made me think for a while.
Apparently, we can’t store ranges in Measurement now, only in Observation table in value_as_string. The possible explanation is that measurements should have reliable results that are actually measured. In this case, nobody counted leucocytes: for decision making, we are fine with an approximate number rather than a precise one.
There are 2 possible ways to handle that: either create a custom concept ‘20-50’ (with a concept_id above 2 billion) or put the midpoint of the range in value_as_number (35 /HPF); I’d suggest the latter.
@Christian_Reich would probably argue for a Themis convention or new fields for ranges though
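
A minimal sketch of the midpoint workaround, assuming the source result arrives as a string such as `20-50 /HPF`. The names `RANGE_PATTERN`, `parse_range` and `midpoint` are illustrative assumptions, not part of the CDM or of any OHDSI tool, and the pattern only covers simple `low-high unit` strings.

```python
import re

# Hypothetical pattern: assumes results look like "20-50 /HPF" (low, dash, high, optional unit).
RANGE_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*(.*)$")


def parse_range(result):
    """Split a range result into (low, high, unit); return None if it is not a range."""
    match = RANGE_PATTERN.match(result)
    if match is None:
        return None
    low, high = float(match.group(1)), float(match.group(2))
    return low, high, match.group(3).strip()


def midpoint(low, high):
    """Midpoint of the range, the candidate for value_as_number."""
    return (low + high) / 2


low, high, unit = parse_range("20-50 /HPF")
print(midpoint(low, high), unit)  # 35.0 /HPF
```

Keeping the original string alongside the midpoint (for example in value_source_value) means the range itself is not lost during ETL.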

rimma · October 17, 2018, 2:11am

No, we don’t have ranges. I think this is an omission, because a lot of measurements come as ranges. These are neither strings nor concepts. They are two easily analyzable numeric values, and they need their designated representation in the model.

On the other hand, we have low and high level range fields that are created to store normal ranges for a specific measurement. Are these being used at all? Could we re-purpose those for the actual measurement ranges?

I do not recall history or reasons behind the current design. Are there any existing conventions for representing ranges?

@Christian_Reich?

aostropolets · October 17, 2018, 2:32am

The ranges are used to group measurements into groups high/normal/low in covariates, so we need those fields as they are.
And how often do you see ranges in your source data? Is it a widespread issue?

QI_omop · October 17, 2018, 3:17am

@aostropolets Range counts are quite widespread in our source data. All of the following lab tests give a range count instead of a number:

URINE CASTS
URINE COARSE GRANULAR CAST
URINE FINE GRAN CAST
URINE HYALINE CAST
URINE RBC
URINE RENAL CELLS
URINE SQUAMOUS CELLS
URINE TRANSITIONAL CELLS
URINE WAXY CAST
URINE WBC
WBC STOOL

Now that I look at these, they are all cell counts in either urine or stool. So maybe they should go to Observation instead of Measurement? After all, these counts are obtained by ‘observing’ under a microscope.

roger.carlson · October 17, 2018, 12:15pm

I’m trying to imagine how a range like 20/50 would be useful as a measurement. Unless they are standards for some type of test (in which case they should have a concept), there’s no way to compare your own data with someone else’s. So if your data says 20/50, and mine says 30/50, are these the same or different? If the actual count was 35, they would be the same. If it was 25, they would be different.

If actual numbers are stored as the measurement, I can easily assign ranges for my own analysis.

I’m thinking here not as a clinician, but as a database guy trying to pull meaningful data. I can envision a scenario where someone wants to query for a range of 20-60, and I’m left trying to figure out what fits using string and data conversion functions.

Christian_Reich · October 17, 2018, 6:00pm

He would.

Ranges are there for normal ranges. Not for ranges of measurements. Usually, measurements are mostly single numbers.

Now, @QI_omop has a use case with ranges in the result. That happens when the measurement cannot give a precise answer.

@QI_omop: do you think you can work on a proposal for the CDM? Not sure it will pass, but it is a valid point.

You usually are not comparing numbers, and which is bigger. Typically, you are asking for patients where a measurement is above or below a certain threshold. Like in “Give me all patients where the leuko count in the urine is above 10 per HPF”. In that case, @QI_omop’s record would fit. If the question were “Give me all the patients above 30” or “Give me all the patients above 70” it would not match.

But if you want to compare: You would essentially calculate whether or not the ranges overlap. When they do, the result is “the same”; if they are disjoint, then not.
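
The threshold and overlap checks described in this post can be sketched as follows. This assumes ranges are closed intervals held as `(low, high)` tuples; the function names are hypothetical and nothing here reflects an existing CDM field.

```python
def surely_above(rng, threshold):
    """True when every value in the closed range (low, high) exceeds the threshold."""
    low, _high = rng
    return low > threshold


def ranges_overlap(a, b):
    """True when two closed ranges (low, high) share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


urine_wbc = (20, 50)
print(surely_above(urine_wbc, 10))   # True: the whole range is above 10
print(surely_above(urine_wbc, 30))   # False: part of the range is below 30
print(surely_above(urine_wbc, 70))   # False
print(ranges_overlap((20, 50), (30, 50)))  # True: treated as "the same"
print(ranges_overlap((20, 25), (30, 50)))  # False: disjoint
```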

amy_chuang · October 17, 2018, 6:11pm

I have a question about measurement table -
Should lab test records be added to the Measurement table because their domain_id is ‘Measurement’? For example:

Concept id = 3029925 Color of Urine by Auto
Concept id = 3007876 Appearance of Urine
concept id = 4047167 Chlamydia trachomatis culture

aostropolets · October 17, 2018, 6:15pm

Exactly. The records go to the tables that correspond to their domains.

Why wouldn’t it? The range is 30-50, so 30 works.

Sounds good. Meanwhile, as a temporary workaround, the average value/new concepts can be used.

Christian_Reich · October 17, 2018, 6:32pm

@QI_omop had 20-50.

amy_chuang · October 17, 2018, 6:49pm

Thanks @aostropolets for your quick response. Further question using these three concepts as example-
In our source system, we have example results and range_low as follows:
result = Negative; Range_low = Negative
result like Hazy, Cloudy, Clear, Turbid, Slightly Cloudy ; Range_low = Clear
result = Yellow; Range_low = Yellow

Two problems in current CDM for measurement with these data -

No value_as_string field
Data type for Range_Low is float

We do the following adjustment in order to accommodate these data -

Add value_as_string field
Modify data type of Range_Low to varchar
Please advise if there is a solution other than the one we implemented.

limadm · October 17, 2018, 7:59pm

Hello,
Please notice that representing measurements as value ± uncertainty may be useful for some statistics and algorithms, e.g. similarity queries.
Think of three patients with measurements 20-25, 20-50 and 30-50.
If we transform them to 22.5 ±2.5, 35 ±15 and 40 ±10 respectively, using the lower component of the range (20, 20, 30) places 1 and 2 closer (differences 0 and 10), while the mean (22.5, 35, 40) would place 2 and 3 closer (differences 12.5 and 5). Maintaining this information is relevant for queries like “Give me all the patients with lab results closest to this one (the 30-50)”.
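
The arithmetic for the three example patients can be checked with a short sketch. The patient labels and helper names are illustrative assumptions; only the three ranges come from the post.

```python
# The three example ranges from the post, keyed by hypothetical patient labels.
ranges = {"patient_1": (20, 25), "patient_2": (20, 50), "patient_3": (30, 50)}


def as_value_and_uncertainty(low, high):
    """Rewrite a range as (midpoint, half-width), i.e. value ± uncertainty."""
    return (low + high) / 2, (high - low) / 2


def gaps(values):
    """Absolute differences between consecutive values."""
    return [abs(b - a) for a, b in zip(values, values[1:])]


lows = [low for low, _ in ranges.values()]
means = [as_value_and_uncertainty(low, high)[0] for low, high in ranges.values()]

print([as_value_and_uncertainty(*r) for r in ranges.values()])
# [(22.5, 2.5), (35.0, 15.0), (40.0, 10.0)]
print(gaps(lows))   # [0, 10]: patients 1 and 2 look closest
print(gaps(means))  # [12.5, 5.0]: patients 2 and 3 look closest
```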

rimma · October 17, 2018, 8:56pm

@aostropolets, @Christian_Reich
I am still not sure I understand the current population of the lower-upper normal range values. Are these fields populated post-ETL?

@aostropolets
Other examples of cancer/pathology/radiology data that are expressed as ranges:
percentage of positive receptor cells
tumor size/dimensions
number of tumors

@roger.carlson
Ranges are very useful even if they don’t match or overlap. In your example, 20/50 and 30/50 are still comparable for the following assessments: within 20/50, above 30, below 50. I think the ability to store this information is critical, and I’d support the proposal.

I also support the need to record measurement error @limadm proposed.

@amy_chuang
negative, positive, cloudy, clear, etc. are lab test qualifiers. We usually map them to SNOMED qualifiers. For example, Cloudy is SNOMED concept (concept_code 81858005) with a corresponding OMOP concept_id = 4219220.

The problem you are addressing of making two of the qualifiers into bottom and top values of the range can be resolved by assigning numeric values to each qualifier by their level of severity:
Clear = 1
Slightly Cloudy = 2
Cloudy = 3
Hazy = 4
Turbid = 5
Range_low = 1
Range_high = 5

This means that you will need to store value_as_concept_id along with value_as_number. I am not sure whether this violates any conventions. This representation applies to any ordinal scale that has respective concepts.
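
A minimal sketch of the ordinal encoding above. The ranks are the ones listed in this post; `CLARITY_RANK` and `to_ordinal_row` are hypothetical names, and value_as_concept_id is left out because only the concept for Cloudy is given here.

```python
# Ordinal ranks as listed in the post; the dictionary name is an illustrative assumption.
CLARITY_RANK = {
    "Clear": 1,
    "Slightly Cloudy": 2,
    "Cloudy": 3,
    "Hazy": 4,
    "Turbid": 5,
}


def to_ordinal_row(result, low_label, high_label):
    """Encode a qualitative result and its range labels as numbers."""
    return {
        "value_as_number": CLARITY_RANK[result],
        "range_low": CLARITY_RANK[low_label],
        "range_high": CLARITY_RANK[high_label],
        "value_source_value": result,  # keep the original text
    }


print(to_ordinal_row("Cloudy", "Clear", "Turbid"))
# {'value_as_number': 3, 'range_low': 1, 'range_high': 5, 'value_source_value': 'Cloudy'}
```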

esholle · October 17, 2018, 9:30pm

You usually are not comparing numbers, and which is bigger. Typically, you are asking for patients where a measurement is above or below a certain threshold. Like in “Give me all patients where the leuko count in the urine is above 10 per HPF”. In that case, @QI_omop’s record would fit. If the question were “Give me all the patients above 30” or “Give me all the patients above 70” it would not match.

This is exactly what I was thinking. In assessing response in liquid tumors, one of the most common criteria is whether or not the blast count in a bone marrow biopsy is above or below a given threshold. We get blast counts out of the bone marrow biopsy using NLP - but they’re sometimes expressed as a range, and, as @Christian_Reich mentioned, what’s important is just whether or not the patient is below or above the maximum (or minimum) of the range. So if you’re looking to see if the patient has ≥20% myeloblasts, a range of 10-25 wouldn’t qualify.

I wonder if you could do this using the OPERATOR_CONCEPT_ID - but then you’d need two rows, one for the “less than” and one for the “greater than.” My sense is that having two rows with the same measurement_id is a violation of the DDL - @Christian_Reich, is that a fair statement?

aostropolets · October 17, 2018, 11:22pm

It’s not a direct violation, but:
Say we have range 20-50 and put it as 2 rows >20 and <50.
Then the patient will qualify for the cohorts
“Give me all with more than 1000” and “Give me all with less than 10”, which is apparently wrong.
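
The problem can be reproduced with a small sketch. It assumes a naive matcher that evaluates each row on its own, and it uses plain operator strings in place of operator_concept_id; all names are hypothetical.

```python
import math

# One 20-50 result split into two rows (plain strings stand in for operator_concept_id).
rows = [
    {"operator": ">", "value_as_number": 20},
    {"operator": "<", "value_as_number": 50},
]


def row_may_satisfy(row, cohort_operator, threshold):
    """Can some value allowed by this single row satisfy the cohort criterion?"""
    # A ">" row is unbounded above; a "<" row is unbounded below.
    row_low = row["value_as_number"] if row["operator"] == ">" else -math.inf
    row_high = row["value_as_number"] if row["operator"] == "<" else math.inf
    return row_high > threshold if cohort_operator == ">" else row_low < threshold


def range_may_satisfy(low, high, cohort_operator, threshold):
    """Same question when both bounds live in one record."""
    return high > threshold if cohort_operator == ">" else low < threshold


print(any(row_may_satisfy(r, ">", 1000) for r in rows))  # True (wrong)
print(any(row_may_satisfy(r, "<", 10) for r in rows))    # True (wrong)
print(range_may_satisfy(20, 50, ">", 1000))              # False
print(range_may_satisfy(20, 50, "<", 10))                # False
```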

QI_omop · October 18, 2018, 1:30am

@Christian_Reich. Sure. I will work on a proposal for the CDM.

Dymshyts · October 18, 2018, 11:38am

And here is another idea for a range representation:
Obviously, when a range is given, it corresponds to some distinct result such as “Normal”, “Slightly increased” and so on.
Can it be translated during the ETL process? I mean, investigate what 20-50 means for your test and put the interpretation in value_as_concept_id.

And also I like the idea

Chris_Knoll · October 18, 2018, 3:27pm

I agree with @aostropolets: splitting linked information across 2 records is a bad idea. If you are saying there’s a low value and a high value for a single measurement, you should have that information in the same record.

From the CDM Spec, the existing range_low and range_high is about what the ‘normal’ range is for the given patient, and not the range of values measured. So, sounds like we’d want two new columns: lower_bound_value_as_number (or lb_value_as_number) and an upper bound (ub_value_as_number). I think you’d also want to define up front that the values are inclusive/exclusive so that you know when you see a lb of 20 and an ub of 50 it is 20 <= m <= 50 vs. 20 < m < 50 (otherwise add yet another column to store if those ranges are inclusive, or if the upper bound is inclusive but the lower bound is not…yuck!)
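
A sketch of what such a record might look like, assuming the two proposed bound columns plus optional inclusivity flags. The class and field names are hypothetical and do not exist in the CDM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundedValue:
    """Hypothetical record: one measurement with lower and upper bounds."""
    lower: float
    upper: float
    lower_inclusive: bool = True
    upper_inclusive: bool = True

    def contains(self, m):
        """True when m lies within the bounds, honouring inclusivity."""
        above = m >= self.lower if self.lower_inclusive else m > self.lower
        below = m <= self.upper if self.upper_inclusive else m < self.upper
        return above and below


closed = BoundedValue(20, 50)                    # 20 <= m <= 50
open_ = BoundedValue(20, 50, False, False)       # 20 < m < 50
print(closed.contains(20), open_.contains(20))   # True False
```

Fixing one convention up front (for example, always inclusive) would make the two flag fields unnecessary.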

If there’s a range of 10-25, doesn’t that mean that something was measured above 20, and therefore looking for a value >= 20% would match this patient?

This is also a possibility too: you’d only add one column in this form, interval which stores a number which can be added/subtracted from the value_as_number to get the range. Exact values would store an interval value of 0.

To find the patients, you would just unwrap the interval into a range, and look for overlapping ranges between the values in the measurement. I still need to understand why an interval of m >= 20 does not overlap with 10 <= m <= 25 in @esholle’s example.
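
The single-column idea can be sketched as below, assuming a hypothetical `interval` value stored next to value_as_number. Whether an overlap should count as a match is exactly what is being debated in this thread, so the overlap test is only one possible reading.

```python
import math


def unwrap(value_as_number, interval):
    """Expand value ± interval into a closed (low, high) range."""
    return value_as_number - interval, value_as_number + interval


def overlaps(a, b):
    """True when two closed ranges (low, high) share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


blast_range = unwrap(17.5, 7.5)   # the 10-25 example
criterion = (20, math.inf)        # m >= 20
print(blast_range)                        # (10.0, 25.0)
print(overlaps(blast_range, criterion))   # True
print(unwrap(35, 0))                      # (35, 35): an exact value
```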

hripcsa · October 18, 2018, 4:18pm

I think for actual research, the only thing that will get used is putting the middle of the range into the single value column, and–to preserve information–put the full range into the source column. Yes, we can add an uncertainty column to formally support the range (±2.5), but I doubt anyone will use it. Also remember that all single values have ranges; we are just not explicit about it.

If we use OMOP to support individual patient guidelines, then you probably want to be more explicit about the range in deciding if someone qualifies. But even there, guidelines don’t specify whether you should be more specific or more sensitive (if the guideline has a 20% threshold, then is 10-25 included or not; it depends).

So in the end, I am thinking the middle of the range is about as well as we will do.

limadm · October 18, 2018, 10:00pm

Hi @hripcsa, I did not understand your point.
Aren’t range_low/range_high columns meant to document the measurement domain?
I think a measurement error is a different variable.

Example: I am sure a 201 mg/dL blood glucose level is hyperglycemic (assuming the normal range to be 70-200), but not so if the glucometer has a ±20% error. If the measurement error is not considered, and the caretaker decision in cases like this is always “hyperglycemic”, we can expect a significant number of normal persons “at the edge” being diagnosed as hyperglycemic (at least 48% if the error has uniform distribution). Using the interval minimum, maximum or mean does not change this expectation.
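
A sketch of how a relative measurement error changes the classification, assuming a symmetric error and the 70-200 normal range from the example. The function names and the “uncertain” label are illustrative assumptions.

```python
def error_interval(value, relative_error):
    """Closed interval implied by a symmetric relative error (0.20 means ±20%)."""
    delta = value * relative_error
    return value - delta, value + delta


def classify(value, relative_error, normal_low, normal_high):
    """Label a result, returning 'uncertain' when the interval straddles a limit."""
    low, high = error_interval(value, relative_error)
    if low > normal_high:
        return "high"
    if high < normal_low:
        return "low"
    if low >= normal_low and high <= normal_high:
        return "normal"
    return "uncertain"


print(classify(201, 0.0, 70, 200))   # high: error ignored
print(classify(201, 0.20, 70, 200))  # uncertain: roughly 160.8 to 241.2
print(classify(300, 0.20, 70, 200))  # high: even the lower end exceeds 200
```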

But as always, considering this information is a possibility.