
How to represent ranges in the MEASUREMENT table?


(Michael Gurley) #1

If I have a lab value in a source system that contains the following categorical value:

‘Described as “less than 2 cm,” or “greater than 1 cm,” or “between 1 cm and 2 cm”’

What is the convention or best practice for representing this lab in the MEASUREMENT table? One entry with value_as_concept_id set to a categorical concept, or somehow making two related entries within MEASUREMENT, using value_as_number/operator_concept_id for the lower and upper bounds?


(Christian Reich) #2

@mgurley:

Easy. You put in 2 or 1, and cm into the concept_as_value and unit_concept_id. And in the operator_concept_id you put in < or >.

We need to amend the model. I thought @Dymshyts had already written, or was going to write, a proposal, but you could do that just as well. We have a few options:

  • Record “less than 2 cm” and “greater than 1 cm” independently as two records. The problem is that ordinary queries will miss this, because they will not realize that both records belong together. So you could add a new FACT_RELATIONSHIP record. Ugly as sin, and it will probably also be overlooked.
  • Amend the table to have another field for a second value. Ugly as sin, and it will be overlooked.
  • Amend the table to have an error range, and then record the mean. So you would record 1.5 cm with a range of 0.5 cm. If you query for precise values, you must not overlook that there is a range. If you want overall averages, this just works out of the box.
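A minimal Python sketch of the last option, assuming a hypothetical extra column (here called value_range) that holds the half-width of the interval; neither that column nor the RangeValue class exists in the CDM. It shows that the midpoint/half-width pair round-trips to the original bounds, and that averaging value_as_number alone works without consulting the range.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class RangeValue:
    """Hypothetical shape for the 'mean plus error range' option.

    value_as_number holds the midpoint; value_range is a hypothetical
    extra column holding the half-width (it is not in the current CDM).
    """
    value_as_number: float
    value_range: float = 0.0  # 0.0 means an exact measurement

    @classmethod
    def from_bounds(cls, low: float, high: float) -> RangeValue:
        # "between 1 cm and 2 cm" -> midpoint 1.5, half-width 0.5
        return cls((low + high) / 2, (high - low) / 2)

    def bounds(self) -> tuple[float, float]:
        # Recover the original interval from midpoint and half-width
        return (self.value_as_number - self.value_range,
                self.value_as_number + self.value_range)


records = [RangeValue.from_bounds(1.0, 2.0), RangeValue(3.0)]
# Averages use value_as_number alone and ignore the range column
print(mean(r.value_as_number for r in records))  # 2.25
print(records[0].bounds())  # (1.0, 2.0)
```

The trade-off is the one described above: aggregate statistics need no changes, but any query for exact values has to remember to check value_range.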

I am in favor of the last, but I am sure we will have a lively discussion in the community.


(Dmytry Dymshyts) #3

It's some kind of joke, right?

Yeah, the last option seems to be the best one, but we still need to think about the options.
@Eldar what do you think about measurement results as covariates in the last scenario?


(Christian Reich) #4

I don’t get it. What’s the joke?


(Chris Knoll) #5

In the cohort builder, we model these kinds of criteria with a value field and an extent field. For “between A and B”, the value is A and the extent is B. For “greater than A”, we simply set the value to A and the operator to ‘GT’.
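A simplified Python sketch of a value/extent/operator criterion checked against an exact numeric value. The class, its field names, and the operator strings ('GT', 'LT', 'BT') are illustrative assumptions rather than the cohort builder's actual code, and treating “between” as inclusive at both ends is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NumericCriterion:
    """Hypothetical criterion: an operator, a value, and an optional extent."""
    op: str
    value: float
    extent: Optional[float] = None  # only used by 'BT' (between)

    def matches(self, x: float) -> bool:
        if self.op == "GT":
            return x > self.value
        if self.op == "LT":
            return x < self.value
        if self.op == "BT":
            if self.extent is None:
                raise ValueError("'BT' needs an extent")
            # Between value and extent, inclusive of both ends (an assumption)
            return self.value <= x <= self.extent
        raise ValueError(f"unsupported operator: {self.op}")


print(NumericCriterion("BT", 10, 50).matches(20))  # True
print(NumericCriterion("GT", 20).matches(9))       # False
```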

If the data can have an element of a ‘range’ of values, then I don’t think there’s anything wrong with modeling that in your data structure.

However, for measurements this becomes difficult to query: if the criterion says ‘between 10 and 50’ and the record says the value is between 9 and 51, the criterion doesn’t completely cover the record’s range, so should the record be selected? Or, if the value is between 9 and 51 and I look for values ‘greater than 20’, is the 9-to-51 record selectable under the ‘greater than 20’ criterion?
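The question comes down to two possible readings when both the criterion and the record are intervals. A hypothetical Python sketch of both (the function names are illustrative, not from any OHDSI tool): a strict “contains” reading, where every value the record allows must satisfy the criterion, and a lenient “overlaps” reading, where at least one value must. Bounds are compared inclusively for simplicity, which glosses over the strict ‘>’ at exactly 20.

```python
import math


def contains(crit_low, crit_high, rec_low, rec_high):
    """Strict reading: every value the record allows satisfies the criterion."""
    return crit_low <= rec_low and rec_high <= crit_high


def overlaps(crit_low, crit_high, rec_low, rec_high):
    """Lenient reading: at least one value the record allows satisfies it."""
    return rec_low <= crit_high and crit_low <= rec_high


# "Between 10 and 50" vs a record known to lie between 9 and 51
print(contains(10, 50, 9, 51))  # False
print(overlaps(10, 50, 9, 51))  # True

# "Greater than 20" treated as the open-ended interval [20, inf)
print(contains(20, math.inf, 9, 51))  # False
print(overlaps(20, math.inf, 9, 51))  # True
```

Under either reading the 9-to-51 record gives the same answer for both criteria, so a convention has to pick one reading explicitly.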


(Christian Reich) #6

@Chris_Knoll:

I think you listed all the situations. Essentially they are:

  • Give me patients with value greater than A (cohort definition)
  • Give me patients with value between A and B (ditto)
  • Give me the average of all values in a population (data characterization)

The question is which is the easiest way to support these, and which of the alternatives is the most backwards compatible while also being compact (we don’t want an inflation of all sorts of new fields). My proposal is A ± error. I can be convinced otherwise, though.


(Dmytry Dymshyts) #7

Technically, “less than 2 cm” or “greater than 1 cm” means every numeric value (it’s not “less than 2 cm” AND “greater than 1 cm”).
So I assumed you wanted to make a new concept for this nonsense “2 or 1”. Anyway, if it was a joke, it wasn’t a funny one :slight_smile:
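A quick Python spot check of that logic: every number is either below 2 or at least 2, and anything at least 2 is above 1, so the OR form is always true, while the AND form only holds strictly between 1 and 2. The sample points are arbitrary and only illustrate the argument.

```python
# Sample points below, at, inside, and above the 1-2 cm interval
samples = [0.1, 1.0, 1.5, 2.0, 50.0]


def either(x):
    return x < 2 or x > 1   # "less than 2 cm" OR "greater than 1 cm"


def both(x):
    return x < 2 and x > 1  # "less than 2 cm" AND "greater than 1 cm"


print([either(x) for x in samples])  # [True, True, True, True, True]
print([both(x) for x in samples])    # [False, False, True, False, False]
```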


(Michael Gurley) #8

@Dymshyts As you might have guessed, this example comes from NAACCR, so it is less than perfect :roll_eyes:

NAACCR #2800 CS Tumor Size

See here for its origin:

https://staging.seer.cancer.gov/cs/input/02.05.50/prostate/size/?breadcrumbs=(~schema_list~),(~view_schema~,~prostate~)

I think it is poorly worded and actually means “between 1 cm and 2 cm”. So I don’t think it is trying to represent all numbers.

As you can see from the link, this is a question that has a mix of categorical and numeric answers.

Code      Description
000       No mass/tumor found
001-988   001-988 millimeters (mm) (exact size to nearest mm)
989       989 mm or larger
990       Microscopic focus or foci only and no size of focus given
991       Described as “less than 1 centimeter (cm)”
992       Described as “less than 2 cm,” or “greater than 1 cm,” or “between 1 cm and 2 cm”
993       Described as “less than 3 cm,” or “greater than 2 cm,” or “between 2 cm and 3 cm”
994       Described as “less than 4 cm,” or “greater than 3 cm,” or “between 3 cm and 4 cm”
995       Described as “less than 5 cm,” or “greater than 4 cm,” or “between 4 cm and 5 cm”
999       Unknown; size not stated; size of tumor cannot be assessed; not documented in patient record

When NAACCR #2800 CS Tumor Size has a categorical value as its answer, I think we should give up on trying to represent its answer numerically in the MEASUREMENT table and simply record its value/answer in the value_as_concept_id column. When its answer is between 001-988, I think we should represent it as a number in the value_as_number field.
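A minimal Python sketch of this rule, assuming the source answer arrives as the three-digit NAACCR code string. The function name and the answer_concepts lookup are hypothetical; real answer concept_ids would come from your vocabulary. Unmapped answers fall back to 0, the CDM convention for “no matching concept”.

```python
from __future__ import annotations


def map_cs_tumor_size(code: str, answer_concepts: dict[str, int]) -> dict:
    """Hypothetical mapping of one NAACCR #2800 answer to MEASUREMENT value fields."""
    row = {
        "value_source_value": code,  # always keep the raw answer
        "value_as_number": None,
        "unit_source_value": None,
        "value_as_concept_id": None,
    }
    if code.isdigit() and 1 <= int(code) <= 988:
        # Exact size to the nearest millimeter: keep it numeric
        row["value_as_number"] = int(code)
        row["unit_source_value"] = "mm"
    else:
        # 000, 989-995, 999, etc.: record the answer as a concept and
        # leave interpretation of the size category to the analyst
        row["value_as_concept_id"] = answer_concepts.get(code, 0)
    return row


lookup = {"992": 2000000001}  # placeholder ID, not a real concept_id
print(map_cs_tumor_size("015", lookup)["value_as_number"])      # 15
print(map_cs_tumor_size("992", lookup)["value_as_concept_id"])  # 2000000001
```

Keeping the raw code in value_source_value means the categorical answers can still be reinterpreted later without going back to the source data.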

If we try to represent categorical values like ‘992’ numerically, we will need to manually curate/annotate the NAACCR vocabulary to an unmanageable degree, especially over time.

Further, these kinds of edge-case answers seem to fall into uncertain cases where the underlying data is likely highly suspect.


(Dmytry Dymshyts) #9

I treated 992 and others like it as “between 1 cm and 2 cm” only.
So yeah, maybe you’re right: instead of interpreting them, we should record them as is (as a value_as_concept_id) and let somebody else interpret these kinds of values. Right?


(Michael Gurley) #10

@Dymshyts
OK, so in this case, if the answer is 001 to 988, we could put it in MEASUREMENT.value_as_number; otherwise, in MEASUREMENT.value_as_concept_id.

I am sure there are scenarios where source systems or vocabularies truly represent lab values as a range. In those scenarios, I can see the need for some kind of solution or convention to represent numeric ranges explicitly within the CDM:

  • Two entries in MEASUREMENT.
  • Add a column to MEASUREMENT.
  • Record the mean.

But if the source system or source system vocabulary has not put in the effort to make this explicit, I don’t think we should either.

