As part of THEMIS we are looking to develop conventions for handling invalid/negative values in lab test records. We are looking for input from the community on any existing conventions and options for handling invalid values. Have you come across this issue? How did you address it?
As far as I understand you mean values that are definitely incorrect.
There are the following options:
Option 1: Keep the records. Put the results "as is" in the value_as_number field.
Advantages: We do not lose records.
Disadvantages: The data does not look clean (especially if a sizable share of the data is of this kind).
Option 2: Keep the records. Populate NULL instead of incorrect values.
Advantage: We do not lose records. We store the fact that a test was done.
Disadvantage: We lose the source values.
Option 3: Throw out such records.
Advantages: Only good test results are stored. Data looks clean.
Disadvantages: We lose records that someone may need.
Option 4: Flag incorrect results in some way. Put the incorrect result into value_as_number as is,
and populate measurement.value_as_concept_id (or observation.value_as_concept_id) with a concept representing an incorrect result:
45884071 - Incorrect test results
OR
45876576 - Unknown (missing)
Advantages:
- We do not lose records (the fact that a test was performed)
- We mark incorrect result values, so it is easy to filter them out if needed
Disadvantages:
- "Incorrect test result" may coincide with a result interpretation already present in the source data, so this flag will not uniquely identify this group of records
- This is not a result interpretation (High/Low etc.) stored in the source data, but an ETLer's interpretation derived from the source data
- There could be a conflict if the source data already contains a result interpretation
Someone may well have a better idea of how to mark such records.
Any more ideas, comments?
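To make the four options concrete, here is a minimal Python sketch of what each one does to a single record that has already been judged invalid. The `LabRecord` class and the `handle_invalid` function are hypothetical names made up for illustration (a real MEASUREMENT row has many more columns), and the flag concept is simply the 45884071 quoted above, not an agreed convention.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Concept ID quoted in the post above; using it as an ETL flag is only a proposal.
INCORRECT_RESULT_CONCEPT_ID = 45884071


@dataclass(frozen=True)
class LabRecord:
    """Hypothetical, trimmed-down stand-in for a MEASUREMENT row."""
    person_id: int
    value_as_number: Optional[float]
    value_as_concept_id: Optional[int] = None


def handle_invalid(record: LabRecord, option: int) -> Optional[LabRecord]:
    """Apply one of the four options to a record already judged invalid."""
    if option == 1:  # keep the record and the value as is
        return record
    if option == 2:  # keep the record, replace the value with NULL
        return replace(record, value_as_number=None)
    if option == 3:  # throw the record out
        return None
    if option == 4:  # keep the value, add a flag concept
        return replace(record, value_as_concept_id=INCORRECT_RESULT_CONCEPT_ID)
    raise ValueError(f"unknown option: {option}")


bad = LabRecord(person_id=1, value_as_number=-7.0)
for option in (1, 2, 3, 4):
    print(option, handle_invalid(bad, option))
```

Note that option 4 as sketched overwrites whatever was already in value_as_concept_id, which is exactly the conflict listed among the disadvantages.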
I'd go for the second option for the following reasons:
- We still know that a person has undergone a test
- The wrong value doesn't interfere with the mean result when we do a statistical analysis. Moreover, if we keep the wrong value and look for all patients with a lab result below a certain threshold, we'll also get those patients, although we do not actually know whether they belong to this cohort.
Option #4 can also work, but it makes things harder for researchers, who have to keep all the possible flags in mind.
Tagging @ericaVoss as she's interested in low-quality records.
Agree with @aostropolets. Plus, if the negative value indicates revoking of a previous order, we should remove the records.
I vote for option 4.
But use a more specific, unambiguous concept to reflect that an ETL action was performed (an ETL data enhancement step).
If we take the next step and convert compatible units to a single target unit, we will need a similar mechanism to record that an ETL action was performed on a data row.
E.g., the LDL value was converted by the ETL to the preferred single target unit.
See possible examples here (hemoglobin row):
https://github.com/OHDSI/StudyProtocolSandbox/blob/master/themis/extras/partial_results/C-tests-aggregated.csv#L389
Marking implausible values during ETL is similar to unit conversion during ETL.
...and finally a teaser: measurement.source_value_as_number (CDISC SDTM format takes this approach)
Friends: Can we not just "vote", but actually produce use cases? What is the use case for having an incorrect result? I am at a loss.
Aren't there some lab results that can be negative?
1925-7-Base excess in Arterial blood by calculation
RANGE: mmol.L:[-2,+2]
https://s.details.loinc.org/LOINC/1925-7.html
1927-3-Base excess in Venous blood by calculation
RANGE: mmol.L:[-5,+5]
https://s.details.loinc.org/LOINC/1927-3.html
Good catch, @ericaVoss.
@mvanzandt and @Christian_Reich, can you please add Erica's point to the deck?
Is there a validity test we can run in the ETL to check for the valid range from LOINC? We can't just remove/flag negative values if they are within the valid range.
Is there a validity test we can run in the ETL to check for the valid range from LOINC?
Pretty sure that there isn't, unless you do some NLP on the LOINC web pages to get the range.
We can make a list of exclusions for which this rule won't apply.
Reading through this post again and summarizing what was said at the THEMIS F2F.
RECOMMENDATION
Only allow negative values for measurements that can legitimately be negative. If you have negative values for measurements that should be positive, set them to 0.
ACTION
- Work to develop an exception list (e.g. 1925-7-Base excess in Arterial blood by calculation, RANGE: mmol.L:[-2,+2], https://s.details.loinc.org/LOINC/1925-7.html)
- Update ACHILLES to do a warning on non-exempt tests that are negative.
- Work with CDM WG to document both the exception list and above text in the VALUE_AS_NUMBER column. https://github.com/OHDSI/CommonDataModel/wiki/MEASUREMENT
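A rough sketch of what such a warning could look like, written in Python for readability. This is not ACHILLES code; the function name, the input shape (pairs of LOINC code and value_as_number) and the sample codes in the usage lines are assumptions for illustration, and the exception set holds only the two codes named in the thread so far.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# LOINC codes named in the thread so far as legitimately negative (incomplete).
NEGATIVE_ALLOWED = {"1925-7", "1927-3"}


def negative_value_warnings(
    rows: Iterable[Tuple[str, Optional[float]]]
) -> List[str]:
    """Return one warning line per non-exempt LOINC code that has negative values."""
    counts = Counter(
        code
        for code, value in rows
        if value is not None and value < 0 and code not in NEGATIVE_ALLOWED
    )
    return [
        f"WARNING: {n} negative value(s) for LOINC {code}"
        for code, n in sorted(counts.items())
    ]


# Illustrative rows only: (LOINC code, value_as_number)
sample = [
    ("2345-7", -1.0),
    ("2345-7", 5.0),
    ("1925-7", -1.5),  # exempt: base excess may be negative
    ("2345-7", -3.0),
    ("718-7", None),   # NULL values are ignored
]
for line in negative_value_warnings(sample):
    print(line)
```

A check like this only reports; what to do with the flagged values is a separate decision.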
Does anyone have a list of the LOINCs that can be negative? We found 2 so far:
1925-7-Base excess in Arterial blood by calculation
RANGE: mmol.L:[-2,+2]
https://s.details.loinc.org/LOINC/1925-7.html
1927-3-Base excess in Venous blood by calculation
RANGE: mmol.L:[-5,+5]
https://s.details.loinc.org/LOINC/1927-3.html
The ThemisUnits and ThemisMeasurement studies can be extended to go after all tests with negative values.
Also, who is tasked with implementing that in Achilles? I am happy to volunteer.
Don't understand. You want to query the data to see whether they contain negative results? Why? They are wrong in 99% of the cases. Junk. We need to decide that deterministically.
Thanks so much, @Vojtech_Huser. We need to give you some credit for pushing this into Achilles.
A third exception would be
LOINC, NAME
11555-0 Base excess of blood
We also see negative values in non-LOINC-coded tests, e.g.
QRS-Axis (that would most likely be LOINC code 8632-2)
Sorry for the confusion. Yes, I did mean a deterministic decision.
The first thing that comes to mind is to look at the range defined by LOINC. That way I got this list:
11555-0 | Base excess in Blood by calculation | mEq/L:[-2,+3];mmol/L:[-2,+3];
1927-3 | Base excess in Venous blood by calculation | mmol.L:[-5,+5];
1925-7 | Base excess in Arterial blood by calculation | mmol.L:[-2,+2];
1926-5 | Base excess in Capillary blood by calculation | mmol.L:[-2,+2];
28638-5 | Base excess in Arterial cord blood by calculation | mmol.L:[-10,-2];
28639-3 | Base excess in Venous cord blood by calculation | mmol.L:[-10,-2];
But this approach misses the cases where the range is null in LOINC, the
is the good example.
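The range strings in the list above are regular enough to parse. A minimal sketch, assuming the ranges arrive as text in the `unit:[low,high];` shape shown in that list; `parse_ranges` and `negative_allowed` are hypothetical helper names. A missing range returns `None` rather than `False`, because LOINC alone cannot settle those cases.

```python
import re
from typing import List, Optional, Tuple

# Matches a bracketed pair such as [-2,+3] or [-10,-2].
_RANGE = re.compile(
    r"\[\s*([+-]?\d+(?:\.\d+)?)\s*,\s*([+-]?\d+(?:\.\d+)?)\s*\]"
)


def parse_ranges(text: Optional[str]) -> List[Tuple[float, float]]:
    """Extract every (low, high) pair from a range string; empty list if none."""
    if not text:
        return []
    return [(float(low), float(high)) for low, high in _RANGE.findall(text)]


def negative_allowed(text: Optional[str]) -> Optional[bool]:
    """True if any published range goes below zero, None if no range is given."""
    ranges = parse_ranges(text)
    if not ranges:
        return None  # range is null: cannot decide from the range alone
    return any(low < 0 for low, _ in ranges)


print(parse_ranges("mEq/L:[-2,+3];mmol/L:[-2,+3];"))
print(negative_allowed("mmol.L:[-10,-2];"))
print(negative_allowed(None))
```

Codes that come back as `None` would still need a manually curated exception list.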
@Vojtech_Huser, how did you find the QRS-Axis thing? Just by looking at your data?
Updating my post based on the above notes.
RECOMMENDATION
Only allow negative values for measurements that can legitimately be negative. If you have negative values for measurements that should be positive, set them to 0.
ACTION
Work to develop an exception list (e.g. 1925-7-Base excess in Arterial blood by calculation, RANGE: mmol.L:[-2,+2], https://s.details.loinc.org/LOINC/1925-7.html)
Update ACHILLES to do a warning on non-exempt tests that are negative.
Work with CDM WG to document both the exception list and above text in the VALUE_AS_NUMBER column. https://github.com/OHDSI/CommonDataModel/wiki/MEASUREMENT
Does anyone have a list of the LOINCs that can be negative? We found the following so far:
- 1925-7-Base excess in Arterial blood by calculation
  RANGE: mmol.L:[-2,+2]
  https://s.details.loinc.org/LOINC/1925-7.html
- 1927-3-Base excess in Venous blood by calculation
  RANGE: mmol.L:[-5,+5]
  https://s.details.loinc.org/LOINC/1927-3.html
- QRS-Axis (that would most likely be LOINC code 8632-2)
- 11555-0 | Base excess in Blood by calculation | mEq/L:[-2,+3];mmol/L:[-2,+3];
- 1926-5 | Base excess in Capillary blood by calculation | mmol.L:[-2,+2];
- 28638-5 | Base excess in Arterial cord blood by calculation | mmol.L:[-10,-2];
- 28639-3 | Base excess in Venous cord blood by calculation | mmol.L:[-10,-2];
Is that what we said? A value of 0 mg/dL Glucose in plasma is hardly a useful record. The patient would look pretty miserable with that value. Shouldn't we set it to NULL?
Friends: Where are we on this? Should we start publishing things? The better list is the enemy of the current list.
On the Achilles implementation, I am waiting on Ajit @Ajit_Londhe to explain to me how new rules can be added now that the SQL has been changed to execute in parallel mode. The new files are hard to navigate (one Heel file is now broken into 15+ files).
Sorry for the delay, @Vojtech_Huser. I'm creating a developer's Readme and trying to simplify things a bit. I'll push this new Readme by Friday.
Agree, I will update!
RECOMMENDATION
Only allow negative values for measurements that can legitimately be negative. If you have negative values for measurements that should be positive, set them to NULL.
ACTION
Work to develop an exception list (e.g. 1925-7-Base excess in Arterial blood by calculation, RANGE: mmol.L:[-2,+2], https://s.details.loinc.org/LOINC/1925-7.html)
Update ACHILLES to do a warning on non-exempt tests that are negative.
Work with CDM WG to document both the exception list and above text in the VALUE_AS_NUMBER column. https://github.com/OHDSI/CommonDataModel/wiki/MEASUREMENT
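For anyone who wants to try the recommendation in an ETL, here is a minimal Python sketch under a few assumptions: the LOINC code is available for each row, the exception list is the one collected in this thread (8632-2 for QRS-Axis was only a "most likely" guess), and `clean_value_as_number` is a hypothetical helper name.

```python
from typing import Optional

# Exception list collected in this thread; it is not complete or official.
NEGATIVE_ALLOWED_LOINC = frozenset({
    "1925-7",   # Base excess in Arterial blood by calculation
    "1927-3",   # Base excess in Venous blood by calculation
    "11555-0",  # Base excess in Blood by calculation
    "1926-5",   # Base excess in Capillary blood by calculation
    "28638-5",  # Base excess in Arterial cord blood by calculation
    "28639-3",  # Base excess in Venous cord blood by calculation
    "8632-2",   # QRS-Axis (code was only a "most likely" guess in the thread)
})


def clean_value_as_number(loinc_code: str, value: Optional[float]) -> Optional[float]:
    """Return the value to store in VALUE_AS_NUMBER under the recommendation."""
    if value is not None and value < 0 and loinc_code not in NEGATIVE_ALLOWED_LOINC:
        return None  # negative value on a non-exempt test: store NULL
    return value


print(clean_value_as_number("1925-7", -1.5))  # exempt test, value kept
print(clean_value_as_number("2345-7", -4.0))  # non-exempt test, value dropped
```

The record itself is kept either way, so the fact that a test was performed is not lost.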