Harmonizing measurement numeric values and units

We are receiving a national registry of lab measurement values.
It gathers lab measurements from different laboratories.
Often, each laboratory uses a different code for the same measurement.
We have mapped most of these local laboratory codes to the equivalent OMOP standard concept (typically LOINC).

However, local lab codes mapped to the same LOINC code often have different units.

According to the specifications,
there is no standard unit defined for a given LOINC code.

We are thinking of approaching this as follows:

measurement_concept_id = the OMOP standard (LOINC) concept_id
measurement_source_concept_id = the local concept_id (in the 2-billion range) for the local lab code

unit_concept_id = standard concept_id of the most common unit across all events whose local codes map to the measurement_concept_id
unit_source_concept_id = concept_id of the unit of the local lab code
unit_source_value = verbatim unit of the local lab code

value_source_value = verbatim value
value_as_number = value_source_value, if unit_source_concept_id = unit_concept_id
value_source_value * conversion factor, if unit_source_concept_id != unit_concept_id and the units are compatible
NULL, if unit_source_concept_id != unit_concept_id and the units are not compatible
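
The rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: plain unit strings stand in for the unit concept_ids, `CONVERSION_FACTORS` is a hypothetical hand-maintained lookup (not an OMOP table), and `Decimal` is used so the verbatim source string is scaled without floating-point rounding.

```python
from collections import Counter
from decimal import Decimal

UMOL_L = "µmol/l"

# Hypothetical lookup: (source unit, target unit) -> multiplier.
# Only pairs that measure the same quantity are listed.
CONVERSION_FACTORS = {
    ("mol/l", UMOL_L): Decimal("1000000"),
    ("mmol/l", UMOL_L): Decimal("1000"),
}


def most_common_unit(units):
    """Return the unit seen most often among the events of one measurement concept."""
    return Counter(units).most_common(1)[0][0]


def value_as_number(value_source_value, source_unit, target_unit):
    """Same unit: keep the value. Compatible unit: scale it. Otherwise: None (NULL)."""
    value = Decimal(value_source_value)
    if source_unit == target_unit:
        return value
    factor = CONVERSION_FACTORS.get((source_unit, target_unit))
    if factor is None:
        return None  # units not compatible, so value_as_number stays NULL
    return value * factor


# Made-up unit list for a single measurement concept.
units_seen = [UMOL_L, UMOL_L, "mol/l", "ml/min/1.73m^2"]
target = most_common_unit(units_seen)

converted = value_as_number("0.000114", "mol/l", target)
unconvertible = value_as_number("90", "ml/min/1.73m^2", target)
```

Returning `None` for an unknown unit pair keeps the incompatible rows visible (value_source_value is untouched) instead of silently guessing a factor.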

As an example:

The following local codes:
concept_name = P -Kreatiinikinaasi; concept_id = 2014001064
concept_name = S -Kreatiinikinaasi ; concept_id = 2017000671
concept_name = P-Krea ; concept_id = 2017000678

all map to concept_id = 3020564 Creatinine [Moles/volume] in Serum or Plasma

The most common unit is µmol/l, but there are other ones such as mol/l, µ/l, ml/min/1.73m^2, titre, …

A measurement event with local code P -Kreatiinikinaasi and value 0.000114 [mol/l] will end up as:

measurement_concept_id = 3020564
measurement_source_concept_id = 2014001064

unit_concept_id = 8749
unit_source_concept_id = 9586
unit_source_value = mol/l

value_source_value = 0.000114
value_as_number = 114 (0.000114 mol/l *10^6)

Is this the correct way?
This way, source values in different units are converted to the same unit and are therefore comparable when doing a calculation or aggregation.


I’m on the Laboratory LOINC Committee.

When you say each performing laboratory uses a different code, do you mean a LOINC code or the unique identifier for the test on each performing laboratory’s menu?

When you say local lab values have different units, are these on the same scale (e.g. differing by a power of 10) like mg/dL and g/dL, or different kinds of units like mg/dL and mmol/L? The same LOINC code could be used for the first example, but not the second. Each performing lab determines how it reports lab result values, including units, reference ranges, etc., in accordance with the accreditation/regulatory requirements of its country. Changing units without understanding how the values would be affected (do they need to be adjusted by a power of 10, 100, etc.?) could introduce significant biases and data quality issues.
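
To illustrate the difference, here is a small Python sketch. The function names are made up for this example and the molar masses are approximate; the point is that mg/dL to g/dL is one fixed factor for every analyte, while mg/dL to mmol/L needs the molar mass of the specific analyte.

```python
def mg_dl_to_g_dl(value_mg_dl):
    """Same kind of unit: a fixed factor of 1/1000 works for every analyte."""
    return value_mg_dl / 1000


def mg_dl_to_mmol_l(value_mg_dl, molar_mass_g_per_mol):
    """Mass to substance concentration: the factor depends on the analyte."""
    # mg/dL -> mg/L is a factor of 10; mg/L divided by g/mol gives mmol/L.
    return value_mg_dl * 10 / molar_mass_g_per_mol


CREATININE_MOLAR_MASS = 113.12  # g/mol, approximate
GLUCOSE_MOLAR_MASS = 180.16     # g/mol, approximate

# The same 1 mg/dL gives different mmol/L values for different analytes.
creatinine_mmol_l = mg_dl_to_mmol_l(1.0, CREATININE_MOLAR_MASS)
glucose_mmol_l = mg_dl_to_mmol_l(1.0, GLUCOSE_MOLAR_MASS)
```

For creatinine this gives roughly 0.0884 mmol/L (about 88.4 µmol/L) per 1 mg/dL, while for glucose it is roughly 0.0555 mmol/L, so a single "mg/dL to mmol/L" factor cannot exist.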

For example, I’d expect this CK to be mapped to LOINC 2157-6 Creatine kinase [Enzymatic activity/volume] in Serum or Plasma. It looks like Kreatiinikinaasi is CK (creatine kinase) in Finnish, not creatinine, so this may be a translation/mapping error. P-Krea, on the other hand, looks like plasma creatinine in Finnish. These are two quite different lab results, so you may wish to confirm the original data are coded correctly.

The other test result, with units of ml/min/1.73m^2, is estimated glomerular filtration rate or eGFR. It is a calculated result that uses the creatinine value and usually the patient's age and sex, and it is used to assess kidney function. Again, it is a different test result (procedure, measurement, and observation are other terms people use for this too) than creatinine and thus has different units. There are also different formulas for calculating eGFR, and thus different LOINC codes for the calculations. It’s important to know which one is used, as patient values may differ in a clinically significant way depending on the formula.

See also this older thread (and linked pages) Standardizing Units for Measurement - #11 by Vojtech_Huser

If a given measurement (e.g. weight) is a mixture of 2 units (lb and kg), the community has suggested since 2017 a solution that makes it ideal for analysis: harmonize to a single unit at ETL time.

AND have all users of OMOP aim for THE SAME target unit, so that multi-site analyses work well.

AMIA 2018 poster https://www.researchgate.net/publication/378737424_Real_World_Database_for_Validation_of_Units_for_Clinical_Laboratory_Tests

2017 poster https://www.ohdsi.org/web/wiki/lib/exe/fetch.php?media=resources:huser-2017-ohdsi-symp-units.pdf

Thank you so much for these comments.

I see two separate problems being solved here.

  1. At the mapping level.
    Thanks @apitkus, yes, I believe in this example (and perhaps others) we have some mapping problems.
    The mapping was made looking solely at the local lab name, not at the name plus unit.
    It may be, as in this example, that the same lab code uses two units that are not interchangeable. In that case, they are likely 2 separate lab tests. We will review our mappings taking the units into account as well.

  2. At the ETL level.
    Thanks @Vojtech_Huser. If the mapping is correct and a lab value has units that are interchangeable (typically differing by powers of 10), then we should convert the values to the recommended unit during the ETL. If we do this wrong, DQD will tell us.
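
For the mapping review in point 1, a check along these lines could flag the local codes to look at first. It is only a sketch under assumptions: `UNIT_KIND` is a hypothetical hand-made grouping of unit strings by the quantity they measure, and the events and local code names are made up, not taken from the registry.

```python
from collections import defaultdict

# Hypothetical grouping of unit strings by the kind of quantity they measure.
UNIT_KIND = {
    "mol/l": "substance concentration",
    "mmol/l": "substance concentration",
    "umol/l": "substance concentration",
    "U/l": "enzymatic activity",
    "ml/min/1.73m^2": "filtration rate",
}


def codes_with_mixed_units(events):
    """Return {local_code: sorted kinds} for codes whose units span more than one kind."""
    kinds = defaultdict(set)
    for local_code, unit in events:
        # Units missing from the grouping are reported as their own kind.
        kinds[local_code].add(UNIT_KIND.get(unit, "unknown: " + unit))
    return {code: sorted(k) for code, k in kinds.items() if len(k) > 1}


# Made-up (local_code, unit) pairs for illustration.
events = [
    ("LOCAL_A", "umol/l"),
    ("LOCAL_A", "mol/l"),
    ("LOCAL_B", "U/l"),
    ("LOCAL_B", "umol/l"),
]
suspects = codes_with_mixed_units(events)
```

Here LOCAL_A only mixes units that differ by a power of 10, so it is a candidate for conversion in the ETL, while LOCAL_B mixes two kinds of quantity and would be a candidate for splitting into separate mappings.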

However, I couldn’t find a working link to that list of recommended units per lab value in any of the abstracts, threads or sub-threads.

Could you post a link to that table of recommended units here? Thanks in advance.

@Vojtech_Huser is this the table?

If so, it is not easy to find.

In the shared thread, it was suggested that the recommended unit would be added to concept_relationship and that this would be added to the CDM documentation.

Has this happened and I’m just not finding it, or was this not implemented?


This folder has the latest results https://github.com/OHDSI/StudyProtocolSandbox/tree/master/themis/extras/results2019

The file that ends with …ABC.csv (S7-preferred_units-ABC.csv) has the largest number of rows.

I think my analysis defines classes A, B and C and applies different logic to each class. I think class A has strong or universal agreement among the considered sites, and the ABC file merges those 3 classes.

With the extra wisdom we have all gained since 2019, it would be nice to look at how N3C, AllOfUs, EHDEN, Sentinel, PEDSnet and others approach it as of 2024.

The question of ‘Should we aim to standardize 7 tests, 70 tests or 700 tests?’ remains the same.
Back in 2018, when I argued for rules in DQD (to @clairblacketer), I was pushing for a scope of 7 tests. For that scope it is easy to reach consensus within a group, and it gets us started on where and how to publish the preferred units (e.g., as a relationship in a vocabulary layer of OMOP, or just as a warning-severity problem inside DQD).