Help : mapping lab test results with LOINC vocabulary

MathildeFruchart · November 24, 2022, 10:12am

I am working on the lab test values mapping (MEASUREMENT table).
My local mapping table (handmade) has my measurement name (in French) and the associated LOINC code.

I load my local concepts into the CONCEPT table, then use an SQL query to associate the equivalent LOINC concept_id (from concept_code mapping).
I realise that the link is not made on the LOINC concept_code.

Indeed, when I filter the CONCEPT table on a LOINC concept_code (ex 34714-6) I find no result.

When I filter on the corresponding concept_id (3032080) I find the result with the desired concept_code.

The same process works on drugs (concept_code from ATC).

Can you tell me where this error comes from?

Thank you for your help.

MaximMoinat · November 24, 2022, 10:41am

This is indeed odd. The only reason I can think of is that the dash (’-’) is actually a different character in your filter query. For instance an ‘em dash’ — or ‒ (all dashes, different character encoding).
A way to test this is with the following filter: concept_code like '34714__', getting all concept_codes starting with 34714 followed by any two characters.

Eduard_Korchmar · November 24, 2022, 10:41am

Could it be that there are similar symbols in UTF-8 sharing similar “-” shape?

MathildeFruchart · November 24, 2022, 10:54am

Thank you for your reply.

I have tested concept_code like '34714__' which returns the expected line.
I don’t know if this is due to the encoding because when I copy/paste the resulting concept_code (filtering on concept_id = ‘3032080’) into my query concept_code in ('34714-6') I get the same problem.

However other LOINC codes work:

I don’t understand…

TatianaSkugarevskaya · November 24, 2022, 12:30pm

This seems like a very weird issue, I can offer you try these things:

Make a query like
Select ASCII(substr(‘the problematic concept code’,1,1))
,ASCII(substr(‘the problematic concept code’,2,1))
,ASCII(substr(‘the problematic concept code’,3,1))
,ASCII(substr(‘the problematic concept code’,4,1))
,ASCII(substr(‘the problematic concept code’,5,1))
,ASCII(substr(‘the problematic concept code’,6,1))
,ASCII(substr(‘the problematic concept code’,7,1))

This way we can check what symbol exacty is being used. Since both concept_code and concept_id searches work fine for me, the dash (‘-’) code should be 45. But we can check all the other symbols too:)
2. Try looking up the issue on stackoverflow. It would be great if you tell if you find something there!

MathildeFruchart · November 24, 2022, 12:58pm

Thank you for this answer, I get exactly the same result on your request (dash code in 45 too):
Capture d’écran du 2022-11-24 13-56-53

I’ll try to find a solution on stackoverflow.
Thank you

katy-sadowski · November 24, 2022, 1:48pm

Maybe the problem is in the value in the table instead? You could try running:

select
ASCII(substr(concept_code,1,1)),
ASCII(substr(concept_code,2,1)),
…
from concept
where concept_id = 3032080

and see if the ASCII codes match for all the characters.

Another thought is maybe there’s whitespace after the concept code for that row.

MathildeFruchart · November 24, 2022, 2:19pm

Thank you for this answer.
I get the same result on your query from concept where concept_id = 3032080 and I checked the whitespaces before and after the concept_code.

MaximMoinat · November 25, 2022, 8:18pm

Indeed a very mysterious issue. Another thought: it could have something to do with the use of IN. Could you try the following two variations and see if the results are the same?
concept_code = '34714-6'
concept_code LIKE '34714-6'

MathildeFruchart · November 28, 2022, 9:51am

Thank you for your answer.
I have tested both variations (LIKE and =) but I have the same result : 0 line.

I will try to make a new export from Athena and reload the LOINC vocabulary in the CONCEPT table. I will post an update message.

mkwong · November 28, 2022, 4:09pm

I think you need to include wild cards in the LIKE. Try …concept_code LIKE ‘%34714-6%’…

or

…concept_code LIKE ‘%34714-%’…

Otherwise the two statements are the same.

MathildeFruchart · November 29, 2022, 8:22am

Oh ! Thank you very much for this answer,
select * from omop.concept where concept_code LIKE '%34714-6%'
works and returns the desired line. Do you know what is the explanation for this problem?

MaximMoinat · November 29, 2022, 10:48am

Thanks @mkwong! That sheds some more light on this mystery. There might be some hidden whitespace before or after the string in the datatabase. A few things you can try:
concept_code LIKE '%34714-6' → I expect 0 results
concept_code LIKE '34714-6%' → I expect the one result
select LEN(concept_code) from omop.concept where concept_code LIKE '%34714-6%' → I expect it says length is 8 (where ‘34714-6’ is only 7)
ASCII(substr(concept_code,8,1)) → I expect the code for a whitespace character, e.g. tab (9) or newline (12)

MathildeFruchart · November 29, 2022, 11:15am

Thank you for your reply. It is very odd.

Here is what I get:
select * from omop.concept where concept_code LIKE '%34714-6' → the expected result (1 line)
select * from omop.concept where concept_code LIKE '34714-6%' → the expected result (1 line)
select LENGTH(concept_code) from omop.concept where concept_code LIKE '%34714-6%' → returns length = 7 (as ‘34714-6’) …
select ASCII(substr(concept_code,8,1)) from omop.concept where concept_id = '3032080' → returns 0
…

mkwong · November 29, 2022, 2:59pm

When writing ETLs - I typically assume string data might contain both white spaces and special characters. Since I write my ETLs in Java, I use the String.trim() and ingestion function to remove all characters that don’t fall with in the ASCII 32 - 126 values. It eliminates a lot mysteries. There are similar functions in python and SQL (I think).

MaximMoinat · December 1, 2022, 8:18am

Hmm, that was not what I was expecting. To summarise findings so far:
concept_code LIKE '34714-6' → no result
concept_code LIKE '%34714-6%' → 1 result
concept_code LIKE '%34714-6' → 1 result
concept_code LIKE '34714-6%' → 1 result
So there is no match on exactly ‘34714-6’, but using a wildcard on any side gives a match. Mysterious.

I am afraid that I am out of ideas. If you ever find out what the issue is, please let us know!

mkwong · December 1, 2022, 4:19pm

I would guess you have leading and trailing special characters in your concept_code field. When displaying the query results - field values - these characters may not be visible. Meaning the tool you are using (SQL client) filters these out so it looks like “34714-6”, but if you use Java or some other programming language to run the query and print out the byte array values (String concept_code.getBytes() in Java) you might find you have an array that is greater than 7 and the leading cells have values less than 32 and trailing cells also less than 32. You will need to clean up your text fields - which applies to all text fields.

zhuk · January 4, 2023, 10:19am

Hello, @MathildeFruchart

I believe, your question could have won the ‘Mystery of the year 2022’ award.
Any news?