I am working on the lab test values mapping (MEASUREMENT table).
My local mapping table (handmade) has my measurement name (in French) and the associated LOINC code.
I load my local concepts into the CONCEPT table, then use an SQL query to associate the equivalent LOINC concept_id (from concept_code mapping).
I realise that the link is not made on the LOINC concept_code.
Indeed, when I filter the CONCEPT table on a LOINC concept_code (ex 34714-6) I find no result.
This is indeed odd. The only reason I can think of is that the dash (’-’) is actually a different character in your filter query. For instance an ‘em dash’ — or ‒ (all dashes, different character encoding).
A way to test this is with the following filter: concept_code like '34714__', getting all concept_codes starting with 34714 followed by any two characters.
I have tested concept_code like '34714__' which returns the expected line.
I don’t know if this is due to the encoding because when I copy/paste the resulting concept_code (filtering on concept_id = ‘3032080’) into my query concept_code in ('34714-6') I get the same problem.
This way we can check what symbol exacty is being used. Since both concept_code and concept_id searches work fine for me, the dash (‘-’) code should be 45. But we can check all the other symbols too:)
2. Try looking up the issue on stackoverflow. It would be great if you tell if you find something there!
Thank you for this answer.
I get the same result on your query from concept where concept_id = 3032080 and I checked the whitespaces before and after the concept_code.
Indeed a very mysterious issue. Another thought: it could have something to do with the use of IN. Could you try the following two variations and see if the results are the same? concept_code = '34714-6' concept_code LIKE '34714-6'
Oh ! Thank you very much for this answer, select * from omop.concept where concept_code LIKE '%34714-6%'
works and returns the desired line. Do you know what is the explanation for this problem?
Thanks @mkwong! That sheds some more light on this mystery. There might be some hidden whitespace before or after the string in the datatabase. A few things you can try: concept_code LIKE '%34714-6' → I expect 0 results concept_code LIKE '34714-6%' → I expect the one result select LEN(concept_code) from omop.concept where concept_code LIKE '%34714-6%' → I expect it says length is 8 (where ‘34714-6’ is only 7) ASCII(substr(concept_code,8,1)) → I expect the code for a whitespace character, e.g. tab (9) or newline (12)
Here is what I get: select * from omop.concept where concept_code LIKE '%34714-6' → the expected result (1 line) select * from omop.concept where concept_code LIKE '34714-6%' → the expected result (1 line) select LENGTH(concept_code) from omop.concept where concept_code LIKE '%34714-6%' → returns length = 7 (as ‘34714-6’) … select ASCII(substr(concept_code,8,1)) from omop.concept where concept_id = '3032080' → returns 0
…
When writing ETLs - I typically assume string data might contain both white spaces and special characters. Since I write my ETLs in Java, I use the String.trim() and ingestion function to remove all characters that don’t fall with in the ASCII 32 - 126 values. It eliminates a lot mysteries. There are similar functions in python and SQL (I think).
Hmm, that was not what I was expecting. To summarise findings so far: concept_code LIKE '34714-6' → no result concept_code LIKE '%34714-6%' → 1 result concept_code LIKE '%34714-6' → 1 result concept_code LIKE '34714-6%' → 1 result
So there is no match on exactly ‘34714-6’, but using a wildcard on any side gives a match. Mysterious.
I am afraid that I am out of ideas. If you ever find out what the issue is, please let us know!
I would guess you have leading and trailing special characters in your concept_code field. When displaying the query results - field values - these characters may not be visible. Meaning the tool you are using (SQL client) filters these out so it looks like “34714-6”, but if you use Java or some other programming language to run the query and print out the byte array values (String concept_code.getBytes() in Java) you might find you have an array that is greater than 7 and the leading cells have values less than 32 and trailing cells also less than 32. You will need to clean up your text fields - which applies to all text fields.