I am mapping a dataset which does not have the labs in nice LOINC codes, so I have only string descriptions for the lab components. One of the highly prevalent labs is “LYMPHOCYTE”. If I map it to the closest OHDSI concept with a domain_id of ‘Measurement’, I get 40772531, which has a concept_class_id of LOINC Hierarchy and a standard_concept value of C. When I try to map it to a standard concept using the ‘Is a’ relationship, I get 4 possible options back: 3026710, 3012323, 3033622, 40763884, which are all lymphocyte-related labs but very specific. This poses an issue, because the lab is found 4 million times in the dataset. If I map it to all 4 possible concepts, I get an extra 12 million rows in the measurement table. I can’t really make a call on which one of them the lab actually refers to. So what is the best practice to handle these types of mappings? I have around 50 labs that resolve to LOINC Hierarchy codes like the one I showed.
Question two: I have some LOINC codes that are deprecated and I can’t seem to find which ones replaced them. I have tried the relationships ‘LOINC replaced by’, ‘LOINC replaces’ and ‘Concept replaced by’, all returning no matches. Some sample codes are: 3035064 and 3013039. The first one is quite important for me to map, as the dataset has over 3.5 million instances of that measurement.
Hi @juan_banda, these are good questions, and ones for which we don’t have
one simple answer, but I’m happy to share my personal opinion as a
recommendation:
That sucks. We’ve been through the NLP exercise of mapping free text to
LOINC, and it’s definitely imperfect, largely because free text tends to be
unspecific. It’s ok to map to a LOINC hierarchy concept when you can’t be
any more specific, and we’ve done that ourselves. I wouldn’t recommend
mapping to all the descendant concepts under the hierarchy concept; that’ll
cause a combinatorial explosion that’ll get ugly fast.
Yes, your pain is fully felt. Actually, we are thinking of doing something specifically about the LOINC codes, because they are so overly detailed and the hierarchy is not that helpful, either. Unless you run a clinical lab, I guess. Got a few ideas, but it’s far from ready.
Here is some help for your concrete problems:
By “LYMPHOCYTE” they probably mean lymphocytes in blood. You therefore search for (or pick from a generic keyword search) the LOINC Hierarchy Concept 40787267 “Lymphocytes | Bld-Ser-Plas”. To get the descendants, don’t use the “Is a” relationship. Firstly, that would give you only one step down the hierarchy; secondly, there are other hierarchical relationships than “Is a”. Instead, always use the CONCEPT_ANCESTOR table. The descendants are:
3003215 Lymphocytes [#/volume] in Blood by Manual count
3004327 Lymphocytes [#/volume] in Blood by Automated count
3019198 Lymphocytes [#/volume] in Blood
43055371 Lymphocytes/Leukocytes [Pure number fraction] in Blood by Automated count
43055366 Lymphocytes/Leukocytes [Pure number fraction] in Blood by Manual count
3037511 Lymphocytes/100 leukocytes in Blood by Automated count
3038058 Lymphocytes/100 leukocytes in Blood by Manual count
3002030 Lymphocytes %
The first three are the absolute count of lymphos, the last five are lymphos as a fraction of leukos (lymphos are a form of leukocytes). Both tests are used in the clinic. You can pick any one of the first three to represent the absolute, and any one of the last five to stand for the fraction, since you don’t know and don’t care how it was counted.
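As a rough illustration of the CONCEPT_ANCESTOR lookup described above, here is a minimal Python sketch against an in-memory SQLite database. The table and column names follow the OMOP vocabulary tables, but the rows are a toy subset of the list above, the levels of separation are made up, the 'S' flags are assumed, and the helper name `standard_descendants` is hypothetical. Against a real vocabulary database only the query itself would carry over.

```python
import sqlite3

def standard_descendants(conn, ancestor_concept_id):
    """Return (concept_id, concept_name) for every Standard descendant,
    at any depth, of the given ancestor concept."""
    sql = """
        SELECT c.concept_id, c.concept_name
        FROM concept_ancestor ca
        JOIN concept c ON c.concept_id = ca.descendant_concept_id
        WHERE ca.ancestor_concept_id = ?
          AND c.standard_concept = 'S'
        ORDER BY c.concept_id
    """
    return conn.execute(sql, (ancestor_concept_id,)).fetchall()

# Toy data only: a cut-down CONCEPT table and a few CONCEPT_ANCESTOR rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concept (
        concept_id INTEGER PRIMARY KEY,
        concept_name TEXT,
        standard_concept TEXT
    );
    CREATE TABLE concept_ancestor (
        ancestor_concept_id INTEGER,
        descendant_concept_id INTEGER,
        min_levels_of_separation INTEGER,
        max_levels_of_separation INTEGER
    );
""")
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?)",
    [
        (40787267, "Lymphocytes | Bld-Ser-Plas", "C"),
        (3019198, "Lymphocytes [#/volume] in Blood", "S"),
        (3004327, "Lymphocytes [#/volume] in Blood by Automated count", "S"),
        (3037511, "Lymphocytes/100 leukocytes in Blood by Automated count", "S"),
    ],
)
conn.executemany(
    "INSERT INTO concept_ancestor VALUES (?, ?, ?, ?)",
    [
        (40787267, 40787267, 0, 0),  # every concept is its own ancestor
        (40787267, 3019198, 1, 1),   # separation levels here are invented
        (40787267, 3004327, 2, 2),
        (40787267, 3037511, 2, 2),
    ],
)

for concept_id, name in standard_descendants(conn, 40787267):
    print(concept_id, name)
```

Because CONCEPT_ANCESTOR already stores every ancestor/descendant pair regardless of depth, a single join is enough; there is no need to walk ‘Is a’ links step by step.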
In order to distinguish which of the two categories your lab test actually is, you have no other way than to look at the values. If they are roughly between 1.3 and 3.5 × 10^9/L you’ve got the absolute; if they are 28-55 (or 0.28-0.55 as a fraction) you’ve got the relative measurement in %. My guess is it will be the relative, because it is more common. But hey, I never practiced in the US, who knows.
Sorry, I know it sucks. Welcome to the precise science of clinical medicine.
If you want, toss me the 50 and I’ll map them. Won’t take that much time. As in true Mafia style - you will owe me, you just don’t know when, yet.
The LOINC deprecations are unfortunately incompatible with the way we do it in the Standard Vocabs. LOINC can deprecate a code to two or more replacement codes. So, if they figured a code is ambiguous, they replace it with two more precise ones. That makes sense as long as there is a human looking at this. We cannot have that, because then you’d get two mappings to the replacements, and the ETL would write 2 records into the CDM. Not good. As a result, we kick out all replacements that go to more than one code. Nothing we can do about that.
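To make the rule concrete, here is a small Python sketch of the filtering idea: keep a deprecated code only when it has exactly one replacement. This is an illustration, not the vocabulary team’s actual build code; the function name `unambiguous_replacements` and the codes `OLD-A`, `NEW-1`, etc. are hypothetical placeholders.

```python
from collections import defaultdict

def unambiguous_replacements(pairs):
    """Given (deprecated_code, replacement_code) pairs, keep only the
    deprecated codes that are replaced by exactly one code."""
    targets = defaultdict(set)
    for deprecated, replacement in pairs:
        targets[deprecated].add(replacement)
    return {
        deprecated: next(iter(replacements))
        for deprecated, replacements in targets.items()
        if len(replacements) == 1
    }

# Placeholder codes, not real LOINC codes.
pairs = [
    ("OLD-A", "NEW-1"),  # one-to-one: kept
    ("OLD-B", "NEW-2"),  # one-to-many: dropped, since the ETL would
    ("OLD-B", "NEW-3"),  # otherwise write two records
]
print(unambiguous_replacements(pairs))
```

A deprecated code that was split into several replacements ends up with no replacement link at all, which would be consistent with the lookups above coming back empty for some codes.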
@Patrick_Ryan. Thanks for the suggestions! I didn’t know I could have the C concepts in the measurement table, but if we can, then this will simplify my life. In terms of the suggestions for the deprecated codes, these will work just fine! Thanks!
@Christian_Reich. Thanks for clarifying the ‘Is a’ relationship and its reach, I will be sure to use the ancestor table for this particular task. In terms of mapping to a more specific LOINC code, while I agree this would be the nicest way of doing things, it will greatly add complexity to the mapping process. Of the 4 million ‘Lymphocyte’ labs, about half match the relative criteria and around 25% the absolute, but the other 1 million are a mixture of who knows what (some of the units are messed up, etc.). So I would still have to do more detective work on 25% of the data, and this gets repeated for each of the 50 lab codes. As Patrick mentions, if I can leave them as C hierarchy concepts, this would be great, as I don’t have to infer anything from the data and can just leave it for the researcher to make a call on which ones to use.
Thanks for the offer of mapping the 50 or so concept_id’s. Leaving them as C concepts (if allowed in the measurement table) is my preferred way of dealing with them. I’d rather save the favors for more critical tasks.
Good thing to know about the LOINC deprecation and how the vocabulary handles it. One thing to note is that in the version of the vocabulary I have (Mar 2016), the ‘LOINC replaced by’ and ‘LOINC replaces’ relationships are completely empty.
One last question @Patrick_Ryan, @Christian_Reich: we also have some labs mapped to concepts with domain_id=‘Measurement’ but from the CPT and SNOMED vocabulary_id’s. Is this ok to have? They are non-deprecated concepts and most of them are standard. Should I just re-map the non-standard codes? Thanks!
Actually, you cannot. You can’t have C Concepts in the data tables. I don’t think Patrick suggested that. And you shouldn’t: kicking the can down the road to the researcher and leaving it for him to figure out is exactly the kind of thing we don’t want to do, and why we have the CDM in the first place. Plus: at the other end there is most often not a researcher but an automated tool, and that one will clearly not figure it out.
And all Concepts you are mapping to should be Standard. That’s what it means to be Standard – it represents the semantic meaning of a thing unambiguously with a single Concept. If you feel like we are missing a Standard Concept and you can only find non-standard ones let me know and we fix it.
Wrt the lymphocytes: Consider ignoring the unit and only going by the number. If the number is below 100 it has to be percent of leukocytes. It’s not compatible with life to have 100 lymphos even per µl.
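A minimal Python sketch of that value-only rule, under stated assumptions: the two target concept_ids are arbitrary picks from the absolute and fraction groups listed earlier, 0 stands for "no matching concept", and the function name is hypothetical. Note the rule only holds if absolute counts are reported per µl; counts reported in 10^9/L (roughly 1.3 to 3.5) also fall below 100 and would need the unit to be told apart.

```python
# Arbitrary representatives of each group; any member of the group would do.
ABSOLUTE_CONCEPT_ID = 3019198   # Lymphocytes [#/volume] in Blood
FRACTION_CONCEPT_ID = 3037511   # Lymphocytes/100 leukocytes in Blood by Automated count
UNMAPPED_CONCEPT_ID = 0         # no matching concept

def lymphocyte_concept_id(value):
    """Pick a measurement concept from the numeric value alone.

    Assumes absolute counts are per microliter, so anything below 100
    can only be a percentage (or fraction) of leukocytes.
    """
    if value is None or value < 0:
        return UNMAPPED_CONCEPT_ID
    if value < 100:
        return FRACTION_CONCEPT_ID
    return ABSOLUTE_CONCEPT_ID

for v in (35.0, 0.4, 2100, None):
    print(v, lymphocyte_concept_id(v))
```

Rows with missing or negative values fall through to concept_id 0 rather than being guessed.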
As always, I defer to @Christian_Reich on all things vocabulary. It seems
unstructured lab results will be a challenge for lots of folks, but there’d
be tremendous value in working together as a community to reach some
consensus conventions we can live with. @Juan_Banda, I’d take @Christian_Reich up on his offer to have the vocabulary team help you with
mapping the terms, so you don’t have to reinvent the wheel here. And
apparently I need to go back to my ETLs and remove any place where we’re
mapping to LOINC classification concepts…
We don’t care about the scale and particularly the method. So, we want to create higher level concepts where these are successively stripped out. Kind of like drug products, drug forms and ingredients. That way, you can have single codes for the kind of things you need.
This sounds awesome! The Hierarchy concepts seem to provide something like this already, but making them S concepts would probably introduce all kinds of other issues, I assume. This sounds good; for now I made some data-driven decisions to map labs to the closest possible ones I can find (based on values and units).