Implementing custom vocabularies that have exclusions

isidoro · January 8, 2025, 12:44pm

Hi all!

I am currently implementing an ETL in Python to adapt some EHR records to OMOP. I wanted to ask for advice on dealing with one of our source vocabularies, whose custom codes each cover a specific subset of ICD10CM codes. Long story short, these source codes are a list of each patient’s chronic conditions. They do not always have a direct correspondence to an OMOP standard concept, and sometimes going “uphill” through the relationships ends up in concepts that include the original ICD10CM codes plus additional unrelated conditions.

My main question is what the preferred approach is, if any, for dealing with this kind of case. I’ve been searching the forums but cannot find anything related; please point me to anything I could have missed.

Our current approach is to create a custom vocabulary entry and a custom concept_id for each source code, using the range above 2 billion that is reserved for local concepts. At this point we have added one row to the VOCABULARY table and a number of rows to the CONCEPT table with our new concepts. Cool. Then we add pairs of “Subsumes”/“Is a” relationships between each source code and its respective ICD10CM codes (each source code “Subsumes” several ICD10CM codes and each ICD10CM code “Is a” source code; please correct me if I’m wrong). The final mapping to the standard concepts is done through the ICD10CM codes, which are already mapped (thank you!). Does this seem correct?
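For anyone following along, here is a minimal sketch of what building those rows could look like in Python. It is an illustration under assumptions, not our actual ETL: the vocabulary_id `LOCAL_CHRONIC`, the source code `CHR01`, the concept class value, and the ICD10CM concept_ids in the usage example are all made-up placeholders (the real ICD10CM concept_ids would be looked up in CONCEPT).

```python
from datetime import date

# OMOP convention: concept_ids above 2 billion are reserved for local concepts.
LOCAL_ID_START = 2_000_000_000
VALID_START = date(1970, 1, 1)
VALID_END = date(2099, 12, 31)


def build_custom_rows(source_codes, icd_concept_ids,
                      vocabulary_id="LOCAL_CHRONIC",   # hypothetical name
                      concept_class_id="Undefined"):   # placeholder class
    """Build CONCEPT and CONCEPT_RELATIONSHIP rows for local grouper codes.

    source_codes:    {source_code: (name, [ICD10CM codes])}
    icd_concept_ids: {ICD10CM code: concept_id}, looked up from CONCEPT
    """
    concepts, relationships = [], []
    for offset, code in enumerate(sorted(source_codes), start=1):
        name, icd_codes = source_codes[code]
        local_id = LOCAL_ID_START + offset
        concepts.append({
            "concept_id": local_id,
            "concept_name": name,
            "domain_id": "Condition",
            "vocabulary_id": vocabulary_id,
            "concept_class_id": concept_class_id,
            "standard_concept": None,  # kept non-standard (source concept)
            "concept_code": code,
            "valid_start_date": VALID_START,
            "valid_end_date": VALID_END,
            "invalid_reason": None,
        })
        for icd in icd_codes:
            icd_id = icd_concept_ids[icd]
            # One row per direction: grouper "Subsumes" ICD, ICD "Is a" grouper.
            for id_1, id_2, rel in ((local_id, icd_id, "Subsumes"),
                                    (icd_id, local_id, "Is a")):
                relationships.append({
                    "concept_id_1": id_1,
                    "concept_id_2": id_2,
                    "relationship_id": rel,
                    "valid_start_date": VALID_START,
                    "valid_end_date": VALID_END,
                    "invalid_reason": None,
                })
    return concepts, relationships


# Usage with placeholder ICD10CM concept_ids (NOT the real ones).
concepts, relationships = build_custom_rows(
    {"CHR01": ("Aneurysms and arterial dissections", ["I71", "I72"])},
    {"I71": 11, "I72": 12},
)
```

The resulting lists of dicts can be loaded into the tables with whatever bulk insert the ETL already uses.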

Also, since these source codes do not always have direct mappings to any standard concepts, can we make them standard in our local CDM instances? Does it matter outside network studies?

As an example, we have a code for aneurysms and arterial dissections that includes the ICD10CM codes I71, I72, I77.7, I77.8 and I79.0. After some SQL/Python fiddling with the CONCEPT and CONCEPT_RELATIONSHIP tables I found that these codes could all be mapped to the more general Disorder of Artery, OMOP-321887, but this also includes unrelated disorders like abscesses, ulcers, and embolisms. This is not a good solution since we would be including disorders that were not considered in the first place.
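To make the over-inclusion concrete, this is roughly the kind of check I mean, sketched in plain Python. It assumes the hierarchy is available as (ancestor_concept_id, descendant_concept_id) pairs, as in the CONCEPT_ANCESTOR table, which is a simplification of the CONCEPT_RELATIONSHIP fiddling described above. All concept_ids in the usage example are toy values, not real OMOP ids.

```python
def unwanted_descendants(ancestor_id, wanted_ids, concept_ancestor):
    """Return descendants of `ancestor_id` not covered by `wanted_ids`.

    concept_ancestor: iterable of (ancestor_concept_id, descendant_concept_id)
    pairs, in the style of the CONCEPT_ANCESTOR table.
    """
    pairs = list(concept_ancestor)
    wanted = set(wanted_ids)
    # Everything the candidate common ancestor would pull in.
    all_descendants = {d for a, d in pairs if a == ancestor_id}
    # Everything the intended concepts cover, including themselves.
    covered = wanted | {d for a, d in pairs if a in wanted}
    return all_descendants - covered - {ancestor_id}


# Toy hierarchy: 1 is the broad grouper, 2 and 3 are the intended concepts,
# 6 is a child of 2, and 4 and 5 are unrelated siblings.
toy_pairs = [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
             (2, 2), (2, 6), (3, 3)]
extra = unwanted_descendants(1, [2, 3], toy_pairs)
```

Here `extra` contains the concepts that mapping to the broad grouper would silently add, which is exactly what we want to avoid.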

I am familiar with the idea of putting several rows in the CONDITION_OCCURRENCE table, as when a single ICD10 code references several conditions at once and is mapped to two standard SNOMED concepts. But I do not think that applies here, since the idea of these codes is to have one concept for a family of chronic conditions without being too specific.

Thank you all in advance!

MPhilofsky · January 8, 2025, 5:17pm

Do you also have the exact condition code for the record? Or just the high level grouper?

isidoro · January 9, 2025, 6:56am

Usually no, for most conditions we only have the high level grouper.

We would only have the exact condition code if the research target condition is actually contained in it. But in that specific situation these codes for chronic diseases are not really that helpful because we already know the actual condition.

The purpose of these codes is to provide some history about each patient without giving the full history, which we usually cannot have due to data protection.

MPhilofsky · January 10, 2025, 3:07pm

Ah, I see. I’m going to tag @aostropolets @zhuk @Alexdavv as this is a question about how to best match the semantic meaning of the source data to a standard concept_id.

Alexdavv · January 11, 2025, 10:27am

If your data only include generalized high-level groupers like “Disorder of artery”, how could you be certain there was no embolism? Because you know how that system was built and you don’t find such children in there. This is the taxonomy vs ontology problem.

It may seem that ICD can help you best because it’s also a taxonomy. But it can’t, because the composition of the underlying concepts is different; therefore you end up avoiding the hierarchical approach in favour of concept sets, picking only those ICDs that match.

Try to change the paradigm and work with the hierarchy of the OHDSI ontology. “Disorder of artery” means exactly that: it could be any of the underlying conditions, but we don’t know which one. It could be any of a large list of conditions that one could categorize under the grouper (in this case, in SNOMED CT).

Otherwise, only concept sets can help you because you can’t seamlessly combine 2 conflicting systems.

isidoro · January 14, 2025, 7:26am

Thank you for your answer @Alexdavv! As you said, right now we are essentially using concept sets, if I understand correctly. These are our first steps with the OMOP CDM, so we are experimenting a bit to see how to make it work for us.

Our data do not include only the generalized high-level groupers. They are provided as a way of completing the patient history outside of the research target condition.

We are not the database owners, but users. We have to ask for the information relevant to our target condition and due to data protection they can only provide us with specific information relevant to said target. The chronic conditions table is a compromise to provide some additional information.

That is an option we have also considered. What makes us hesitate is that the db owners do know the exact condition and have assigned the chronic code, so it feels like blurring the information if we just go uphill in the hierarchy and add extra conditions that were not there in the first place.

However, I get that this would be the proper way if we were to use our data in a network study, where standard concepts have to be used.

aostropolets · January 14, 2025, 8:40pm

So it seems what you know is that it’s either a dissection or an aneurysm. In that case, I’d put both the SNOMED dissection and aneurysm concepts in condition_occurrence as the standard concept id, and whatever source identifier you have as the source concept id. In this way, the information is preserved as closely as possible. A patient would look like they have both conditions, and your source group also implicitly assumes both conditions since you can’t distinguish them.
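A rough Python sketch of that idea, in case it helps: one CONDITION_OCCURRENCE row per standard concept, all sharing the same source concept id. The function name, the input record layout, and every concept_id in the usage example are hypothetical placeholders; the real standard SNOMED ids would come from your mapping, and only the output column names follow the CDM.

```python
def expand_grouper(record, grouper_to_standard):
    """Return one CONDITION_OCCURRENCE-style row per mapped standard concept.

    record: dict with person_id, start_date, source_code, source_concept_id
    grouper_to_standard: {source_concept_id: [standard concept_ids]}
    """
    # Unmapped groupers fall back to concept_id 0, the CDM convention
    # for "no matching concept".
    standard_ids = grouper_to_standard.get(record["source_concept_id"], [0])
    return [
        {
            "person_id": record["person_id"],
            "condition_concept_id": standard_id,
            "condition_start_date": record["start_date"],
            "condition_source_value": record["source_code"],
            "condition_source_concept_id": record["source_concept_id"],
        }
        for standard_id in standard_ids
    ]


# Usage with placeholder ids: 2_000_000_001 is a local grouper concept,
# 101 and 102 stand in for the SNOMED aneurysm and dissection concepts.
rows = expand_grouper(
    {"person_id": 1, "start_date": "2024-05-01",
     "source_code": "CHR01", "source_concept_id": 2_000_000_001},
    {2_000_000_001: [101, 102]},
)
```

Because both rows keep the same condition_source_concept_id, the original grouper can still be recovered from the source columns later.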

Operating with the standard codes from the Vocabularies makes your life more predictable. You never know if you will participate in a network study or use artifacts created within OHDSI for your research. And in my experience that tends to happen when everybody has forgotten about the tweaks they made in the Vocabs.