Mapping clinical symptoms and defining a new set of vocabulary for variants

kridsadakorn · January 30, 2024, 4:08pm

Hi Everybody,

I am currently attempting to map keywords to standard concept names. As this is my first attempt, I would appreciate any suggestions from experts. I have some technical questions and would be grateful for your assistance in resolving them.

For the first point regarding clinical symptoms, I’m trying to map them to standard concept names from SNOMED as much as possible. However, not all of them can be mapped to SNOMED; some map to other vocabulary sets, such as Nebraska Lexicon, etc.

Is it good practice to have concept names from different vocabulary sets? If not, what would you suggest?
Some symptoms can be mapped to the “Condition” domain, while others can be mapped to the “Observation” domain. In practice, I need to separate them into two tables: “condition_occurrence” and “observation”. Is this correct? If so, it seems a bit odd to separate clinical symptoms into two tables. If this is not the correct approach, could you please suggest an alternative?

For the second point regarding diagnosed genetic variants, there are around 8000 concept names in ClinVar.

I would like to define new concept names. How could I register to get a new ID for a new vocabulary?
Which is the better practice: mixing concept names from ClinVar with a self-defined vocabulary, or relying solely on a self-defined vocabulary and defining relationships to ClinVar for the overlapping variants?

I would appreciate all comments. Thank you very much.

Christian_Reich · January 31, 2024, 10:53pm

No problem with that, but for symptoms you should find everything in SNOMED (unless it is cancer). Can you not?

Which ones are observations?

Don’t. The standard vocabulary is OMOP Genomic. If you have ClinVar, it is very easy to convert using a tool called Koios. It typically takes HGVS notation. Do you have that?

kridsadakorn · February 7, 2024, 9:30am

Thank you very much for your reply.

The data I have can be converted to ClinVar format. For example, NM_198578.4 (LRRK2):c.39G>C (p.Glu13Asp) follows this format. However, some variants might be annotated to older versions of transcripts, such as NM_198578.3. While it would be preferable to store the data in ClinVar format rather than OMOP Genomic format, ClinVar currently only has around 8000 concept names. Consequently, many variants cannot be mapped, which is why I would like to define new concept names.
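As a rough illustration of the transcript-version issue described above, here is a minimal Python sketch that splits a ClinVar-style variant name into its parts and builds a version-independent key. The names `VariantName`, `parse_variant_name`, and `versionless_key` are hypothetical helpers, not part of ClinVar, Koios, or any OHDSI tool, and the regular expression only covers simple coding-sequence names like the example given. Coordinates can shift between transcript versions, so a shared key is only a candidate match that still needs review.

```python
import re
from typing import NamedTuple, Optional


class VariantName(NamedTuple):
    """Parts of a ClinVar-style name (hypothetical helper type)."""
    accession: str                 # transcript accession without version
    version: Optional[int]         # transcript version, if present
    gene: Optional[str]            # gene symbol, if present
    coding_change: str             # e.g. "c.39G>C"
    protein_change: Optional[str]  # e.g. "p.Glu13Asp", if present


# Assumption: names look like "NM_198578.4 (LRRK2):c.39G>C (p.Glu13Asp)".
# Gene symbol, version, and protein change are treated as optional.
_PATTERN = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+)(?:\.(?P<version>\d+))?"
    r"\s*(?:\((?P<gene>[^)]+)\))?"
    r":(?P<coding>c\.[^\s(]+)"
    r"(?:\s*\((?P<protein>p\.[^)]+)\))?\s*$"
)


def parse_variant_name(name: str) -> VariantName:
    """Split a variant name into accession, version, gene, and changes."""
    match = _PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"Unrecognised variant name: {name!r}")
    version = match.group("version")
    return VariantName(
        accession=match.group("accession"),
        version=int(version) if version else None,
        gene=match.group("gene"),
        coding_change=match.group("coding"),
        protein_change=match.group("protein"),
    )


def versionless_key(variant: VariantName) -> str:
    """Key that ignores the transcript version (candidate match only)."""
    return f"{variant.accession}:{variant.coding_change}"


current = parse_variant_name("NM_198578.4 (LRRK2):c.39G>C (p.Glu13Asp)")
older = parse_variant_name("NM_198578.3:c.39G>C")
same_candidate = versionless_key(current) == versionless_key(older)
```

Here `same_candidate` flags the two annotations as worth comparing; it does not prove that they describe the same variant.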

For the question about symptoms, here is an example: weight loss and dyssomnia were mentioned in a medical note. Weight loss is in the domain of Observation, and dyssomnia is in the domain of Condition. Should I just separate these two terms into two tables?

Christian_Reich · February 7, 2024, 11:44pm

We use ClinVar as the source. But we use the OMOP Concept tables and machinery for defining relationships (e.g. from gene to genomic variant to transcript variant to protein variant), so that all OHDSI tools work.

Exactly. We would have a proper OMOP concept with this variant. Except right now we only cover oncology-related concepts, and this is some syndromic variant. We also don’t intend to do variant discovery. Only variants with clinical relevance that you would report to a clinician are considered. But once you have these, you can treat them like covariates in the epidemiological machinery of OHDSI, something bioinformatics folks notoriously fall short on.

What therapeutic area are you trying to get going?

Exactly. The vocabulary tells you where to put it and, more importantly, where the analyst will find it. E.g. Dyssomnia is a Condition. However, things are not always simple. Weight loss, for example, exists both as a Measurement (if you want to record the amount of it) and as an Observation (if it is just something that was noticed). And then there is Cachexia, which is when it has turned into a Condition. Fun, right?

This gets resolved through Themis, which is where we set our conventions for such situations.
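The domain-based routing discussed in this thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `MappedTerm`, `target_table`, and `route` are hypothetical names, the concept IDs are placeholders rather than real OMOP concept IDs, and only the three domains mentioned above are covered. The idea is that the `domain_id` of the mapped standard concept, not the fact that the term is a "symptom", decides the destination table.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Domain of the standard concept -> CDM table that holds the record.
# Only the domains mentioned in this thread are listed here.
DOMAIN_TO_TABLE: Dict[str, str] = {
    "Condition": "condition_occurrence",
    "Observation": "observation",
    "Measurement": "measurement",
}


@dataclass(frozen=True)
class MappedTerm:
    """A source term after mapping (hypothetical helper type)."""
    source_text: str
    concept_id: int   # placeholder value, not a real OMOP concept ID
    domain_id: str    # domain of the mapped standard concept


def target_table(term: MappedTerm) -> str:
    """Return the table name implied by the concept's domain."""
    try:
        return DOMAIN_TO_TABLE[term.domain_id]
    except KeyError:
        raise ValueError(f"No table configured for domain {term.domain_id!r}")


def route(terms: Iterable[MappedTerm]) -> Dict[str, List[MappedTerm]]:
    """Group mapped terms by their destination table."""
    routed: Dict[str, List[MappedTerm]] = {}
    for term in terms:
        routed.setdefault(target_table(term), []).append(term)
    return routed


# The two terms from the medical note example, with placeholder IDs.
note_terms = [
    MappedTerm("weight loss", concept_id=1, domain_id="Observation"),
    MappedTerm("dyssomnia", concept_id=2, domain_id="Condition"),
]
routed = route(note_terms)
```

With this input, `routed` holds one entry under `observation` and one under `condition_occurrence`, so the two symptoms from the same note end up in different tables, as the vocabulary dictates.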