
Vocabulary navigation proposal

Continuing the discussion from Vocabulary hierarchy exploration via Atlas:

Following is my attempt to quote the relevant pieces to start the new thread:

Most of the complexity we’ve been dealing with, I believe, is due to cross-vocabulary relationships. There are very complex individual vocabularies, like SNOMED, but on their own and from the local perspective of a single concept, I believe they are not that difficult to navigate.

I’m sure your point #3 is right, @Christian_Reich, that it’s possible to design better navigation tools if we allow customization to particular contexts. But the context wouldn’t need to be, e.g., a vocabulary-independent ingredient. We can start from Metformin in a specific vocabulary like RxNorm. Then if we want to navigate to classes we can wander over to ATC, if we want indications we can go to that vocabulary, etc. With the layout I’ve proposed, even if a user didn’t have much familiarity with the different vocabularies, they would get a sense of what was available, at least in other vocabularies with concepts directly linked to the concept under inspection. For some vocabularies, like SNOMED, it would probably be ideal to customize navigation depending on the concept under inspection, but for most, a consistent navigation UI would probably suffice, and, I suspect, a single navigation UI would probably do ok across all the vocabularies if we confine ourselves to insular (intra-vocabulary) hierarchical relationships and direct inter-vocabulary relationships.
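Here is a minimal sketch of the layout described above. It assumes in-memory rows shaped like the OMOP CONCEPT and CONCEPT_RELATIONSHIP tables. The direct links of the concept under inspection are split into same-vocabulary links and links grouped by the other vocabulary they lead into. The concept IDs, names and relationship labels are made up for illustration and are not taken from the real vocabulary.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple

class Concept(NamedTuple):
    concept_id: int
    concept_name: str
    vocabulary_id: str
    standard_concept: Optional[str]  # 'S' (standard), 'C' (classification) or None

class Relationship(NamedTuple):
    concept_id_1: int
    concept_id_2: int
    relationship_id: str

Link = Tuple[str, str]  # (relationship_id, name of the related concept)

def neighborhood(focus_id: int, concepts: List[Concept],
                 relationships: List[Relationship]) -> Tuple[List[Link], Dict[str, List[Link]]]:
    """Split the direct links of one concept into intra-vocabulary links
    and links grouped by the other vocabulary they lead into."""
    by_id = {c.concept_id: c for c in concepts}
    home = by_id[focus_id].vocabulary_id
    intra: List[Link] = []
    cross: Dict[str, List[Link]] = defaultdict(list)
    for rel in relationships:
        if rel.concept_id_1 != focus_id:
            continue
        other = by_id[rel.concept_id_2]
        link = (rel.relationship_id, other.concept_name)
        if other.vocabulary_id == home:
            intra.append(link)
        else:
            cross[other.vocabulary_id].append(link)
    return intra, dict(cross)

# Hypothetical IDs; the relationship labels are illustrative only.
concepts = [
    Concept(1, "metformin", "RxNorm", "S"),
    Concept(2, "metformin 500 MG Oral Tablet", "RxNorm", "S"),
    Concept(3, "metformin (ATC class)", "ATC", "C"),
]
relationships = [
    Relationship(1, 2, "RxNorm ing of"),
    Relationship(1, 3, "RxNorm - ATC"),
]
intra, cross = neighborhood(1, concepts, relationships)
print(intra)  # [('RxNorm ing of', 'metformin 500 MG Oral Tablet')]
print(cross)  # {'ATC': [('RxNorm - ATC', 'metformin (ATC class)')]}
```

Each key of `cross` would become one "hop over to that vocabulary" entry in the UI.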

To address your list of problems:

  • Medically meaningful relationships (like ‘Anatomical site of’) vs navigational/hierarchical relationships (‘Is a’, ‘Equivalent of’)

With both of these types of relationships, again, I believe that most of the complexity is removed and most of the meaning is retained with intra-vocabulary navigation and hopping across vocabularies to go from source to standard, standard to source, or to follow relationship types not available in the vocabulary under inspection.

  • One related concept/parent/child vs a few relateds/parents/children vs many many relateds/parents/children

Again, I suspect most of the link explosion is cross-vocabulary.

  • Non-standard vs. standard concepts, where we want to discourage the use of non-standard ones (they are not compatible with the CDMs that are ETLed from other coding schemes, we really need to wean people off those ICD9s).

With shading and record counts I think we get the best of both worlds: people navigate where they want, but they clearly see when they are in neighborhoods not attached to patient records and how to move towards neighborhoods that are.
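The shading idea could work roughly as follows. This is a sketch under assumptions, not an existing Atlas feature. Each neighbor carries its standard_concept flag plus the Record Count (RC) and Descendant Record Count (DRC) that Atlas displays. A hypothetical `shade` helper sorts neighbors so the ones attached to data come first. It also flags neighbors with no records at all so the UI can grey them out while still letting users go there. All names and counts are invented.

```python
from typing import List, NamedTuple, Optional, Tuple

class Neighbor(NamedTuple):
    concept_name: str
    standard_concept: Optional[str]  # 'S', 'C' or None, as in the OMOP CONCEPT table
    record_count: int                # RC, as shown in Atlas
    descendant_record_count: int     # DRC, as shown in Atlas

def shade(neighbors: List[Neighbor]) -> List[Tuple[str, Optional[str], bool]]:
    """Order neighbors by descendant record count and flag the ones to grey out.

    A neighbor is shaded when neither it nor its descendants have records,
    i.e. it sits in a neighborhood not attached to patient data.
    """
    ordered = sorted(neighbors, key=lambda n: n.descendant_record_count, reverse=True)
    return [(n.concept_name, n.standard_concept,
             n.record_count == 0 and n.descendant_record_count == 0)
            for n in ordered]

# Hypothetical neighbors with invented counts.
neighbors = [
    Neighbor("non-standard source code", None, 0, 0),
    Neighbor("standard drug concept", "S", 120, 500),
    Neighbor("classification concept", "C", 0, 900),
]
for row in shade(neighbors):
    print(row)
```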

  1. If we want a single standard navigator of things, it probably needs to have two views simultaneously:
    • An overview topological view where we are (like a dot on the US map), and

This would be really great. If anyone has funding to work on it, I hope you’ll think of me :smile:

    • A local view (the streets around my house in Cambridge).

That’s what my proposal addresses, right?

  1. We need to have “flexible” design elements depending on the size of the topological neighborhood:

Totally agree.

So, please let me know if I’m misunderstanding the challenges, or if I need to be clearer about what I’m proposing. Thanks!


One question on this. It seems to me that OHDSI maintains hierarchical and semantic relationships for standard terms and for a few select other vocabularies to augment the standard terms’ hierarchies, but it does not maintain relationships for non-standard terms. Are we asking for OHDSI to maintain hierarchies for all 43 vocabularies? Not sure if we have the resources for that. ICD9-CM is a little easier because it is implied by the code, but the others require real maintenance.

George

Oh. Huh. That may make my proposal implausible. I’m new to OHDSI vocabulary stuff and don’t know how it’s put together. I am talking about records in the relationship table, not in concept_ancestor, but even with those, you’re saying there’s a lot of OHDSI work that goes into producing intra-vocabulary relationship records? I had assumed that intra-vocabulary relationships came from the source vocabularies or UMLS. Is that totally wrong?

Friends:

Lots of ideas here.

Yes, ATC is clear. But relationships between drugs and indications (which look like Conditions) are not, nor are those within LOINC between surveys and their answers, etc. The naming conventions of the relationships help, but only if you know what’s behind them and read the (unfinished) documentation. But yes, it’s better than nothing, I agree.

That would work in some, but not all. SNOMED has a rich inner-vocabulary system of relationships. But we are actually trying to get away from vocabularies as the main principle of organization, and favor domain-based relationships instead. Take RxNorm. That is a closed system, but it only works in the US. In other countries, you’d have to use an RxNorm/RxNorm Extension combination. Drugs and their classifications are equally cross-vocabulary.

Not sure. First, use domains instead of vocabularies. But even those: the relationships between Indications (Drug domain) and SNOMED (Condition) cross all boundaries, and there are use cases for them.

That may be true, because inner-vocabulary relationships are mostly manual and therefore not excessive. Let me find out.

It does.

There’s a lot about OHDSI vocabularies I don’t understand yet, so it’s hard for me to know if any given reservation or question about my proposal is because 1) I haven’t explained myself clearly enough, 2) I’ve misunderstood something and need to tweak the proposal, or 3) I’ve misunderstood something big and some basic aspect of the proposal is impossible or infeasible.

I’ve just had an offline conversation with @Christian_Reich and it’s still not clear to me whether I’m not getting the real complexities and need for the current structures or he’s not getting how my proposal could cut through a lot of those.

First, regarding @hripcsa’s concern: @Christian_Reich confirmed that most of the intra-vocabulary relationship table records are derived directly from the vocabularies, so it might be possible to include intra-vocabulary relationships from non-standard vocabularies without a huge burden.

So, the big and most dubious implication of my proposal is basically that the concept_ancestor table is not used in the navigation UI and becomes unnecessary. @Christian_Reich says we can’t leave users to navigate the (single-step) relationship table relationships because they’ll never be able to find their way around; the concept_ancestor table is necessary in order to navigate the real, cross-vocabulary relationships that are meaningful to users; only people intimately familiar with the vocabularies would be able to make sense of a navigation topology consisting only of direct parent-child or sibling relationships.
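To make the contrast concrete: the concept_ancestor table is essentially a precomputed transitive closure of single-step hierarchical links. The sketch below uses a plain list of (child, parent) edges with hypothetical names. It computes ancestor/descendant rows with min_levels_of_separation and max_levels_of_separation (the CONCEPT_ANCESTOR column names) by walking every upward path. It is only an illustration of the idea, not the OHDSI builder. The real builder also decides which relationship types count as hierarchical and which concepts may appear at either end.

```python
from functools import lru_cache
from typing import Dict, List, Tuple

# Hypothetical single-step hierarchical links as (child, parent) pairs.
# 'child' reaches 'grandparent' both directly and through two parents.
EDGES: List[Tuple[str, str]] = [
    ("child", "parent_a"),
    ("child", "parent_b"),
    ("child", "grandparent"),
    ("parent_a", "grandparent"),
    ("parent_b", "grandparent"),
    ("grandparent", "root"),
]

def ancestor_table(edges: List[Tuple[str, str]]) -> List[Tuple[str, str, int, int]]:
    """Rows of (ancestor, descendant, min_levels_of_separation, max_levels_of_separation)."""
    parents: Dict[str, List[str]] = {}
    for child, parent in edges:
        parents.setdefault(child, []).append(parent)

    @lru_cache(maxsize=None)
    def ancestors(node: str) -> Dict[str, Tuple[int, int]]:
        # Shortest and longest upward path to every ancestor; assumes no cycles.
        found: Dict[str, Tuple[int, int]] = {}
        for parent in parents.get(node, []):
            steps = [(parent, 1, 1)] + [
                (anc, lo + 1, hi + 1) for anc, (lo, hi) in ancestors(parent).items()
            ]
            for anc, lo, hi in steps:
                old_lo, old_hi = found.get(anc, (lo, hi))
                found[anc] = (min(old_lo, lo), max(old_hi, hi))
        return found

    nodes = sorted({n for edge in edges for n in edge})
    return [(anc, node, lo, hi)
            for node in nodes
            for anc, (lo, hi) in sorted(ancestors(node).items())]

for row in ancestor_table(EDGES):
    if row[1] == "child":
        print(row)
# ('grandparent', 'child', 1, 2)
# ('parent_a', 'child', 1, 1)
# ('parent_b', 'child', 1, 1)
# ('root', 'child', 2, 3)
```

A UI built on these rows jumps straight from 'child' to 'root'. A UI built on the single-step links has to walk the intermediate concepts, which is exactly the trade-off under discussion.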

(TL;DR – you can skip reading the indented section here if time is short; the more important points are below.)

He brought up some particularly troublesome cases, like (I think) VA Product to ATC Class, which would require three lateral jumps, making it basically an unnavigable path for a non-informaticist user.
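One rough way to quantify such a path is to treat single-step links as an undirected graph. A breadth-first search then finds the shortest route between two concepts, and we can count how many hops cross a vocabulary boundary. The chain below (VA product → RxNorm clinical drug → RxNorm ingredient → ATC class) and all of its labels are illustrative assumptions, not the actual links of any concept.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

def shortest_path(start: str, goal: str, links: List[Tuple[str, str]]) -> Optional[List[str]]:
    """Breadth-first search over single-step links, treated here as undirected."""
    graph: Dict[str, List[str]] = {}
    for a, b in links:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            step: Optional[str] = node
            while step is not None:  # rebuild the path back to the start
                path.append(step)
                step = previous[step]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None

# Hypothetical concepts and the vocabulary each belongs to.
VOCAB = {
    "VA product": "VA Product",
    "RxNorm clinical drug": "RxNorm",
    "RxNorm ingredient": "RxNorm",
    "ATC class": "ATC",
}
LINKS = [
    ("VA product", "RxNorm clinical drug"),
    ("RxNorm clinical drug", "RxNorm ingredient"),
    ("RxNorm ingredient", "ATC class"),
]
path = shortest_path("VA product", "ATC class", LINKS)
crossings = sum(VOCAB[a] != VOCAB[b] for a, b in zip(path, path[1:]))
print(len(path) - 1, crossings)  # 3 2
```

A single concept_ancestor row would collapse this whole walk into one lookup.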

We also talked about the path from SPL to RxNorm Drug Product where the downward path through ingredient can lead to many Drug Products that are not actually related to the SPL, so the two-step link supplied in concept_ancestor is necessary.
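The overshoot can be shown with plain set arithmetic, using hypothetical identifiers. Going up from the SPL's products to their ingredients, then back down to every product with those ingredients, reaches products the SPL never listed.

```python
from typing import Dict, Set

# Hypothetical direct links, not real SPL or RxNorm data.
spl_products: Dict[str, Set[str]] = {"SPL-1": {"drug A 10 MG tablet"}}
ingredient_of: Dict[str, Set[str]] = {"drug A 10 MG tablet": {"ingredient A"}}
products_with_ingredient: Dict[str, Set[str]] = {
    "ingredient A": {"drug A 10 MG tablet", "drug A 25 MG tablet", "drug A / drug B tablet"},
}

def products_via_ingredients(spl: str) -> Set[str]:
    """Go up from the SPL's products to their ingredients, then back down to every product."""
    reached: Set[str] = set()
    for product in spl_products[spl]:
        for ingredient in ingredient_of[product]:
            reached |= products_with_ingredient[ingredient]
    return reached

unrelated = products_via_ingredients("SPL-1") - spl_products["SPL-1"]
print(sorted(unrelated))  # ['drug A / drug B tablet', 'drug A 25 MG tablet']
```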

My contentions, I realize, will be a hard sell to people who have much more experience with all this than I do and who have solved many of the inter-vocabulary navigation problems using the inferred relationships generated for the concept_ancestor table – but I will lay them out.

I don’t know if this captures the issue @Christian_Reich brought up with VA Product, but I’ll use the example of VA Product EMPAGLIFLOZIN 10MG TAB: http://www.ohdsi.org/web/atlas/#/concept/45777555. My first objection to the current setup with this example is that there is no information on the Hierarchy tab because this is not a standard concept. The results on the Related Concepts tab, though, strike me as particularly weird and not all that useful:

The first item, ORAL HYPOGLYCEMIC AGENTS,ORAL is a VA Class, clearly an ancestor of some sort, but there’s no indication here of what the path is between the drug we’re looking at and this class.

All of the other items have 0 for Record Count and Descendant Record Count. Why is that? Is this a drug that doesn’t appear in the data? Why are there no ATC classes on this list? Why are these particular concepts showing up on the Related Concepts list and not a bunch of others that could be related in one way or another?

The only path available to me from here that seems like it would lead to actual records in the data is going up to ORAL HYPOGLYCEMIC AGENTS,ORAL, but that ends up confusing me more. Now I see the first page of 13,605 entries. Whatever the 4,456 DRC was referring to seems to have nothing to do with this higher-level page… Ok, I’m going to stop with the blow-by-blow as I stumble around trying to find a path from a VA Product to the most relevant ATC class and to the most relevant standard concepts actually tied to records in the database.

What I’d like to see is:

  • Any actual VA drug hierarchy and how its levels are tied to the concept in question. (This might not be currently possible, though, because VA Product and VA Class are not just different classes, they’re different vocabularies–which makes me think (especially after searching around the web and finding this) that the VA stuff in OMOP is not from a coherent vocabulary but was maybe cobbled together from references in FDB or something.)
  • Where this concept ties to concepts in other vocabularies and the local neighborhoods–insular to each vocabulary–of those concepts.

I’ve started doing some thinking about @Christian_Reich’s proposal for “an overview topological view where we are (like a dot on the US map)”, and think it can maybe be done in a good way–the VA concept in question and the other-vocabulary concepts it links to could all be highlighted on this topological map so the user would have a better sense of what links are worth following in order to get to a desired neighborhood.

Anyway, I know I have a lot more convincing to do before anyone will believe me (and I don’t know yet if I’m right), but my conjecture is that the inferred relationships in the concept_ancestor table make it harder, not easier, to build a clear, intuitive vocabulary navigation UI and that the effort to discourage use of poor vocabularies by only including their relationships to standard concepts sacrifices information important to users. If a user is looking at concepts in some bad vocabulary (e.g., ICD9), I suspect:

  1. They have source data using that vocabulary.
  2. That vocabulary may be bad, but it probably has some internal logic.
  3. The user may have a better understanding of that vocabulary than of standard vocabularies.

To be clearer about the implications of what I’m proposing, I think it would involve adopting rules like:

  1. Represent the topology of each source vocabulary as accurately and completely as possible in the relationship table.

  2. As much as possible, mappings from source concepts to standard concepts should:
    a. use external resources like UMLS or FDB
    b. add or maintain OHDSI-originated mappings only where necessary
    c. map only to synonymous or sibling concepts except when cross-granularity or cross-semantic mappings are the only way to capture important information

  3. (Probably) don’t include links from one non-standard concept to another in a different vocabulary unless that’s the only way to establish a path from that concept to a standard concept.

In the case of SPL to RxNorm Ingredient and SPL to RxNorm Drug Product, both should be captured in the relationship table. An SPL represents both a set of ingredients and a set of products; but it doesn’t represent all products that contain that set of ingredients. So, from the point of view of RxNorm, product and ingredients can have a child-parent relationship, but RxNorm product is not a grandchild of SPL.

I’m sure that’s more than enough for now. Sorry to go on for so long.

@Sigfried_Gold

Keep going. This is a Forum for debate. :smile:

But why do we want to do that? What’s the use case?

That’s how we get it from the source.

Because the source doesn’t provide it. And we don’t construct those either, because VA Product Codes are not used as Standard. RxNorm Codes are. And that is why you have no records. VA Class, on the other hand, is a Classification Concept, and hence has descendants - in RxNorm.

The VA “stuff” is collected from the RxNorm distributable. They tie whatever they tie: VA Product, VA Class, the various NDFRT classes, SNOMED and of course RxNorm. There is no one way of how the connection works, and the constructor of the CONCEPT_ANCESTOR table walks through all potential chains. VA Product is not a Standard Concept, so it can be a link in the chain, but never the end of the chain.
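Here is a minimal sketch of that rule. It assumes the chains are simple upward links and that each concept carries its standard_concept flag. The walk follows every chain and lets a non-standard concept such as a VA Product serve as an intermediate link. It emits an (ancestor, descendant) pair only when both ends are standard ('S') or classification ('C') concepts. The concepts and links are hypothetical, and the real CONCEPT_ANCESTOR constructor is considerably more involved.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical concepts with their standard_concept flag ('S', 'C' or None).
STANDARD: Dict[str, Optional[str]] = {
    "RxNorm clinical drug": "S",
    "VA product": None,  # non-standard: may be a link in a chain, never an end
    "VA class": "C",
}
# Hypothetical upward links, as they might arrive from the source.
UP: Dict[str, List[str]] = {
    "RxNorm clinical drug": ["VA product"],
    "VA product": ["VA class"],
}

def ancestor_pairs(standard: Dict[str, Optional[str]],
                   up: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """Walk every upward chain; keep (ancestor, descendant) only if both ends are 'S' or 'C'."""
    pairs: Set[Tuple[str, str]] = set()
    for start, flag in standard.items():
        if flag is None:
            continue  # a non-standard concept never ends a chain
        stack = list(up.get(start, []))
        seen: Set[str] = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if standard.get(node) is not None:
                pairs.add((node, start))
            stack.extend(up.get(node, []))  # keep walking through any concept
    return pairs

print(ancestor_pairs(STANDARD, UP))  # {('VA class', 'RxNorm clinical drug')}
```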

Why is that so? The CONCEPT_ANCESTOR table is the way to make it easier for the users who don’t want to navigate that maze. However, nothing speaks against creating a UI to go down the original relationships. I just don’t see the point, unless you are providing that for the non-hierarchical semantic relationships.

These are the rules in practice today. There are “Is a” relationships in ICD9CM and ICD10CM.

There aren’t relationships from SPL to Ingredients. We have to infer them.

That’s exactly as it stands today. The relationships are direct between SPL and the products (NDCs, RxNorms). But in the CONCEPT_ANCESTOR table we have a stratification: we make the (inferred) ingredients children of the SPL (min_levels_of_separation=1), and the products the great-grandchildren (min_levels_of_separation=3). In between are Drug Forms and Drug Components. How is that a problem?

Bottom line, @Sigfried_Gold: The relationships aren’t the word of the Lord passed down at the mountain. They are idiosyncratic, and they come and go depending on how the source organizes the data. Some of them are manually curated. In order to make a high-quality hierarchy available for querying the data, we create the CONCEPT_ANCESTOR table. That makes the hierarchical relationships pretty redundant, but we still keep them. The non-hierarchical relationships, though, have medical content supporting other use cases (e.g. give me the anatomical site of a Condition or so). The UI should do a better job supporting those, I believe.

Let me know.


Ok. I think I need to try to make a proof of concept for what I’m talking about and see how well it works. I’m going to quit trying to make a case for it until I have something to show. I’ve got a lead or two for possible funding so I can put in the time. If anyone else would like to pitch in, let me know.

I would support the goal below.

The non-hierarchical relationships, though, have medical content supporting other use cases (e.g. give me the anatomical site of a Condition or so)

My focus is on procedures. In that domain, for example, ICD10PCS is in CDMV but has no internal relationships. However, we are ambivalent about internal relationships: the quote above implies that we want them, but on the other hand we see them as a high maintenance burden.

I think we should adopt a rule that for some vocabularies included in CDMV (e.g., SNOMED) we deeply care about internal relationships, and for others we don’t (e.g., ‘Currency’).

For example if I want to retrieve all procedures that involve radiation - it is currently hard to do that in Atlas.
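Here is a sketch of what such a query needs. It assumes attribute relationships are available as (procedure, relationship_id, target) rows, plus a descendant mapping like the one concept_ancestor provides. The query expands the target concept to its descendants, then keeps procedures linked to any of them through one chosen attribute. "Using energy", "Has method" and all concept names are placeholders. Which relationship actually carries "involves radiation" in the OMOP vocabulary would have to be checked.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical attribute rows: (procedure, relationship_id, target concept).
ATTRIBUTES: List[Tuple[str, str, str]] = [
    ("external beam radiotherapy", "Using energy", "ionizing radiation"),
    ("brachytherapy", "Using energy", "gamma radiation"),
    ("appendectomy", "Has method", "excision"),
]
# Hypothetical hierarchy: ancestor -> all of its descendants.
DESCENDANTS: Dict[str, Set[str]] = {
    "radiation": {"ionizing radiation", "gamma radiation"},
}

def procedures_involving(target: str, relationship_id: str) -> Set[str]:
    """Procedures whose attribute points at the target or any of its descendants."""
    targets = {target} | DESCENDANTS.get(target, set())
    return {proc for proc, rel, value in ATTRIBUTES
            if rel == relationship_id and value in targets}

print(sorted(procedures_involving("radiation", "Using energy")))
# ['brachytherapy', 'external beam radiotherapy']
```

The hierarchical expansion and the semantic filter are separate steps, which is one reason a UI that exposes only one of them makes this kind of search hard.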

(Another problem is grouping of SNOMED relationships.)
