
Fact relationships: Searching for an extensible approach

Thanks @clairblacketer . I’ll work on putting together something more formal, and ideally more thoroughly thought out, with a little help from some constructively critical colleagues.

I’m not sure if others agree, but I’m advocating that the scope of FACT_RELATIONSHIP should only include relationships that either:

  1. are explicitly and discretely represented in the source data, or
  2. were created by an established derivation process (e.g. NLP, imaging, episode derivation), ideally with those records also carrying context on the derivation provenance (e.g. a DOI for the algorithm), but I’ll leave the “derivation_id” argument for another thread.

In other words, this would exclude any implicit relationships, even when plausible, that could be biased and/or based on fuzzy logic.

@jmethot Wow. That is both hilarious and a little scary, as I had no recollection of it, at least consciously. Even funnier is that I appear to provide some opposition. Taken with a grain of salt, as even I have trouble understanding my points, I believe I was pushing back on the idea that the table would serve as a proof of concept that would then generate domain-to-domain specific tables, not opposing the proposed FR structure itself.

But yes, @jmethot , that is nearly identical and a great find. I believe the only delta there would be the focus on the specific subset of domain-to-domain concepts intended to confine valid usage of the approach, but perhaps that was implicit in that proposal as well.

Judging by the end of that thread…

and @Christian_Reich’s infallible memory, perhaps he can enlighten us as well?

As stated above, I’ll work on getting this into a more succinct proposal, perhaps with additional focus on the perceived benefits and the options for what the concept sets would consist of. In the meantime, if folks have any lingering feedback, the more critical the better, I’d love to hear it.

I’m neutral about @rtmill’s proposal to extend fact_relationship with begin/end dates. That’s not because the extension isn’t merited; it’s because I don’t like fact_relationship to begin with. If the alternative is to add arbitrary pointers to every table, e.g. measurement_event_id, mark me as a hard yes for this proposal instead.

For relationships that are not homogeneous, I’d rather have an accepted pattern for tables over fact_relationship. Automated programs could read information_schema to infer knowledge about those relationships. Besides a core set of columns that every relationship table would share, each relationship could then be customized to directly reflect the fields relevant to it, following some rather standard patterns. We could have an expedited approval process for tables/columns that follow those established patterns, and a registry to ensure that we don’t end up with duplicates where every application has its own meta schema. Modern relational databases deal with 100s or even 1000s of tables quite easily. For building abstract classes, a few dozen tables in a UNION ALL also doesn’t perform poorly, and could even perform better. It’s easy enough for SQL generation to deal with this sort of challenge so that user experiences are not burdened.
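To make the pattern idea a bit more concrete, here is a minimal sketch using Python’s built-in sqlite3. Everything in it is an assumption for illustration: the `rel_` naming prefix, the three core columns, and the two example tables are hypothetical, and SQLite’s `sqlite_master` and `PRAGMA table_info` stand in for `information_schema`, which SQLite does not provide.

```python
import sqlite3

# Hypothetical core columns every relationship table would share.
CORE_COLUMNS = {"fact_id_1", "fact_id_2", "relationship_concept_id"}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rel_measurement_specimen (
    fact_id_1 INTEGER NOT NULL,               -- measurement_id
    fact_id_2 INTEGER NOT NULL,               -- specimen_id
    relationship_concept_id INTEGER NOT NULL,
    collection_method TEXT                    -- relationship-specific column
);
CREATE TABLE rel_condition_procedure (
    fact_id_1 INTEGER NOT NULL,               -- condition_occurrence_id
    fact_id_2 INTEGER NOT NULL,               -- procedure_occurrence_id
    relationship_concept_id INTEGER NOT NULL,
    start_date TEXT,                          -- relationship-specific columns
    end_date TEXT
);
CREATE TABLE person (person_id INTEGER PRIMARY KEY);  -- not a relationship table
INSERT INTO rel_measurement_specimen VALUES (101, 9001, 1, 'venipuncture');
INSERT INTO rel_condition_procedure VALUES (501, 7001, 2, '2020-01-01', NULL);
INSERT INTO rel_condition_procedure VALUES (502, 7002, 2, '2020-02-01', '2020-03-01');
""")


def relationship_tables(conn):
    """Find tables that follow the naming pattern and carry the core columns."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name GLOB 'rel_*' ORDER BY name"
    )]
    matching = []
    for name in names:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = {row[1] for row in conn.execute(f"PRAGMA table_info({name})")}
        if CORE_COLUMNS <= columns:
            matching.append(name)
    return matching


def build_union_sql(tables):
    """Generate one UNION ALL query over the core columns of each table."""
    selects = [
        f"SELECT '{t}' AS source_table, fact_id_1, fact_id_2, "
        f"relationship_concept_id FROM {t}"
        for t in tables
    ]
    return "\nUNION ALL\n".join(selects)


tables = relationship_tables(conn)
union_sql = build_union_sql(tables)
rows = conn.execute(union_sql).fetchall()
```

The generated query only touches the shared columns, so each table is free to carry its own extras (`collection_method`, `start_date`, and so on) without affecting the abstract view.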

Regardless, I strongly prefer @rtmill’s approach of extending fact_relationship with begin/end dates over every table getting an untyped pointer (e.g. measurement_event_id). Having a free pointer in each table doesn’t scale for many reasons, which @rtmill has articulated. With a free pointer, there’s no good way to add additional attributes to the relationship, such as the start/end dates proposed here. So, if we need a junk drawer approach, let’s keep it localized to fact_relationship please.
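As a rough illustration of what dates on the relationship itself would buy us, here is a sketch in Python with sqlite3. The first five columns are the existing fact_relationship columns; the names `relationship_start_date` and `relationship_end_date` are my assumption, not something settled in this thread, and the concept IDs in the sample rows are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_relationship (
    domain_concept_id_1 INTEGER NOT NULL,
    fact_id_1 INTEGER NOT NULL,
    domain_concept_id_2 INTEGER NOT NULL,
    fact_id_2 INTEGER NOT NULL,
    relationship_concept_id INTEGER NOT NULL,
    relationship_start_date TEXT,   -- hypothetical column name, ISO 8601 date
    relationship_end_date TEXT      -- hypothetical column name, NULL = open-ended
);
-- Placeholder concept IDs, for illustration only.
INSERT INTO fact_relationship VALUES (21, 101, 19, 501, 1001, '2020-01-01', '2020-06-30');
INSERT INTO fact_relationship VALUES (21, 102, 19, 501, 1001, '2020-07-01', NULL);
""")


def active_on(conn, date):
    """Return (fact_id_1, fact_id_2) pairs whose relationship covers `date`.

    ISO 8601 date strings compare correctly as text, so plain string
    comparison is enough here. Rows with a NULL start date are excluded.
    """
    return conn.execute(
        """
        SELECT fact_id_1, fact_id_2
        FROM fact_relationship
        WHERE relationship_start_date <= ?
          AND (relationship_end_date IS NULL OR relationship_end_date >= ?)
        ORDER BY fact_id_1
        """,
        (date, date),
    ).fetchall()


spring = active_on(conn, "2020-03-15")
later = active_on(conn, "2021-01-01")
```

An untyped pointer such as measurement_event_id has nowhere to hang this kind of attribute, since the link lives in a single column on the fact row rather than in a row of its own.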

One could always use @rtmill’s approach when urgency demands it, alongside a templated yet expedited process for approving new relationships. Once a relationship has been reduced to practice with existing fact_relationship usage, we could have a review and approval process. Data migration could then be relatively easy. Moreover, an automated program could use CTEs to maintain compatibility between older schemas that use fact_relationship and newer ones that use concrete tables (which may have additional columns). ETLs used for OMOP subsetting could also make these relationships concrete, so that researchers have an easier time understanding and querying the data sets they receive. Anyway, did I misunderstand something? Please let me know if this is the case.
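For the CTE compatibility point, something along these lines is what I have in mind. This is only a sketch under assumptions: the concrete table `measurement_specimen_link`, its `collection_method` column, and the domain and relationship concept IDs are all hypothetical placeholders, and sqlite3 is used just to keep the example self-contained.

```python
import sqlite3

# Placeholder domain concept IDs, for illustration only.
MEASUREMENT_DOMAIN = 21
SPECIMEN_DOMAIN = 36

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Legacy generic table, still holding rows that were never migrated.
CREATE TABLE fact_relationship (
    domain_concept_id_1 INTEGER NOT NULL,
    fact_id_1 INTEGER NOT NULL,
    domain_concept_id_2 INTEGER NOT NULL,
    fact_id_2 INTEGER NOT NULL,
    relationship_concept_id INTEGER NOT NULL
);
-- Hypothetical concrete table with an extra, relationship-specific column.
CREATE TABLE measurement_specimen_link (
    measurement_id INTEGER NOT NULL,
    specimen_id INTEGER NOT NULL,
    relationship_concept_id INTEGER NOT NULL,
    collection_method TEXT
);
INSERT INTO fact_relationship VALUES (21, 101, 36, 9001, 1001);
INSERT INTO measurement_specimen_link VALUES (102, 9002, 1001, 'venipuncture');
""")

# The CTE projects the concrete table back into the fact_relationship shape,
# so a query written against the old structure sees both sources.
COMPAT_SQL = """
WITH fact_relationship_compat AS (
    SELECT domain_concept_id_1, fact_id_1,
           domain_concept_id_2, fact_id_2,
           relationship_concept_id
    FROM fact_relationship
    UNION ALL
    SELECT :m, measurement_id, :s, specimen_id, relationship_concept_id
    FROM measurement_specimen_link
)
SELECT fact_id_1, fact_id_2
FROM fact_relationship_compat
WHERE domain_concept_id_1 = :m AND domain_concept_id_2 = :s
ORDER BY fact_id_1
"""

links = conn.execute(
    COMPAT_SQL, {"m": MEASUREMENT_DOMAIN, "s": SPECIMEN_DOMAIN}
).fetchall()
```

The extra column (`collection_method`) is simply dropped in the compatibility view, which is the trade-off: old queries keep working, while new ones can go straight to the concrete table for the richer attributes.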

Thanks for listening.