
Constraint Issue

Hi,
Can someone help me with some issues in the V5 vocabulary?

  1. The LOINC concepts with concept_id 45877004 and 45885233 have extra tabs within the concept name.
  2. I am unable to create foreign key constraints on the vocabulary tables due to orphaned records in the following tables:
    concept_ancestor
    concept_class
    concept_relationship
    relationship
    drug_strength

Thanks,
Hira

  1. Tabs? Do you mean “\t”?

    There are no tabs here.

  2. Can you give an example of such orphaned concepts,
    and how do you actually create these constraints?

This is an example of some of the LOINC codes I have found in the concept_name of concept_id = 45877004. The length of concept_name is 5312 characters.

  2. These are some of the concept_ids found in the concept_relationship table. However, I am not able to find them in the concept table:
    concept_id: 1131990, 40145707, 40162888, 42800984, 40164962
    I am unable to create constraints on concept_id because of this.
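A minimal way to list such orphans before adding a constraint is an anti-join: keep the referenced IDs that find no match in concept. The sketch below uses Python's built-in sqlite3 with tiny stand-in tables that carry only the columns the check needs; apart from the two orphan IDs taken from the post, the rows are made up, and the helper names are hypothetical. The same query shape should carry over to concept_ancestor or drug_strength on the real database, though the SQL dialect may need small adjustments.

```python
import sqlite3
from typing import List

# Anti-join: every concept ID used on either side of concept_relationship
# that has no matching row in concept.
ORPHAN_QUERY = """
    SELECT DISTINCT ref.concept_id
    FROM (
        SELECT concept_id_1 AS concept_id FROM concept_relationship
        UNION
        SELECT concept_id_2 FROM concept_relationship
    ) AS ref
    LEFT JOIN concept AS c ON c.concept_id = ref.concept_id
    WHERE c.concept_id IS NULL
    ORDER BY ref.concept_id
"""


def find_orphaned_relationship_ids(conn: sqlite3.Connection) -> List[int]:
    """Return IDs referenced by concept_relationship but missing from concept."""
    return [row[0] for row in conn.execute(ORPHAN_QUERY)]


def build_demo_db() -> sqlite3.Connection:
    """Hypothetical in-memory stand-in for the CDM v5 vocabulary tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT)")
    conn.execute(
        "CREATE TABLE concept_relationship "
        "(concept_id_1 INTEGER, concept_id_2 INTEGER, relationship_id TEXT)"
    )
    conn.executemany("INSERT INTO concept VALUES (?, ?)", [(1, "Concept A"), (2, "Concept B")])
    conn.executemany(
        "INSERT INTO concept_relationship VALUES (?, ?, ?)",
        [(1, 2, "Maps to"), (1, 1131990, "Maps to"), (40145707, 2, "Mapped from")],
    )
    return conn


if __name__ == "__main__":
    print(find_orphaned_relationship_ids(build_demo_db()))  # [1131990, 40145707]
```

Fixing or removing the rows such a query returns (or re-loading the vocabulary) would need to happen before the foreign key can be added; each referencing table needs its own check.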

Thanks,
Hira Sherazi

Yes, that explains your issues:
there is a problem with the field delimiter you are using.
The files seem to be tab-delimited.

No, there is an extra quote in the name, and that kills the quoted strings in the CSV file. Timur has a Jira task for it. Somebody needs to ping him.
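The stray-quote explanation would also account for the 5312-character concept_name reported above. A minimal sketch with Python's standard csv module on a made-up tab-delimited extract (not real vocabulary rows): with default quote handling, a field that begins with a double quote is read as a quoted field, so tabs and line breaks are absorbed into it until the next quote appears; with csv.QUOTE_NONE the quote stays a literal character. Whether a given database loader behaves the same way is an assumption to check; the sketch only shows the mechanism.

```python
import csv
import io
from typing import List

# Made-up three-row, tab-delimited extract: concept_id, concept_name, domain_id.
# Row 1's name starts with a stray double quote; row 3's name contains another one.
SAMPLE = (
    '1\t"Tobacco smoking status\tObservation\n'
    "2\tNext concept\tObservation\n"
    '3\tAnother" concept\tObservation\n'
)


def parse(text: str, quoting: int) -> List[List[str]]:
    """Parse tab-delimited text with the given csv quoting mode."""
    return list(csv.reader(io.StringIO(text), delimiter="\t", quoting=quoting))


# Default quote handling: the opening quote swallows tabs and newlines
# until the next quote, merging three lines into one oversized field.
default_rows = parse(SAMPLE, csv.QUOTE_MINIMAL)

# QUOTE_NONE: quotes are ordinary characters, so each line stays one row.
literal_rows = parse(SAMPLE, csv.QUOTE_NONE)

if __name__ == "__main__":
    print(len(default_rows), len(literal_rows))  # 1 3
    print(repr(default_rows[0][1]))
```

In a bulk loader the equivalent fix is to turn quote processing off for these files; the exact option name depends on the database.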

Hi,

I am working on an oncology study and mapping it to CDM version 5.

We are trying to deal with partial dates (in Drug Exposure, Condition, and Measurement data) such as:

Only Year is given - 2015
Month and Year is given - May/2015

What is the standard process for dealing with a situation like this? Should we treat these records as having partial dates, and if so, how should we represent them?

All input is welcome!

Hi Nitish, all…

Have you considered approaching it from a semantic modeling perspective, using a GraphDB construct to express it, and exposing it for app consumption as a microservice?

This approach may be particularly valuable for designing and optimizing rapid algorithmic learning loops in the AI / ML / DL space.

The hypothesis here is to drive a strong separation of concerns. In so doing, apps consuming this microservice would have the utmost flexibility in addressing NULLs in ways we may not be aware of yet.

By ‘NULLs’ I mean a temporal service where there may be no value for any of the date attributes – be it a second, minute, hour, date, day, or year. And for a globally-mindful solution, time zone would also play a role in that model.

Further, should there be a confidence level of less than 100% concerning the value of any one or more of the temporal model’s nodes, your Temporal Microservice would provide the capability to expose it as well.
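The nullable temporal model described above can be pictured as a plain data structure before any GraphDB or microservice is involved. This is a hypothetical sketch (the class name, fields and precision() helper are illustrative, not part of the OMOP CDM): each component from year to second may be None, and a time zone and a confidence score sit alongside them.

```python
from dataclasses import dataclass
from typing import Optional

_COMPONENTS = ("year", "month", "day", "hour", "minute", "second")


@dataclass(frozen=True)
class PartialTimestamp:
    """Hypothetical temporal node: any component may be unknown (None)."""

    year: Optional[int] = None
    month: Optional[int] = None
    day: Optional[int] = None
    hour: Optional[int] = None
    minute: Optional[int] = None
    second: Optional[int] = None
    tz: Optional[str] = None  # e.g. "UTC"; None when the source gives no zone
    confidence: float = 1.0   # 0.0 to 1.0: how far the stated values are trusted

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")

    def precision(self) -> str:
        """Finest component known without gaps from the year down, or 'unknown'."""
        finest = "unknown"
        for name in _COMPONENTS:
            if getattr(self, name) is None:
                break
            finest = name
        return finest


if __name__ == "__main__":
    print(PartialTimestamp(year=2015, month=5).precision())  # month
    print(PartialTimestamp(year=2016, confidence=0.8).precision())  # year
```

Consumers can then decide for themselves how to treat a "month"-precision or low-confidence value, which is the separation of concerns argued for above.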

Thoughts?..

Hi Ron,

Thanks for your response!

We are not using a semantic modeling approach for this purpose.

Let me put my question more clearly: we are doing the ETL for that study.

In some of the raw data, for example the Condition data, the condition start date has been captured as a partial date, e.g. Jan/2015 (day is missing) or 2016 (day and month are both missing).

So my first question is: should we include these records, and if yes, should we impute the date to make it complete? Is there a date imputation method followed in the CDM?

I hope I have explained it well!

Looking forward to your response.

@TBanokina, @IYabbarova
Have you met this issue, where the date was encoded only as “Month+Year” or even “Year” only?

@nitishkjha, as you said

So if I were working on both the ETL and the study, I would figure out how these “year-only” dates matter for your study's purposes. For example, if you only need the fact that these entities are present, you may just place them in the middle of the known interval, like the 15th of the given month or 15.06. (June 15) of the given year.
On the other hand, if only an insignificant amount of data is affected, you may just ignore it.
So, I think you need to talk to your analytics team to decide what is better for data quality: to round up or to remove.
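A minimal sketch of the middle-of-interval rule suggested above (15th of the given month; June 15 for a year-only date), assuming the raw values look like the examples in the question ("2015", "May/2015") or are already full ISO dates. The function name and the returned was_imputed flag are hypothetical; keeping such a flag next to the imputed value makes it easy to count or exclude imputed dates later, whichever way the analytics team decides.

```python
import re
from datetime import date
from typing import Optional, Tuple

_MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}


def impute_partial_date(raw: str) -> Tuple[Optional[date], bool]:
    """Return (date, was_imputed) using the middle-of-interval rule.

    "2015"        -> June 15 of that year (imputed)
    "May/2015"    -> 15th of that month (imputed)
    "2015-05-10"  -> full ISO date, used as-is
    anything else -> (None, False), leaving keep/drop to the caller
    """
    text = raw.strip().lower()
    try:
        if re.fullmatch(r"\d{4}", text):
            return date(int(text), 6, 15), True
        match = re.fullmatch(r"([a-z]{3})[a-z]*/(\d{4})", text)
        if match and match.group(1) in _MONTHS:
            return date(int(match.group(2)), _MONTHS[match.group(1)], 15), True
        return date.fromisoformat(text), False
    except ValueError:  # unparseable text or an impossible date such as year 0
        return None, False
```

For example, impute_partial_date("Jan/2015") gives (date(2015, 1, 15), True).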

@Dymshyts @nitishkjha

Usually we have the full date, including year/month/day, or don't have the date at all. In the latter case we exclude such data.

But if a significant part of your data is represented like this, then I agree with Dmytry and would choose a default month and day to include the data in the study.
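The rule of thumb in the last two replies (drop records with no date at all; impute partial dates only when they make up a significant part of the data, otherwise drop them) can be written down as a small check. This is a sketch only: the 5% threshold is a made-up placeholder for "significant", and the function assumes each source record has already been labelled with its date precision.

```python
from collections import Counter
from typing import Dict, Iterable


def partial_date_policy(precisions: Iterable[str], threshold: float = 0.05) -> Dict[str, str]:
    """Map each date precision to an action: 'keep', 'impute' or 'exclude'.

    precisions: one label per source record: 'day', 'month', 'year' or 'none'.
    threshold: hypothetical share of partial dates at or above which
    imputing is considered worthwhile.
    """
    counts = Counter(precisions)
    total = sum(counts.values()) or 1  # avoid dividing by zero on empty input
    partial_share = (counts["month"] + counts["year"]) / total
    partial_action = "impute" if partial_share >= threshold else "exclude"
    return {
        "day": "keep",       # full dates need no treatment
        "month": partial_action,
        "year": partial_action,
        "none": "exclude",   # no date at all: excluded, as described above
    }
```

With 3 month-only dates out of 100 records, partial dates fall below the placeholder threshold and are excluded; with 20 year-only dates out of 100, they are imputed.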

Thanks, Tatiana!

Thanks, Nitish. We might be running into a larger problem here, then…

Because I do not have visibility into this oncology app, I can't say whether the team should or should not include a partial-date record.

Nevertheless, as ‘trusting the data’ is a foundational UX design goal, it may be valuable to consider the implications of including the clinical data associated with the partial-date record vs. not letting the user know that the clinical record exists at all.

So it depends on the use case, and on whether the app is intended to be a highly bounded, one-off app without a downstream interoperability role, or not.

Point is, Separation-of-Concern at design time coupled with semantic modeling of the data infrastructure implemented as a GraphDB may be the optimal choice for both short term/well-defined usage and long-term/not yet specified goals.

OHDSI had to deal with this ETL issue for JMDC data (see here). Per the specs, they used the 15th of the month as the day “because accidental reversal of temporality (where the order of events is switched because one piece of information did have a date, and another didn’t) is just as likely to occur in one direction as the other”. So this is one example you can use.
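The quoted argument can be checked with a quick count, assuming the true day is equally likely to be any day of the month (real data may well not be uniform). The sketch counts how many possible true days fall before and after the imputed 15th; the function name is hypothetical.

```python
import calendar
from typing import Tuple


def displacement_counts(year: int, month: int, imputed_day: int = 15) -> Tuple[int, int]:
    """Count possible true days before and after the imputed day in one month.

    Assumes every day of the month is an equally likely true date.
    """
    n_days = calendar.monthrange(year, month)[1]  # (weekday of day 1, number of days)
    earlier = sum(1 for d in range(1, n_days + 1) if d < imputed_day)
    later = sum(1 for d in range(1, n_days + 1) if d > imputed_day)
    return earlier, later


if __name__ == "__main__":
    for y, m in [(2015, 2), (2016, 2), (2015, 4), (2015, 5)]:
        print(y, m, displacement_counts(y, m))
```

Under that assumption the imputed 15th is too late for 14 possible true days and too early for 13 to 16, depending on month length, so the error is close to balanced in both directions, consistent with the reasoning quoted from the spec.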
