Metadata extension to CDM


(Vojtech Huser) #1

I would like to announce a proposal to extend metadata in the CDM.
The proposal is described at the CDM workgroup website

http://www.ohdsi.org/web/wiki/doku.php?id=documentation:next_cdm:metadata

The proposal is motivated by our efforts to look at the data quality of various tables. A free-text description of several domains and of the type of dataset (general population vs. something special) would be a good addition to the CDM.

Overview of all other proposals is also available here (the proposal above is listed as ‘metadata’)
http://www.ohdsi.org/web/wiki/doku.php?id=documentation:next_cdm


(Erica Voss) #2

@Ajit_Londhe and I would like to participate in this discussion. I think the second description listed on the Wiki is outdated compared with what we have implemented on our side. @Ajit_Londhe even developed an ACHILLES report exposing the domain notes from this data we are testing out.

However, since implementing this beta idea on our side, @Ajit_Londhe and I have had other ideas about generalizing the table further to store other metadata about the CDM (e.g., CDM run times).

@Ajit_Londhe and I will meet up next week to do a better job of documenting our ideas. We are open to other thoughts and input.


(Christian Reich) #3

@ericaVoss:

Please add the link to this Forum to the Wiki page proposal, so people can find it.

Also, can you invite Dino? He has a lot of ideas and needs for metadata.


(Dino Gambone) #4

@ericaVoss yes, I’d be interested in participating as well.


(Erica Voss) #5

@Christian_Reich - done, @Vojtech_Huser beat me to it. :smile:


(Taha Abdul-Basser) #6

Interested as well.


(Gregory Klebanov) #7

please sign me up for this as well. Thanks


(Erica Voss) #8

@Vojtech_Huser, @dgambone, @t_abdul_basser, & @gregk,

@Ajit_Londhe and I drafted our ideas in the Wiki. We wanted to pass them by Vojtech first since he has been thinking about this the longest. Then I figure we could send out the notes to all of you and discuss via email or set up a meeting.

Long way of saying . . . we are still thinking about this but getting some thoughts together first - I think we’ll have a more productive discussion with something tangible to provide feedback on.


(Vojtech Huser) #9

I like the new update. For others puzzled by which concept_id values can be used in the METADATA table, run an advanced search in Atlas (under Vocabulary) and pick OMOP Domain.


(Erica Voss) #10

Finally got my act together. Here is the metadata Doodle for when we could meet to discuss:
http://doodle.com/poll/hs9p2rkmqixdhp96

@Vojtech_Huser, @Ajit_Londhe, @dgambone, @t_abdul_basser, & @gregk please let me know which date/times work for you.


(Erica Voss) #11

Looks like 10/26 @ 3:00PM.

@gregk, can you PM me your email so I can send the meeting invite?


(Erica Voss) #12

Does anyone have @gregk’s email?

Hope to see everyone at 3PM EST tomorrow! If you don’t have the meeting invite on your calendar let me know!


(Daniella Meeker) #13

Timely. Just back from a metadata conference where one “next-step” subject was alignment of post-11179 metadata standards. Also, we did some metadata modeling for an ONC pilot - not super proud of it, but a start.

I’m on fumes as bandwidth goes, but perhaps we can get someone from Columbia and Eric+Josh to participate in metadata discussion here. I opened the topic last week w/ PMI.


(Erica Voss) #14

Daniella, email me who you want me to invite and I will.


(Vojtech Huser) #15

@ericaVoss, can you please add to the proposal two more examples of values that could go into the column METADATA_TYPE_CONCEPT_ID?

Also, if a concept exists (e.g., visit type), do we expect one row per METADATA_CONCEPT_ID, or can people still submit multiple name-value pairs (where one of the names equals the concept name)?

If a METADATA_CONCEPT_ID does not exist for a metadata entity (e.g., the death table was using 2014-June state death certificate data), do we populate METADATA_CONCEPT_ID with a concept of 0?
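
To make the two questions above concrete, here is a minimal sketch using an in-memory SQLite table. Only the column names METADATA_CONCEPT_ID and METADATA_TYPE_CONCEPT_ID come from the proposal; the `name` and `value_as_string` columns, the concept IDs 1001 and 2001, and the note texts are assumptions made up for illustration, not the proposal's actual DDL or real vocabulary values.

```python
import sqlite3

# Hypothetical placeholder IDs; real values would come from the OMOP vocabulary.
VISIT_TYPE_CONCEPT_ID = 1001   # assumed stand-in for a "visit type" concept
NOTE_TYPE_CONCEPT_ID = 2001    # assumed stand-in for a metadata type concept
NO_MATCHING_CONCEPT = 0        # the "concept of 0" option raised in the question

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE metadata (
        metadata_concept_id      INTEGER NOT NULL,
        metadata_type_concept_id INTEGER NOT NULL,
        name                     TEXT NOT NULL,   -- assumed column
        value_as_string          TEXT             -- assumed column
    )
    """
)

rows = [
    # Two name-value pairs attached to the same concept (question 1).
    (VISIT_TYPE_CONCEPT_ID, NOTE_TYPE_CONCEPT_ID,
     "visit type", "illustrative note about visit types"),
    (VISIT_TYPE_CONCEPT_ID, NOTE_TYPE_CONCEPT_ID,
     "visit type caveat", "second illustrative note on the same concept"),
    # An entity with no matching concept, stored under concept 0 (question 2).
    (NO_MATCHING_CONCEPT, NOTE_TYPE_CONCEPT_ID,
     "death data source", "2014-June state death certificate data"),
]
conn.executemany("INSERT INTO metadata VALUES (?, ?, ?, ?)", rows)


def names_for_concept(concept_id):
    """Return the names of all metadata rows stored under one concept."""
    cur = conn.execute(
        "SELECT name FROM metadata WHERE metadata_concept_id = ? ORDER BY rowid",
        (concept_id,),
    )
    return [name for (name,) in cur]
```

In this sketch nothing prevents several rows per concept, so whether to allow that would be a convention the workgroup decides rather than something the table enforces.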


(Erica Voss) #16

Per today’s meeting I added all the domains to the examples. Vojtech will prepare to propose at the next CDM team meeting.


(Vojtech Huser) #17

I was asked in email (@schillil) about the ability to attach metadata to columns. The current proposal provides a shell and “Athena terminology” provides concepts. So to comment on a column, one can use concepts for that, e.g., visit type.

See the highlighted concept below. There are 54 concepts to pick from at the moment, but we anticipate that additional concepts will be created at the request of “metadata documenters”. Again, we want to provide a generic tool for metadata that is “extensible” as needed.


(Vojtech Huser) #18

Daniella mentioned today other metadata literature, such as DDI or ISO-11179, as well as HL7 standards for metadata.

Rimma proposed making the metadata table like the observation table (value_as_string, value_as_concept_id).

Should the scope be computable metadata or metadata for humans to read?
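
One possible reading of the observation-like shape, and of how it relates to the computable vs. human-readable question, is sketched below. The `MetadataRow` class, the `name` field, the `is_computable` helper, and the concept ID 1001 are hypothetical; only value_as_string and value_as_concept_id come from the discussion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetadataRow:
    """Hypothetical observation-like metadata row."""
    name: str
    value_as_string: Optional[str] = None      # free text for human readers
    value_as_concept_id: Optional[int] = None  # coded value a program can use


def is_computable(row: MetadataRow) -> bool:
    """Treat a row as computable when it carries a coded value."""
    return row.value_as_concept_id is not None


rows = [
    # Human-readable only: a free-text note.
    MetadataRow(name="death data source",
                value_as_string="2014-June state death certificate data"),
    # Computable: 1001 is a made-up placeholder, not a real concept ID.
    MetadataRow(name="visit type", value_as_concept_id=1001),
]

computable_names = [r.name for r in rows if is_computable(r)]
```

With both value columns available, the same table could hold either kind of metadata, so the scope question may not need to be settled in the table design itself.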


(Vojtech Huser) #19

I would like to continue the metadata discussion at the upcoming CDM WG call.

I created a modified table proposal that possibly addresses some of the points raised during the last discussion.

The key is not to confuse metadata with data characterization as done by Achilles (the achilles_results table). An ETL or data warehouse insider knows a lot about a warehouse, and the point of metadata is to put some of this “insider” knowledge into metadata, so that a user (or analyst) can quickly get “semi-intimate” with the data just by reading some smart and organized notes made by the insider.

Perhaps we can propose a shell and let the community decide how best to use this shell to hold useful metadata content, and in phase 2 make the metadata tighter and better. The perfect should not be the enemy of the good in this phase 1.

Perhaps every WG member can provide examples of metadata that they would like to capture (and post here).

Mine would be:

  • dataset is updated once a year (or monthly or …)
  • dataset reflects only data from clinical trial (not routine care)
  • Achilles is executed after each data refresh. achilles_results are always available
  • dataset has drug order data as well as pharmacy dispensation data (can study ‘patient did not fill his prescription’ questions)
  • weight data comes from Health Risk Assessment done by health plan (not from EHR)
  • PHR data is present (in OBSERVATION table) but not mapped to any standard concepts and there are no plans to do this mapping
  • procedure data are in local codes only (any phenotype using procedures has to tweak the standard code in the phenotype to the local code)
  • dataset has EHR data only (and no claims data; site has no affiliated health plan)
  • dataset has claims data with “sparse lab data” (e.g., all that come from an accessible source, such as LabCorp). Such data does not reflect all lab data. Inpatient lab results are not present. (not available)
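
As a rough sketch of how a few of the notes above could sit in a simple name-value shell, assuming plain free-text values: the names on the left and the `find_notes` helper are invented for illustration and are not part of the proposal.

```python
# Each entry is a (name, value) pair; the values are taken from the list above,
# the names are hypothetical labels chosen for this sketch.
NOTES = [
    ("refresh frequency", "dataset is updated once a year"),
    ("data source",
     "dataset has EHR data only (and no claims data; site has no affiliated health plan)"),
    ("procedure coding", "procedure data are in local codes only"),
    ("weight source",
     "weight data comes from Health Risk Assessment done by health plan (not from EHR)"),
]


def find_notes(keyword):
    """Return the (name, value) pairs whose value mentions the keyword."""
    needle = keyword.lower()
    return [(name, value) for name, value in NOTES if needle in value.lower()]


ehr_note_names = [name for name, _ in find_notes("EHR")]
```

Even a plain keyword search over such notes would let an analyst check, for example, where EHR data is mentioned before writing a phenotype against the dataset.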

UPDATE: after CDM WG April meeting - the proposal was updated with phase 1 and phase 2 scope and use cases were updated.


(Vojtech Huser) #20

To continue the discussion, I will try to tag folks to contribute example metadata they see as important.

@Christian_Reich @ericaVoss @Sigfried_Gold @rimma
This was my original input-seeking post:

Perhaps every WG member can provide examples of metadata that they would like to capture

Examples from the OHDSI GitHub are:

A search for CDM_DOMAIN_META on github shows many other examples.

It seems like the table has 2 columns (“name value pairs”). Can someone post the DDL for the CDM_DOMAIN_META table (or where it can be found) @ericaVoss ?

The examples above again indicate the mode of “how to get semi-intimate” with a dataset by reading some smart metadata notes done by the dataset custodian.

