Choosing Among Common Data Models for Real World Data Analyses

nigam · August 1, 2019, 5:23pm

https://ascpt.onlinelibrary.wiley.com/doi/epdf/10.1002/cpt.1577

Interesting new paper, just posted on CPT’s website.

–nigam.

abedtash_hamed · August 1, 2019, 8:34pm

Thanks @nigam for sharing. This is a really interesting read. Some of the points in the paper are fair, but I feel it is a little biased toward Aetion (where the first author is an advisor), particularly in the comparison made in Table 1. Aetion has similar limitations, too.

Some statements in the paper also need more clarification, such as this one: “While the OMOP CDM clearly expects the use of the standard constructs, the underlying raw data are available but they lack organization and are presented in a way that makes them difficult to use for analysis.” Assuming it refers to the “raw data” rather than the “standard constructs”, the authors neither give an example of this “lack of organization” nor explain why the “_source_concept_id” or “_source_value” fields are difficult to use. We had no problem using the source-value fields in one of our exercises, although this is not the expected, standard way of doing analysis in OMOP.

Another point, on “transparency”, is the claim that “since the mapping is often done by a third party and not the study team, the rationale for certain mapping decisions remains opaque”. This may not be entirely accurate. We internally document everything related to data mappings (both data-field and concept mappings). As with any other observational study, it is really up to the scientist or study owner how to handle the documentation process (short and quick vs. high quality). Users of the CDM can simply ask the third party to provide this information.

That said, it is a fair point that some of the concept mappings in the OMOP vocabulary may change decision making; however, the paper refers broadly to “mapping algorithms”, which may cover both data-field mapping and concept mapping. We saw this in one of our previous oncology projects at Lilly, where it led to significantly different cohort attrition. This mainly stems from the fact that source coding systems (e.g., ICD) have different levels of granularity than reference vocabularies (e.g., SNOMED), resulting in many-to-one or many-to-many non-specific mappings. In the case of SNOMED, a proposal is already in place to create a new SNOMED Extension vocabulary (@Christian_Reich @Dymshyts) that may fix the problem.
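To illustrate how a many-to-one mapping can shift cohort attrition, here is a minimal Python sketch. Everything in it is hypothetical: the source codes (SRC_A, SRC_B, SRC_C), the standard concept IDs (1001, 2002) and the patient rows are invented, and only the field names follow the OMOP CDM CONDITION_OCCURRENCE table. Two finer-grained source codes collapse onto one broader standard concept, so a cohort defined on the standard concept picks up more patients than one defined on the original source code. It also shows that filtering on a source-value field is straightforward.

```python
# Hypothetical CONDITION_OCCURRENCE-like rows. Only the field names follow
# the OMOP CDM; the codes, concept IDs and persons are invented.
# SRC_A and SRC_B are finer-grained source codes that both map (many-to-one)
# onto the broader hypothetical standard concept 1001.
ROWS = [
    {"person_id": 1, "condition_source_value": "SRC_A", "condition_concept_id": 1001},
    {"person_id": 2, "condition_source_value": "SRC_B", "condition_concept_id": 1001},
    {"person_id": 3, "condition_source_value": "SRC_A", "condition_concept_id": 1001},
    {"person_id": 4, "condition_source_value": "SRC_C", "condition_concept_id": 2002},
]


def cohort_by_source_value(rows, source_value):
    """Persons whose raw source code matches exactly (uses the source field)."""
    return {r["person_id"] for r in rows if r["condition_source_value"] == source_value}


def cohort_by_standard_concept(rows, concept_id):
    """Persons whose mapped standard concept matches (the usual OMOP approach)."""
    return {r["person_id"] for r in rows if r["condition_concept_id"] == concept_id}


if __name__ == "__main__":
    by_source = cohort_by_source_value(ROWS, "SRC_A")
    by_standard = cohort_by_standard_concept(ROWS, 1001)
    print(f"source-code cohort: {sorted(by_source)}")
    print(f"standard-concept cohort: {sorted(by_standard)}")
    # Person 2 enters only via the broader standard concept.
    print(f"extra persons: {sorted(by_standard - by_source)}")
```

Running it prints [1, 3] for the source-code cohort and [1, 2, 3] for the standard-concept cohort: person 2, recorded with a different source code, is pulled in only because both codes share one standard concept.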

I think it would be worthwhile to read the paper carefully and address the gaps it mentions in the next iterations of the OMOP CDM. However, the paper refers to studies based on earlier versions of the CDM, so some of these issues have already been addressed.

Thanks again for sharing!
Hamed