
From an "a priori" concept mapping to an "a posteriori" one

Hello OHDSI community

I am interested in the work you have done, and I am confident it is heading in a very good direction thanks to the community.
As a member, here are some comments after reading the 5.2 documentation found here (https://github.com/OHDSI/CommonDataModel/blob/master/OMOP_CDM_v5_2.pdf).
As an implementer, I found some details that, if clarified, would help my work.

Thanks for your answers, questions, or explanations.

[comments]

  • p5 : “Source Values and Source Concepts are optional, while Standard Concepts are mandatory”
    • what about doing the opposite: “source concepts are mandatory, while standard concepts are optional”, because:
      • if the mapping contains an error, it cannot be detected when the source concepts are missing
      • mapping should be an iterative process; data scientists working on the data could build the mapping and refine it by filling in the NULL concept_ids
      • this would ease the process of loading data into the OMOP format, because mapping all the concepts beforehand looks unrealistic
      • this would be called an “a posteriori mapping process” instead of an “a priori mapping process”
      • users would then fill in the mapping themselves in the concept_relationship table
  • p5 : “If a Standard Concept does not exist or cannot be identified, the Concept with the concept_id 0 is used, representing a non-existing or unmappable concept.”
    • we should then distinguish two cases: i) the concept does not exist -> 0; ii) the concept is not yet identified -> NULL
  • p6 : “It is also possible for one source_concept_id to map to multiple standard concept_ids within the same domain. For example, ICD-9 for ‘viral hepatitis with hepatic coma’ maps to SNOMED ‘viral hepatitis’ and a different concept for ‘hepatic coma’ in which case multiple condition_occurrence records will be generated for the one source value record”
    • multiple rows for one fact: this is a problem
    • I would suggest keeping only one row in the fact table and collecting the multiple mappings through the existing closure table (CONCEPT_RELATIONSHIP) that contains the mappings
    • this means the concept_id fields should only be filled when relationship_id is “Maps to” (multiple, semantic, or contextual mappings fall outside the “concept_id simplification”)
    • for example: SELECT count(*) FROM condition_occurrence a LEFT JOIN concept_relationship b ON (a.condition_source_concept_id = b.concept_id_1) LEFT JOIN concept c ON (b.concept_id_2 = c.concept_id) WHERE b.relationship_id IN ('Maps to', 'Concept same_as to', 'Concept alt_to to', 'Map includes child', 'Is a') AND c.concept_name LIKE '%viral hepatitis%';
    • I have not seen any mention of contextual mapping. FHIR provides a context for its concept mappings as a free-text explanation. I guess this is useful: in some contexts the mapping could be used, and in others it would not apply. This flexibility is much needed so that people use the data correctly and take care when writing their SQL queries.
  • p10 : SOURCE_TO_CONCEPT_MAP is deprecated; this should be taken into account on p6 in the mapping procedure algorithm.
    • “When processing data where the source value is either free text or a reference to a coding scheme that is not contained within the Standardized Vocabularies: • Map all source values directly to standard concept_ids. Store these mappings in the SOURCE_TO_CONCEPT_MAP table.”
    • BTW, “Map all source values directly to standard concept_ids” is unclear to me. Does this mean _source_concept_id would be NULL and _concept_id would be 0 in most cases?
  • p20 : “relationships are directional, and this field represents the source concept designation.”
    • why has the idea of source & target (in the deprecated source_to_concept_map table) been replaced by concept_id_1 and concept_id_2 if the relationships are directional?
    • I guess this leads to confusion about which is the source and which is the target of the relationship
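
To make the "a posteriori" proposal above more concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is only an illustration under my own assumptions, not how the CDM is specified: the tables are reduced to the few columns needed, every concept_id is made up, and the helper names count_facts and unmapped_facts are hypothetical. It keeps one row per fact, makes the source concept mandatory, leaves the standard concept NULL until a mapping is known, and resolves the mapping at query time through concept_relationship.

```python
import sqlite3

# Hypothetical, heavily simplified tables: only the columns needed for the
# illustration are kept, and all concept_ids below are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, concept_name TEXT);
CREATE TABLE concept_relationship (
    concept_id_1 INTEGER, concept_id_2 INTEGER, relationship_id TEXT);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    condition_source_concept_id INTEGER NOT NULL,  -- mandatory in this proposal
    condition_concept_id INTEGER                   -- NULL = not yet mapped, 0 = unmappable
);
INSERT INTO concept VALUES
    (1001, 'source: viral hepatitis with hepatic coma'),
    (1002, 'source: local code, no mapping yet'),
    (2001, 'viral hepatitis'),
    (2002, 'hepatic coma');
-- One source concept maps to two standard concepts.
INSERT INTO concept_relationship VALUES
    (1001, 2001, 'Maps to'),
    (1001, 2002, 'Maps to');
-- One row per fact, whatever the number of mappings.
INSERT INTO condition_occurrence VALUES
    (1, 1001, NULL),
    (2, 1002, NULL);
""")


def count_facts(conn, name_pattern):
    """Count distinct fact rows whose source concept maps to a standard
    concept whose name matches name_pattern (SQL LIKE syntax)."""
    row = conn.execute(
        """
        SELECT COUNT(DISTINCT co.condition_occurrence_id)
        FROM condition_occurrence co
        JOIN concept_relationship cr
          ON co.condition_source_concept_id = cr.concept_id_1
        JOIN concept c
          ON cr.concept_id_2 = c.concept_id
        WHERE cr.relationship_id = 'Maps to'
          AND c.concept_name LIKE ?
        """,
        (name_pattern,),
    ).fetchone()
    return row[0]


def unmapped_facts(conn):
    """Return ids of fact rows that still wait for a mapping: the standard
    concept is NULL and no 'Maps to' relationship exists for the source."""
    rows = conn.execute(
        """
        SELECT co.condition_occurrence_id
        FROM condition_occurrence co
        WHERE co.condition_concept_id IS NULL
          AND NOT EXISTS (
              SELECT 1 FROM concept_relationship cr
              WHERE cr.concept_id_1 = co.condition_source_concept_id
                AND cr.relationship_id = 'Maps to')
        ORDER BY co.condition_occurrence_id
        """
    )
    return [r[0] for r in rows]


print(count_facts(conn, "%viral hepatitis%"))
print(unmapped_facts(conn))
```

With this layout, adding a 'Maps to' row to concept_relationship later is enough for an existing fact to show up in the counts, without touching the fact table.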

[minor]

  • p7 : “concept of ‘Gender’, for which there are only two allowable standard concepts of practical use (8507- ‘Male’, 8532- ‘Female’)”
  • p10 : “All Concepts[in concept_table] may be Source Concepts; they represent how the entity was coded in the source.”
    • this means standard concepts are not mandatory