OHDSI Home | Forums | Wiki | Github

Wide MAPPING table (in vocabulary) (problems with relationship)

in UK BB meeting, we are discussing complex vocabulary issues.
The slide is

This is to start the discussion about

  • groups of relationship
  • relationship table has limitations
  • what a MAPPING table would look like (@Christian_Reich

@Christian_Reich, could you please provide a link where the new MAPPING table is described?

1 Like

To kick this of, a simple example for numeric fields of UKB.

field_id name mapping_approach date_field_id target_concept_id unit_concept_id conversion_factor
46 Hand grip strength (left) numeric 53 44805437 9529 1
48 Waist circumference numeric 53 4172830 8582 1
50 Standing height numeric 53 3015514 8582 1
93 Systolic blood pressure, manual reading numeric 53 4152194 8876 1

For more examples see: https://github.com/thehyve/ukbiobank-omop-etl/tree/master/vocabulary

I don’t follow.

Considering line 1
what 46 and 53 refer to? In first and fourth column.


Here is a proposed solution. The current situation has two types of mappings: “Maps to” and “Maps to value”. These work well, but a number of use cases are not covered:

  • Mapping to numeric
  • Mapping to numeric and unit
  • One to many mapping with both “Maps to” and “Maps to value”, because it is ambiguous which “Maps to” belongs to which “Maps to value”
  • Mapping to a range
  • Many to one (pre-coordination) mapping, for example for question-answer and variable-value pairs

Instead, we would introduce a new MAPPING table, which covers all those use cases. Split mappings to more than one entity target concepts two or more rows would be used, similarly to multiple “Maps to” records in the current system:

Source Target
Source Concept Question/ Variable Answer/ Value Range Standard Concept Numeric/ Quantity Operator Error Unit Value Concept Condition Status Visit
Ambulatory procedures - lithotripsy       Lithotripsy             Ambulatory Surgical Center
  CS Tumor Size \d+ 001 - 988 millimeters (mm) |(Code exact size in mm) Breast Cancer - Tumor Size \d     millimeter      
Documentation of patients with primary headache diagnosis and imaging other than ct or mri obtained                   Primary diagnosis  
Evidence of alcohol involvement determined by blood alcohol level of 100-119 mg/100 ml       Ethanol [Presence] in Blood 110   10 milligram per deciliter      
Home visit, phototherapy services (e.g., bili-lite), including equipment rental, nursing services, blood draw, supplies, and other services, per diem       Home visit, phototherapy services (e.g., bili-lite), including equipment rental, nursing services, blood draw, supplies, and other services, per diem             Home Visit
  Do you suffer from diabetes [PhenX] Yes   Diabetes mellitus              
Injection, methylnaltrexone, 0.1 mg       methylnaltrexone Injection 0.1     milligram      
  When you took Vitamin C, how much did you usually take [PhenX] 250 mg or less   ascorbic acid 250     milligram      
  When you took Vitamin C, how much did you usually take [PhenX] 300 to 500 mg   ascorbic acid 400   100 milligram      
Cervical high risk human papillomavirus (HPV) DNA test positive       Human papilloma virus DNA | Cervix         Positive    

Looks like the table gets cut off on the left side. Here is the same thing in Excel: Mapping table proposal V1.xlsx (11.8 KB)


I like the idea of source concept triggering an event and a visit (and the appropriate visit type). The error column is also interesting. (representing uncertainty)

The idea with representing a range through average and error rather than lower and upper value is to make it easier for folks to calculate summary statistics. They can just use the number in value_as_number, like they would do if the amount is precise. But we don’t have such a field yet in MEASUREMENT and OBSERVATION.

Any progress on the new stcm table?

I drafted a PR for Usag in the meantime to be able to handle mapping of variable/value combinations to event, value and unit concepts. Exporting this to a regular stcm table will create losses. So a new definition of stcm will be hellpful.

(please mind, this PR is draft, still being tested and refined)

1 Like

Thanks @Christian_Reich again for going through the mapping table in our previous UKB working group. As promised, I made an example export in this new format from our Usagi UKB field mappings. See attached file.

I think this makes for a really straight forward mappings for these types of vocabularies.

@Alexdavv Does this match the examples you have made?

Examples UKB Mapping Table Proposal .xlsx (6.2 KB)

Hi @MaximMoinat ,

I also drafted a couple of examples, look here.

The structure is the same and I have a couple of suggestions regarding the mapping:

It’s still not clear how to handle this numeric stuff (\d+.?\d*).
In general, it looks very good, but we will stick with concatenated question-answer pairs for some time.

1 Like

Thanks @Alexdavv, also for taking us through this during Friday’s UKB working group.

For handling numeric values, I propose to introduce a ‘value_type’ column that describes where the (mapped) value would end up. The possible types would be:

  • value_as_number
  • value_as_concept_id
  • value_as_datetime
  • value_as_string
  • pre-coordinated (variable+value only map to e.g. an observation_concept_id)

To increase standardisation, we can use the concept_id of the respective fields (e.g. 1147172) instead of strings.

1 Like

Is it just for numeric values or the entire structure of the MAPPING table?

If you have 2 elements in the source data (numeric result + its interpretation), you’d probably want to preserve both in one CDM record using value_as_number + value_as_concept_id. Here is the discussion. There are also many cases when value_as_string is used as an additional piece of information stored.

Also @MaximMoinat pointed out that we don’t have a source_concept_id field. To make the links performing, we need to add it (as we do in the concept_relationship). Source_to_concept_map approach (source_code/vocabulary_id combination) doesn’t seem to be an option since it’s not unique for some vocabularies.

This is also an open question. As far as I get it, the MAPPING table should be a guide for ETL, providing machine-readable instruction on how and where to extract the numeric value from. As well as differentiate the cases when there is no need to extract them (NULL numeric field).

@Christian_Reich should we continue pushing the idea of “Wide mapping table”?
What about making it a topic of one of the Comminity calls?

Excellent idea! This should be presented to the community.

*Edit - If it’s a partially baked idea, present it to the CDM/Vocab WG

Can you draft a proposal we can put into the Github issue list, @Dymshyts? I’ll help.

@clairblacketer and @Christian_Reich,

Where does the wide mapping table issue fall on the list of priorities for the CDM/Vocab WG?

6.1? We have now several use cases that are waiting: Surveys (UKB in particular), oncology.

Shouldn’t this discussion be in the CDM Builders forum, @Christian_Reich ? Uncategorized seems like an attic or flavor of null

I look forward to the presentation on this topic. after perusing the different forum postings and linked excel documents, I’m not quite following the logic.

Both. We need a new table. That’s CDM. To fill it - Vocabularies.