Wide MAPPING table (in vocabulary) (problems with relationship)

Vojtech_Huser · December 30, 2020, 10:20am

in UK BB meeting, we are discussing complex vocabulary issues.
The slide is

This is to start the discussion about

groups of relationship
relationship table has limitations
what a MAPPING table would look like (@Christian_Reich

Alexandra_Orlova · July 24, 2020, 7:49am

@Christian_Reich, could you please provide a link where the new MAPPING table is described?

MaximMoinat · July 31, 2020, 4:43pm

To kick this of, a simple example for numeric fields of UKB.

field_id	name	mapping_approach	date_field_id	target_concept_id	unit_concept_id	conversion_factor
46	Hand grip strength (left)	numeric	53	44805437	9529	1
48	Waist circumference	numeric	53	4172830	8582	1
50	Standing height	numeric	53	3015514	8582	1
93	Systolic blood pressure, manual reading	numeric	53	4152194	8876	1

For more examples see: https://github.com/thehyve/ukbiobank-omop-etl/tree/master/vocabulary

Vojtech_Huser · August 6, 2020, 1:26am

I don’t follow.

Considering line 1
what 46 and 53 refer to? In first and fourth column.

Christian_Reich · August 29, 2020, 5:45pm

Friends:

Here is a proposed solution. The current situation has two types of mappings: “Maps to” and “Maps to value”. These work well, but a number of use cases are not covered:

Mapping to numeric
Mapping to numeric and unit
One to many mapping with both “Maps to” and “Maps to value”, because it is ambiguous which “Maps to” belongs to which “Maps to value”
Mapping to a range
Many to one (pre-coordination) mapping, for example for question-answer and variable-value pairs

Instead, we would introduce a new MAPPING table, which covers all those use cases. Split mappings to more than one entity target concepts two or more rows would be used, similarly to multiple “Maps to” records in the current system:

Source				Target
Source Concept	Question/ Variable	Answer/ Value	Range	Standard Concept	Numeric/ Quantity	Operator	Error	Unit	Value Concept	Condition Status	Visit
Ambulatory procedures - lithotripsy				Lithotripsy							Ambulatory Surgical Center
	CS Tumor Size	\d+	001 - 988 millimeters (mm) \|(Code exact size in mm)	Breast Cancer - Tumor Size	\d			millimeter
Documentation of patients with primary headache diagnosis and imaging other than ct or mri obtained										Primary diagnosis
Evidence of alcohol involvement determined by blood alcohol level of 100-119 mg/100 ml				Ethanol [Presence] in Blood	110		10	milligram per deciliter
Home visit, phototherapy services (e.g., bili-lite), including equipment rental, nursing services, blood draw, supplies, and other services, per diem				Home visit, phototherapy services (e.g., bili-lite), including equipment rental, nursing services, blood draw, supplies, and other services, per diem							Home Visit
	Do you suffer from diabetes [PhenX]	Yes		Diabetes mellitus
Injection, methylnaltrexone, 0.1 mg				methylnaltrexone Injection	0.1			milligram
	When you took Vitamin C, how much did you usually take [PhenX]	250 mg or less		ascorbic acid	250			milligram
	When you took Vitamin C, how much did you usually take [PhenX]	300 to 500 mg		ascorbic acid	400		100	milligram
Cervical high risk human papillomavirus (HPV) DNA test positive				Human papilloma virus DNA \| Cervix					Positive

Christian_Reich · August 29, 2020, 5:46pm

Looks like the table gets cut off on the left side. Here is the same thing in Excel: Mapping table proposal V1.xlsx (11.8 KB)

Vojtech_Huser · August 31, 2020, 5:34pm

I like the idea of source concept triggering an event and a visit (and the appropriate visit type). The error column is also interesting. (representing uncertainty)

Christian_Reich · August 31, 2020, 6:59pm

The idea with representing a range through average and error rather than lower and upper value is to make it easier for folks to calculate summary statistics. They can just use the number in value_as_number, like they would do if the amount is precise. But we don’t have such a field yet in MEASUREMENT and OBSERVATION.

MaximMoinat · October 16, 2020, 3:24pm

Any progress on the new stcm table?

I drafted a PR for Usag in the meantime to be able to handle mapping of variable/value combinations to event, value and unit concepts. Exporting this to a regular stcm table will create losses. So a new definition of stcm will be hellpful.

github.com/OHDSI/Usagi

Map variables

OHDSI:master ← OHDSI:map-variables

opened 05:12PM - 01 Oct 20 UTC

MaximMoinat

+1381 -760

Testing out some features to support mapping of variables (in contrast to code l…ists). This supports the mapping of wide tables where each column/value combination maps to an OMOP concept. New features: - Value and Unit Mapping Types #92 #98 - `Ignored` status if a variable should not be mapped to OMOP #93 - Author and timestamp of mapping and status change #95 - Import value and unit #96 - `Flagged` status for problematic source codes #97 Also addresses issues: - #16 - #17 - #88 (also related to closed issue #85)

(please mind, this PR is draft, still being tested and refined)

MaximMoinat · November 5, 2020, 4:58pm

Thanks @Christian_Reich again for going through the mapping table in our previous UKB working group. As promised, I made an example export in this new format from our Usagi UKB field mappings. See attached file.

I think this makes for a really straight forward mappings for these types of vocabularies.

@Alexdavv Does this match the examples you have made?

Examples UKB Mapping Table Proposal .xlsx (6.2 KB)

Alexdavv · November 6, 2020, 12:58pm

Hi @MaximMoinat ,

I also drafted a couple of examples, look here.

The structure is the same and I have a couple of suggestions regarding the mapping:

I added the mappings to Standard Question-Anwer combinations, where some of the targets are UKB concepts by themselves (no Standard equivalent exist).
Qualifier Value concepts (No evidence of, History of) are not intended to be used for event coding - they’re just attributes. How about these: Clinical finding absent, History of clinical finding in subject, Past history of procedure and their new descendants with time context.
I would not map to the event such things as Urea correction reason. Or at least don’t specify the reason in the operator concept_id field - it’s off-label usage
For Alcohol intake frequency, some may be mapped to the pre-coordinated events: Alcohol consumption unknown, Occasional drinker, Current drinker, etc.
We need to be very specific on the presence of the historical context in the question - we review both name and note fields provided by the source.

It’s still not clear how to handle this numeric stuff (\d+.?\d*).
In general, it looks very good, but we will stick with concatenated question-answer pairs for some time.

MaximMoinat · November 8, 2020, 10:53am

Thanks @Alexdavv, also for taking us through this during Friday’s UKB working group.

For handling numeric values, I propose to introduce a ‘value_type’ column that describes where the (mapped) value would end up. The possible types would be:

value_as_number
value_as_concept_id
value_as_datetime
value_as_string
pre-coordinated (variable+value only map to e.g. an observation_concept_id)

To increase standardisation, we can use the concept_id of the respective fields (e.g. 1147172) instead of strings.

Alexdavv · November 8, 2020, 7:36pm

Is it just for numeric values or the entire structure of the MAPPING table?

If you have 2 elements in the source data (numeric result + its interpretation), you’d probably want to preserve both in one CDM record using value_as_number + value_as_concept_id. Here is the discussion. There are also many cases when value_as_string is used as an additional piece of information stored.

Also @MaximMoinat pointed out that we don’t have a source_concept_id field. To make the links performing, we need to add it (as we do in the concept_relationship). Source_to_concept_map approach (source_code/vocabulary_id combination) doesn’t seem to be an option since it’s not unique for some vocabularies.

This is also an open question. As far as I get it, the MAPPING table should be a guide for ETL, providing machine-readable instruction on how and where to extract the numeric value from. As well as differentiate the cases when there is no need to extract them (NULL numeric field).

Dymshyts · January 4, 2021, 2:57pm

@Christian_Reich should we continue pushing the idea of “Wide mapping table”?
What about making it a topic of one of the Comminity calls?

MPhilofsky · January 4, 2021, 3:25pm

Excellent idea! This should be presented to the community.

*Edit - If it’s a partially baked idea, present it to the CDM/Vocab WG

Christian_Reich · January 6, 2021, 2:22pm

Can you draft a proposal we can put into the Github issue list, @Dymshyts? I’ll help.

MPhilofsky · January 6, 2021, 5:21pm

@clairblacketer and @Christian_Reich,

Where does the wide mapping table issue fall on the list of priorities for the CDM/Vocab WG?

Christian_Reich · January 6, 2021, 6:33pm

6.1? We have now several use cases that are waiting: Surveys (UKB in particular), oncology.

MPhilofsky · January 6, 2021, 8:00pm

Shouldn’t this discussion be in the CDM Builders forum, @Christian_Reich ? Uncategorized seems like an attic or flavor of null

I look forward to the presentation on this topic. after perusing the different forum postings and linked excel documents, I’m not quite following the logic.

Christian_Reich · January 7, 2021, 4:32pm

Both. We need a new table. That’s CDM. To fill it - Vocabularies.