in UK BB meeting, we are discussing complex vocabulary issues.
The slide is
This is to start the discussion about
- groups of relationship
- relationship table has limitations
- what a MAPPING table would look like (@Christian_Reich
OHDSI Home | Forums | Wiki | Github |
in UK BB meeting, we are discussing complex vocabulary issues.
The slide is
This is to start the discussion about
To kick this of, a simple example for numeric fields of UKB.
field_id | name | mapping_approach | date_field_id | target_concept_id | unit_concept_id | conversion_factor |
---|---|---|---|---|---|---|
46 | Hand grip strength (left) | numeric | 53 | 44805437 | 9529 | 1 |
48 | Waist circumference | numeric | 53 | 4172830 | 8582 | 1 |
50 | Standing height | numeric | 53 | 3015514 | 8582 | 1 |
93 | Systolic blood pressure, manual reading | numeric | 53 | 4152194 | 8876 | 1 |
For more examples see: https://github.com/thehyve/ukbiobank-omop-etl/tree/master/vocabulary
I don’t follow.
Considering line 1
what 46 and 53 refer to? In first and fourth column.
Friends:
Here is a proposed solution. The current situation has two types of mappings: “Maps to” and “Maps to value”. These work well, but a number of use cases are not covered:
Instead, we would introduce a new MAPPING table, which covers all those use cases. Split mappings to more than one entity target concepts two or more rows would be used, similarly to multiple “Maps to” records in the current system:
Looks like the table gets cut off on the left side. Here is the same thing in Excel: Mapping table proposal V1.xlsx (11.8 KB)
I like the idea of source concept triggering an event and a visit (and the appropriate visit type). The error column is also interesting. (representing uncertainty)
The idea with representing a range through average and error rather than lower and upper value is to make it easier for folks to calculate summary statistics. They can just use the number in value_as_number, like they would do if the amount is precise. But we don’t have such a field yet in MEASUREMENT and OBSERVATION.
Any progress on the new stcm table?
I drafted a PR for Usag in the meantime to be able to handle mapping of variable/value combinations to event, value and unit concepts. Exporting this to a regular stcm table will create losses. So a new definition of stcm will be hellpful.
(please mind, this PR is draft, still being tested and refined)
Thanks @Christian_Reich again for going through the mapping table in our previous UKB working group. As promised, I made an example export in this new format from our Usagi UKB field mappings. See attached file.
I think this makes for a really straight forward mappings for these types of vocabularies.
@Alexdavv Does this match the examples you have made?
Hi @MaximMoinat ,
I also drafted a couple of examples, look here.
The structure is the same and I have a couple of suggestions regarding the mapping:
I added the mappings to Standard Question-Anwer combinations, where some of the targets are UKB concepts by themselves (no Standard equivalent exist).
Qualifier Value concepts (No evidence of, History of) are not intended to be used for event coding - they’re just attributes. How about these: Clinical finding absent, History of clinical finding in subject, Past history of procedure and their new descendants with time context.
I would not map to the event such things as Urea correction reason. Or at least don’t specify the reason in the operator concept_id field - it’s off-label usage
For Alcohol intake frequency, some may be mapped to the pre-coordinated events: Alcohol consumption unknown, Occasional drinker, Current drinker, etc.
We need to be very specific on the presence of the historical context in the question - we review both name and note fields provided by the source.
It’s still not clear how to handle this numeric stuff (\d+.?\d*).
In general, it looks very good, but we will stick with concatenated question-answer pairs for some time.
Thanks @Alexdavv, also for taking us through this during Friday’s UKB working group.
For handling numeric values, I propose to introduce a ‘value_type’ column that describes where the (mapped) value would end up. The possible types would be:
To increase standardisation, we can use the concept_id of the respective fields (e.g. 1147172) instead of strings.
Is it just for numeric values or the entire structure of the MAPPING table?
If you have 2 elements in the source data (numeric result + its interpretation), you’d probably want to preserve both in one CDM record using value_as_number + value_as_concept_id. Here is the discussion. There are also many cases when value_as_string is used as an additional piece of information stored.
Also @MaximMoinat pointed out that we don’t have a source_concept_id field. To make the links performing, we need to add it (as we do in the concept_relationship). Source_to_concept_map approach (source_code/vocabulary_id combination) doesn’t seem to be an option since it’s not unique for some vocabularies.
This is also an open question. As far as I get it, the MAPPING table should be a guide for ETL, providing machine-readable instruction on how and where to extract the numeric value from. As well as differentiate the cases when there is no need to extract them (NULL numeric field).
@Christian_Reich should we continue pushing the idea of “Wide mapping table”?
What about making it a topic of one of the Comminity calls?
Excellent idea! This should be presented to the community.
*Edit - If it’s a partially baked idea, present it to the CDM/Vocab WG
@clairblacketer and @Christian_Reich,
Where does the wide mapping table issue fall on the list of priorities for the CDM/Vocab WG?
6.1? We have now several use cases that are waiting: Surveys (UKB in particular), oncology.
Shouldn’t this discussion be in the CDM Builders forum, @Christian_Reich ? Uncategorized seems like an attic or flavor of null
I look forward to the presentation on this topic. after perusing the different forum postings and linked excel documents, I’m not quite following the logic.
Both. We need a new table. That’s CDM. To fill it - Vocabularies.