Wide MAPPING table (in vocabulary) (problems with relationship)

I like the idea of a source concept triggering an event and a visit (and the appropriate visit type). The error column (representing uncertainty) is also interesting.

The idea behind representing a range through average and error, rather than lower and upper value, is to make it easier for folks to calculate summary statistics. They can just use the number in value_as_number, as they would if the amount were precise. But we don’t have such a field yet in MEASUREMENT and OBSERVATION.
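A minimal sketch of that idea, assuming the "average" is the midpoint of the range and the "error" is its half-width (both are assumptions, as is the `error` key, since no such field exists yet):

```python
from statistics import mean
from typing import Tuple


def range_to_value_and_error(lower: float, upper: float) -> Tuple[float, float]:
    """Convert a [lower, upper] range into (midpoint, half-width)."""
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    value = (lower + upper) / 2
    error = (upper - lower) / 2
    return value, error


# A precise amount carries an error of 0; the range 10-20 becomes 15 +/- 5.
records = [{"value_as_number": 12.0, "error": 0.0}]
value, error = range_to_value_and_error(10.0, 20.0)
records.append({"value_as_number": value, "error": error})

# Summary statistics read value_as_number directly, whether or not
# the original amount was a range.
average = mean(r["value_as_number"] for r in records)
```

With lower and upper columns instead, every consumer would have to compute the midpoint before aggregating.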

Any progress on the new stcm table?

I drafted a PR for Usagi in the meantime to be able to handle mapping of variable/value combinations to event, value and unit concepts. Exporting this to a regular stcm table will lose information, so a new definition of stcm would be helpful.

(Please note: this PR is a draft, still being tested and refined.)

Thanks @Christian_Reich again for going through the mapping table in our previous UKB working group. As promised, I made an example export in this new format from our Usagi UKB field mappings. See attached file.

I think this makes for really straightforward mappings for these types of vocabularies.

@Alexdavv Does this match the examples you have made?

Examples UKB Mapping Table Proposal .xlsx (6.2 KB)

Hi @MaximMoinat ,

I also drafted a couple of examples, look here.

The structure is the same and I have a couple of suggestions regarding the mapping:

It’s still not clear how to handle this numeric stuff (\d+.?\d*).
In general, it looks very good, but we will stick with concatenated question-answer pairs for some time.
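One possible reading of the numeric pattern is that it marks answers from which the ETL must pull a number. A small sketch under that assumption follows; note that an unescaped dot in a regular expression matches any character, so the sketch escapes it. The function name and sample answers are hypothetical.

```python
import re
from typing import Optional

# An integer, optionally followed by a decimal part.
NUMERIC = re.compile(r"\d+(?:\.\d+)?")


def extract_number(answer: str) -> Optional[float]:
    """Return the first number found in a source answer, or None."""
    match = NUMERIC.search(answer)
    return float(match.group()) if match else None
```

Answers without a number return None, which the ETL could treat as "nothing to put in value_as_number".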

Thanks @Alexdavv, also for taking us through this during Friday’s UKB working group.

For handling numeric values, I propose to introduce a ‘value_type’ column that describes where the (mapped) value would end up. The possible types would be:

  • value_as_number
  • value_as_concept_id
  • value_as_datetime
  • value_as_string
  • pre-coordinated (variable+value only map to e.g. an observation_concept_id)

To increase standardisation, we can use the concept_id of the respective fields (e.g. 1147172) instead of strings.
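As a rough sketch of how an ETL could act on such a value_type column: the function below routes a source value into the field named by value_type. The function name, its parameters, and the assumption that dates arrive in ISO format are illustrative only; string labels are used here instead of field concept_ids to keep the example readable.

```python
from datetime import datetime
from typing import Any, Dict, Optional

VALUE_TYPES = {
    "value_as_number",
    "value_as_concept_id",
    "value_as_datetime",
    "value_as_string",
    "pre-coordinated",
}


def route_value(
    value_type: str,
    source_value: str,
    target_value_concept_id: Optional[int] = None,
) -> Dict[str, Any]:
    """Return the value field(s) to populate for one mapped record."""
    if value_type not in VALUE_TYPES:
        raise ValueError(f"unknown value_type: {value_type}")
    if value_type == "pre-coordinated":
        # Variable + value map to the event concept only; no value field is set.
        return {}
    if value_type == "value_as_number":
        return {"value_as_number": float(source_value)}
    if value_type == "value_as_concept_id":
        if target_value_concept_id is None:
            raise ValueError("a mapped value concept is required")
        return {"value_as_concept_id": target_value_concept_id}
    if value_type == "value_as_datetime":
        # Assumes ISO 8601 input such as "2020-03-19".
        return {"value_as_datetime": datetime.fromisoformat(source_value)}
    return {"value_as_string": source_value}
```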

Is it just for numeric values or the entire structure of the MAPPING table?

If you have 2 elements in the source data (numeric result + its interpretation), you’d probably want to preserve both in one CDM record using value_as_number + value_as_concept_id. Here is the discussion. There are also many cases where value_as_string is used to store an additional piece of information.

Also, @MaximMoinat pointed out that we don’t have a source_concept_id field. To make the links performant, we need to add it (as we do in the concept_relationship table). The source_to_concept_map approach (source_code/vocabulary_id combination) doesn’t seem to be an option since the combination is not unique for some vocabularies.
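To illustrate the uniqueness problem, here is a small check that lists every source_code/vocabulary_id pair pointing at more than one concept. The function name is hypothetical, and the rows one would feed it are placeholders rather than real vocabulary content.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def ambiguous_keys(concepts: Iterable[Dict[str, object]]) -> List[Tuple[str, str]]:
    """Return (vocabulary_id, concept_code) pairs shared by several concepts."""
    counts = Counter((c["vocabulary_id"], c["concept_code"]) for c in concepts)
    return sorted(key for key, n in counts.items() if n > 1)
```

Any pair this returns cannot be resolved by a source_code/vocabulary_id lookup alone, whereas a source_concept_id always identifies exactly one concept.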

This is also an open question. As far as I understand it, the MAPPING table should be a guide for the ETL, providing machine-readable instructions on how and where to extract the numeric value from. It should also differentiate the cases where there is no need to extract one (NULL numeric field).

@Christian_Reich should we continue pushing the idea of “Wide mapping table”?
What about making it a topic of one of the Community calls?

Excellent idea! This should be presented to the community.

*Edit - If it’s a partially baked idea, present it to the CDM/Vocab WG

Can you draft a proposal we can put into the Github issue list, @Dymshyts? I’ll help.

@clairblacketer and @Christian_Reich,

Where does the wide mapping table issue fall on the list of priorities for the CDM/Vocab WG?

6.1? We now have several use cases waiting: surveys (UKB in particular) and oncology.

Shouldn’t this discussion be in the CDM Builders forum, @Christian_Reich? Uncategorized seems like an attic or a flavor of null.

I look forward to the presentation on this topic. After perusing the different forum postings and linked Excel documents, I’m not quite following the logic.

Both. We need a new table. That’s CDM. To fill it - Vocabularies.

Let me add some thoughts here:

  1. A wide mapping table should serve both OMOP vocabularies and custom project-related mappings.
    That’s why there are 2 options for linkage:
  • in addition to source_concept_id, add the source_vocabulary_id / source_code combination. But that combination is not unique for some vocabularies, and using 2 approaches at once doesn’t seem consistent.
  • handle custom mappings using 2B+ source concepts and forget the source_to_concept_map table, which creates some difficulties in implementation.
  2. Text string. Being a kind of source_code_description, and information that sometimes lands in the value_as_string field, it probably needs to be added. But wouldn’t it be better to have the source_code_description by itself? It seems not, since it would duplicate the concept_name from the concept table.
    But once we introduce the source_string field, custom mappings are no longer processed using the 2B+ concepts. This conflicts with item 1.

  3. Unit of measure. It may be reflected in the source in different ways:

  • as part of the question or answer. This works well since we have the target_unit field.
  • as a separate entity coming from another field. Isn’t the point of the wide mapping table to provide the ETL with a comprehensive way of mapping (without using any additional custom vocabularies and logic, i.e. for unit)? But if we add the source_unit field, it leads to a combinatorial explosion for most real-world data sources, even though it might be useful (affecting the target concept) for clean vocabularies/sources.
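The combinatorial concern around source_unit can be sketched with a toy count. The codes and unit spellings below are made-up placeholders, not taken from any real source; the point is only that each source code multiplies by every unit spelling it may appear with.

```python
from itertools import product

# Hypothetical source codes and unit spellings, for illustration only.
source_codes = ["body_weight", "body_height", "waist_circumference"]
unit_spellings = ["kg", "kgs", "Kg", "lb", "lbs", "cm", "in"]

# Without source_unit: one mapping row per source code.
rows_without_unit = [(code,) for code in source_codes]

# With source_unit: one row per (source code, unit spelling) pair.
rows_with_unit = list(product(source_codes, unit_spellings))

growth_factor = len(rows_with_unit) // len(rows_without_unit)
```

Here the table grows by a factor equal to the number of unit spellings, and real-world sources tend to have far more of both.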

BTW, the concept of the wide mapping table will be presented tomorrow March 19 at 10 am Eastern Time during the EHR WG call.

Is there a recording of this meeting, or was a table standard agreed upon? If so, will the DDL to create it be released soon?

Not anymore. MSTeams stores it for 2 weeks only. But you can find very detailed notes in the EHR WG team.

I found your reply while looking for a clue regarding “Maps to value”, whose purpose we do not fully understand. Can you briefly clarify? Thank you.

It’s a relationship if you need to split up a precoordinated code into a variable concept and a (postcoordinated) value concept. E.g.: take a code “Positive Covid-19 test”. This would have to be split into “Covid-19 Test”, which is a measurement concept, and “Positive”, which is the resulting value concept. The former is linked through “Maps to”, and the latter through “Maps to value”.
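A small sketch of that split, using concept names in place of concept_ids to avoid inventing identifiers. The tuple layout and function name are assumptions for illustration, not the actual concept_relationship schema.

```python
from typing import Dict, List, Optional, Tuple

# (source code, relationship_id, target concept name)
RELATIONSHIPS: List[Tuple[str, str, str]] = [
    ("Positive Covid-19 test", "Maps to", "Covid-19 Test"),
    ("Positive Covid-19 test", "Maps to value", "Positive"),
]


def split_precoordinated(
    source_code: str,
    relationships: List[Tuple[str, str, str]],
) -> Dict[str, Optional[str]]:
    """Resolve a pre-coordinated code into its variable and value concepts."""
    result: Dict[str, Optional[str]] = {
        "measurement_concept": None,
        "value_as_concept": None,
    }
    for code, relationship_id, target in relationships:
        if code != source_code:
            continue
        if relationship_id == "Maps to":
            result["measurement_concept"] = target
        elif relationship_id == "Maps to value":
            result["value_as_concept"] = target
    return result


record = split_precoordinated("Positive Covid-19 test", RELATIONSHIPS)
```

A code with only a “Maps to” link would leave value_as_concept empty, meaning there is nothing to post-coordinate.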

It’s all in the Book of OHDSI.

The poster on the topic that we ended up with at the OHDSI Symposium: link.
