I am looking for an enhancement to Usagi that enables the user to choose whether Usagi looks at two columns or one column when matching to concept ids. Currently, Usagi just looks at the source code to do the matching piece to concept ids. However, there are times when I need it to look at two columns. In this case, these are observation data that involve question/answer value pairs. Examples of these are here:
Most of these had to be mapped manually because Usagi only took the source_code into consideration where both the source_code and source term are needed to make a match. I am asking for the option (through a user prompt) to have ‘additional information’ column included in the matching algorithm in additional to the source code when needed. So the source term you see in the examples above would go in the additional information column for matching purposes. However, it would also need to be in the source term column so it gets exported with the current stcm file and can be loaded into source_to_concept_map table for linking. So the ETL join would be observation_source_value=source_code and value_as_string=source term.
I am not in favor of concatenating the source code and source term fields into one string eg ‘ALLERGEN_DESC CODEINE’ , ‘ALLERGEN_DESC PENICILLIN G’ , ‘SOCIAL ADL: SLEEP CONCERN NO’ in the source code for the following reasons:
readability. I and users have found these long strings hard to read especially the longer the allergen name and family history measures like ‘Family History - Parkinson’s Decease’
better ETL performance eliminating need to join on string columns that are too long
) more efficient querying without having to use like statements to find a particular measure. Currently, it is much cleaner and efficient when you can search for observation_source_value=’ 'ALLERGEN_DESC’ and value_as_string=’CODEINE’ because we have numerous history metrics we are putting in observation.
Plus it will cause more manual work to get this format with me having to separate into source_code and source_name for it to read cleaner after the mapping is done, create that new file, and then create another source to concept map. Concatenating it would be like trying to parse note text in queries.
Usagi has been designed for mapping formal terminologies; i.e. a long list of codes and descriptions like ICD10. Your example contains variable/value pairs, something Usagi handles less well. You might need to map variables and values separately. (see also my last note below)
There are currently two options:
As you say, the way to make Usagi use all the information, is to concatenate everything into one field and make Usagi use that as the Source Term. You can add all the separate pieces as ‘additional information’ columns and use these to join on.
Make separate files per codelist. One for the variables (your ‘Source code’) and one for each of the sets of values (your ‘Source terms’). e.g. a codelist for each of ‘SMOKING_STATUS’, ‘ALLERGEN_DESC’, ‘SOCIAL. ADL’. Assign a unique source codes to each of your terms to identify them. Then you can more effectively use the import filters in Usagi to specify what class of targets to use.
Two general notes:
Usagi only looks at the Source Term when matching target concepts. The Source Code is meant to be the unique identifier of the source term. In many cases this is a meaningless alphanumeric id (e.g I10 or 32547).
The source_to_concept_map table is designed to only use the unique source_code (and source_vocabulary_id as the foreign key to join on. We have a separate discussion in cases where you have variable/value pairs to map: Wide MAPPING table (in vocabulary) (problems with relationship)
I am interested what people think of the use case proposed here and how Usagi should handle this. @clairblacketer@Christian_Reich
Thank you for your response. In regards to the wide mapping table, we rarely map the value_as_concept_id field (only for covid requirements) and are unlikely to due to lack of resource availability and other mapping items taking precedence e.g. observation_concept_id, procedure_concept_id, etc. So this would not provide much benefit to us.
However, if the source_to_concept_map table could be updated to include the additional information column and Usagi could export the stcm file in that format that would be the most efficient for us because we do not have to redo all the mappings to fit a new format as suggested in option #2 which is a time consuming process given all the metrics we have.
This way I could set the source term field = additional information column after the stcm is exported so I have both of my join columns enhancing the efficiency of the process instead of having to come up with multiple excel formulas to extract the source term piece that may not work in all mapping cases. Make sense?