OHDSI Home | Forums | Wiki | Github

Adding a source value column for IDs


I am writing to ask whether it is okay to add a column, for instance, to the visit_occurence table named ‘visit occurence_id_source_value’ to save the source ID value. the visit IDs I am mapping from the source table are not integers and are a bit messy, so I created an auto incremented numeric id for visit_occurence_id. I also do not want to lose these source IDs as they might come useful when a use case requires a link back to the source dataset. I thought I could just add a column, but I figured that it’s also important to maintain the schema of the CDM because that is the purpose of the CDM. Should I add the column or do you suggest any other solution?

I would deeply appreciate your assistance. I also apologize if this question has been asked before. I searched in the forum and did not see it. Thank you and have a great day! :smile:

1 Like

@Haroun_Chahed I think you can store those ids in ‘visit_source_value’ in CDM (https://github.com/OHDSI/commondatamodel/wiki/VISIT_OCCURRENCE). Then you don’t need to add columns

Hey @SCYou! Thank you for your reply. I thought ‘visit_source_value’ is supposed to have the source value of ‘visit_concept_id’. Even if I were to map the source ID to ‘visit_source_value’, I wouldn’t be able to map the source value for ‘visit_concept_id’. Do you see my point?

I think the only table that has an ID_souce_value column is the Person table, and that column is ‘person_source_value’.

@Haroun_Chahed Sorry, I can’t understand what you’re gonna store in the ‘visit_source_value’…
I thought that the messy source of visit_occurrence_id could be stored in ‘visit_source_value’. Do you have other source for visit_occurrence_id, except what you mentioned?

@SCYou exactly I don’t see another source column for visit_occurence_id and that’s why I am asking whether it’s okay to add a column for it. As far as I understand, the ‘visit_source_value’ is supposed to hold the source value of ‘visit_concept_id’. Please correct me if I am wrong as I am new to the CDM.

Are you referring for a way to link back at record level to the source data? I.e. you want to point to the key in the source data . This seems important from an audit point of view, but not from the analysis point of view.

There are multiple scenarios. What if you are building your omop cdm from multiple data sources? Or if you are creating the visit table from multiple source tables? etc. CDM is designed for standardized analytics. If we have enhance the CDM to reference the primary key of the record from which the CDM record was derived - then we will probably have to not only have the source key-record, but also metadata information about which table in what source data is that key referring to. There may also be situations when one record in the CDM may be a composite of several records from source (i.e. one to many). So we will need a flexible solution like a JSON object?

I think the easiest solution is a custom local table that sits outside the CDM and is designed and maintained by your local team. This custom reference table could have the key that links the CDM record to the source record.

1 Like

I remember that discussion about how to use 'source_value’s in the CDM tables.
@Haroun_Chahed I might not understand what your situation is, but I will store those data in ‘visit_source_value’ if it is possible.

Hi @Haroun_Chahed, I understand what you are asking here. You have visit_ids that do not conform to the integer data type of the VISIT_OCCURRENCE_ID and want to store them somewhere. You are correct that VISIT_SOURCE_VALUE is used for the source value of VISIT_CONCEPT_ID like IP, OP, etc. Can I ask your use case for needing to keep the source id here? In databases where I am unable to keep the source it is still feasible to trace back using the PERSON_SOURCE_VALUE and date of service. If that doesn’t work for you, you can open an issue here which is where we discuss changes and additions to the CDM to make sure we are all on the same page. The information for the workgroup meetings are here if you would like to join, which is where we discuss and accept proposals.

1 Like

We did have a discussion for source_value (I know the focus is somewhat different from yours)
This thread might be helpful for you:

Hey, @SCYou,
Thanks for the reference. I think in that context, the source_to_concept map is about mapping custom, source-specific codes to a vocabulary concept. For @Haroun_Chahed, I think he needs a new column like ‘source_record_id’ where you can store a record identifier from the source system. I’m not sure if a single column would solve this problem tho. I’ve seen ETLs where visits (in this example) could be sourced from multiple tables (an Encouter table, an ER admission table, etc) and so just tstoring a record ID may not give you the info you need to trace it back to the exact source table (unless you put formatted value in the column so you could indicate the source table + the table row ID). It makes sense from an ETL debugging perspective, but I don’t imagine this type of information would be useful in an analytic use case.

@Haroun_Chahed: nothing restricts you from adding your own schema or tables to your local CDM instance. You could have a ‘record_map’ table in your database where during your ETL, when you create a visit record (which will have a visit_occurrence_id) you could write to a record_mapping table that records what CDM records map to your source system records. Then you can put the approprate level of detail into your mapping table to go from a CDM domain table (like visit) back to your local source table. I would most definitely prefer a solution that doesn’t expand the CDM standard tables if there’s another way to manage this outside of the CDM tables.


@Haroun_Chahed, I also agree with @Chris_Knoll 's comment!

@Gowtham_Rao Thank you for your reply. A custom local table sounds like a good solution that would preserve my source values without altering the OMOP CDM schema.

@clairblacketer Thank you for your reply. To answer your question, I wanted to keep the source ID for 2 reasons
1. If I am adding more visit_occurences in the future, I will need the source primary key to make sure I am not remapping an existent visit_occurence.
2. When mapping another table, for instance, condition_occurence, I will need to join my local condition table T1 (that is equivalent to condition_occurrence in the CDM) to visit_occurence T2 using a primary key of T2 that exists in T1.
Regarding tracking back using the person_source_value and the date of service, that would definitely work! Thank you for referencing other resources and for your advice.

@Chris_Knoll thank you for your reply.

Back at Monte, we locally maintained mapping tables, containing source database, table, and column and respective OMOP table and column. This could work for multiple scenarios, including when multiple source records are mapped to one OMOP record. You may go further by adding transformation rule to the table, if any. This is a good way to track provenance of your data.

If I remember correctly, @Daniella_Meeker had proposed to introduce one Event table containing all the source record IDs mapped to various OMOP domain record IDs to preserve provenance of all the records ETLed into OMOP CDM.

I hope this helps.