Will Atlas function properly if the CDM schema is changed?

ranatech · March 31, 2023, 2:10am

Hi All,

We are currently installing Atlas and using the OMOP CDM 5.3 DDL to create the tables (link).

However, we have increased the length of certain CDM columns to fit our data. Will this impact the overall functionality of Atlas? Please advise. Below are a few of the tables and columns where we have made those changes:

Please note: data type remains same, only length has been changed.

Table name: condition_occurrence
Column name & length: condition_source_value, **varchar(50)**
Changed length: **varchar(256)**

Table name: person
Column name & length: person_source_value, **varchar(50)**
Changed length: **varchar(200)**

Table name: condition_occurrence
Column name & length: stop_reason, **varchar(20)**
Changed length: **varchar(256)**

Please advise whether the changes above, which affect only the column length, will cause any issues with Atlas functionality.

Also, if we change a data type from integer to varchar to fit our current dataset, how much would that affect Atlas functionality? For example, changing the column “person_id” from “integer” to “varchar”. This is required because we have hashed values for the id column.

Any advice on the above will be very helpful.

Thanks,
Rana

Chris_Knoll · March 31, 2023, 4:25am

Changing the column lengths on those CDM tables is not going to be an issue. Those are read-only for the tools, so there’s no problem.

person_id moving from integer to varchar will probably pose a problem: the cohort table that gets populated with the results of a cohort definition stores the subject_id as an int. You will run into a problem if you put some sort of hash value from your person_id column into the subject_id.

I would store the hashed person_id in a new column that you add to the person table, or store the hashed person_id as the person_source_value. Then generate a unique integer ID for each person and map all the associated patient-level records from this unique ID → person_id via the person_hash…
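
To make the idea concrete, here is a minimal sketch in Python with an in-memory SQLite database. It is only an illustration under assumptions: `person_id_map`, `assign_person_ids`, `load_conditions`, the cut-down `condition_occurrence` table, and the sample hash values are all hypothetical names and toy data, not part of the CDM or of Atlas. The point is that the hash stays in a text column while every CDM table keeps an integer person_id.

```python
import sqlite3


def assign_person_ids(conn, person_hashes):
    """Give every distinct hashed ID a surrogate integer person_id.

    person_id_map is a hypothetical crosswalk table, not part of the CDM.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS person_id_map ("
        " person_id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " person_hash TEXT NOT NULL UNIQUE)"
    )
    # INSERT OR IGNORE skips hashes that already have an integer ID,
    # so re-running the load keeps existing assignments stable.
    conn.executemany(
        "INSERT OR IGNORE INTO person_id_map (person_hash) VALUES (?)",
        [(h,) for h in person_hashes],
    )
    rows = conn.execute("SELECT person_hash, person_id FROM person_id_map")
    return dict(rows.fetchall())


def load_conditions(conn, id_map, source_rows):
    """Insert (person_hash, condition_source_value) rows using integer IDs."""
    # Cut-down toy version of condition_occurrence (assumption: two columns only).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS condition_occurrence ("
        " person_id INTEGER NOT NULL,"
        " condition_source_value TEXT)"
    )
    conn.executemany(
        "INSERT INTO condition_occurrence VALUES (?, ?)",
        [(id_map[h], value) for h, value in source_rows],
    )


# Toy source data: (hashed person ID, source code).
conn = sqlite3.connect(":memory:")
source_rows = [("a1f9", "E32.1"), ("7c3e", "E32.1"), ("a1f9", "E32.1")]
id_map = assign_person_ids(conn, [h for h, _ in source_rows])
load_conditions(conn, id_map, source_rows)
print(conn.execute("SELECT * FROM condition_occurrence").fetchall())
```

In a real ETL the same crosswalk would be applied to every table that carries person_id, and the hash could also be copied into person_source_value so it remains visible in the CDM.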

ranatech · March 31, 2023, 4:46am

Thanks a lot for your advice @Chris_Knoll. Appreciate it.

I will get back to you if I need any further clarification on this.

Regards,
Rana

ranatech · March 31, 2023, 4:59am

Hi @Chris_Knoll,

I just need some more pointers on the column datatype changes below. Can you suggest an alternative, like the one you suggested for person_id using person_source_value, and say whether these changes would have any effect on Atlas functionality?

Table: condition_occurrence
Column name: condition_type_concept_id
From datatype: integer
To datatype: varchar(256)

Table: visit_occurrence
Column name: visit_occurrence_id
From datatype: integer
To datatype: varchar(200)

Look forward to your valuable feedback.

Thanks,
Rana

Chris_Knoll · March 31, 2023, 5:07pm

condition_type_concept_id is a concept_id, so its values come from the concept table, where concept_id is an integer, so I am not sure why you would want to change condition_type_concept_id. I don’t think it will work.

visit_occurrence_id may be OK, because I do not think we make transitive use of the values (i.e., store them somewhere in a table that was defined as an integer). But, after hearing all of these examples, I would say you should not change any of these ID column types, and instead figure out how to add custom columns/tables to associate your varchar() values with the integer ID columns.

ranatech · April 3, 2023, 7:58am

Thanks for the reply @Chris_Knoll, and apologies for the late reply. The weekend got the best of me. However, I would like to state here that the condition_type_concept_id I was using as an example is for the diagnosis table, i.e. the condition_occurrence table. We are storing the source concept code, which contains ICD 9, ICD 10, or SNOMED codes. These codes are prefixed with some letters, meaning the data is alphanumeric. This is one of the reasons why I was eager to change the datatype of this field.

Having said that, I really appreciate your response, and I will check internally on this aspect and on the overall functionality of the Atlas tool as a whole.

Best,
Rana

Christian_Reich · April 3, 2023, 12:02pm

You actually don’t have a problem, and there is no need to tweak the model. If you have an ICD9 or 10 code, you can put that as is into the condition_source_value field. That is alphanumeric. You then look it up in the CONCEPT table using the concept_code field. All those codes are in there as a concept, with their numerical concept_id. For example E32.1 Abscess of thymus has concept_id 45591044. That integer you can then place into condition_source_concept_id.

You can also then map this to the standard concept using the CONCEPT_RELATIONSHIP table (relationship_id=‘Maps to’), and find SNOMED concept 433738 Abscess of thymus. This integer goes into the condition_concept_id.

Now you have a valid OMOP CDM record.
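
The two lookups can be sketched in Python against an in-memory SQLite database. This is a rough illustration, not a definitive ETL: `make_toy_vocab` and `map_source_code` are hypothetical helper names, the toy tables hold only the columns needed here, and the vocabulary_id value `ICD10CM` for the E32.1 row is an assumption. The two concept_id values are the ones quoted above. Returning 0 for an unmapped code follows the usual OMOP convention for "no matching concept".

```python
import sqlite3


def make_toy_vocab(conn):
    """Create cut-down CONCEPT and CONCEPT_RELATIONSHIP tables with one mapping."""
    conn.execute(
        "CREATE TABLE concept ("
        " concept_id INTEGER PRIMARY KEY,"
        " concept_code TEXT, vocabulary_id TEXT)"
    )
    conn.execute(
        "CREATE TABLE concept_relationship ("
        " concept_id_1 INTEGER, concept_id_2 INTEGER, relationship_id TEXT)"
    )
    # vocabulary_id 'ICD10CM' is an assumption made for this toy row.
    conn.execute("INSERT INTO concept VALUES (45591044, 'E32.1', 'ICD10CM')")
    conn.execute(
        "INSERT INTO concept_relationship VALUES (45591044, 433738, 'Maps to')"
    )


def map_source_code(conn, code, vocabulary_id):
    """Return (source_concept_id, standard_concept_id); 0 means unmapped."""
    src = conn.execute(
        "SELECT concept_id FROM concept"
        " WHERE concept_code = ? AND vocabulary_id = ?",
        (code, vocabulary_id),
    ).fetchone()
    if src is None:
        return 0, 0
    # A code can map to more than one standard concept; this sketch
    # simply takes the first 'Maps to' row it finds.
    std = conn.execute(
        "SELECT concept_id_2 FROM concept_relationship"
        " WHERE concept_id_1 = ? AND relationship_id = 'Maps to'",
        (src[0],),
    ).fetchone()
    return src[0], (std[0] if std else 0)


conn = sqlite3.connect(":memory:")
make_toy_vocab(conn)
source_id, standard_id = map_source_code(conn, "E32.1", "ICD10CM")
# condition_source_value keeps the text code; the two IDs stay integers.
print("E32.1", source_id, standard_id)
```

Here the first returned value would go into condition_source_concept_id and the second into condition_concept_id, while the original code string stays in condition_source_value.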