As everyone knows, BIGINT is twice the size of INT (8 bytes vs. 4 bytes, per the docs).
Can I modify these data types in our case?
I’m sure that in our use case BIGINT for person_id is unnecessary; we have around 1 million persons in our source system.
So my questions:
1.) Why is BIGINT necessary, except for a few columns?
2.) Can I modify these columns' data types? Could it hurt me in the future if I do this (e.g. compatibility with other systems)?
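For reference, this is roughly how I would check whether our current IDs actually fit into INT; the table names are the standard CDM ones, but treat it as a sketch only:

```sql
-- Sketch only: compare the largest existing IDs against the INT maximum
-- (2,147,483,647) before narrowing anything.
SELECT MAX(person_id)      AS max_person_id      FROM person;
SELECT MAX(measurement_id) AS max_measurement_id FROM measurement;
-- If both are well below 2,147,483,647 and growth is slow, INT would fit.
```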
As always, testing is the only way to know the impact of changing data types, but in general the speed and size difference between a BigInt and an Int does not have a big impact.
Compatibility with shared scripts and/or tools is where narrowing the data type is likely to eventually cause an issue.
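If you do want to test it on your own data, here is a minimal sketch for comparing the storage of the two key widths (PostgreSQL syntax assumed; measurement and measurement_id are the standard CDM names):

```sql
-- Sketch: copy one key column at both widths and compare on-disk size.
-- The ::int cast will error out if any existing ID exceeds the INT range.
CREATE TABLE measurement_id_bigint AS
    SELECT measurement_id::bigint AS measurement_id FROM measurement;
CREATE TABLE measurement_id_int AS
    SELECT measurement_id::int AS measurement_id FROM measurement;

SELECT pg_size_pretty(pg_total_relation_size('measurement_id_bigint')) AS bigint_size,
       pg_size_pretty(pg_total_relation_size('measurement_id_int'))    AS int_size;
```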
You can use int in lieu of bigint throughout, but bigint was chosen because some of the community’s CDM databases hold hundreds of millions of patients.
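If you do go that way, the change itself is simple enough; a sketch in PostgreSQL syntax (person_id is just the obvious example, check the maximum first, and remember every referencing column needs the same change):

```sql
-- Sketch only: narrow person_id from BIGINT to INT.
-- Verify first that no existing value exceeds the INT maximum (2,147,483,647).
SELECT MAX(person_id) FROM person;

ALTER TABLE person
    ALTER COLUMN person_id TYPE int;

-- Repeat for every referencing column (e.g. observation_period.person_id),
-- otherwise joins will mix INT and BIGINT and incur implicit conversions.
```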
My choice is INT except for measurement_id; that is the only table where we could accumulate more than 2 billion rows (roughly the INT limit) over the next 5 to 10 years.
Even in this ‘answer’ it is said that the practical effect may depend on the work environment; this is very true. Replication of a BigInt is still much faster than that of a VARCHAR or an NVARCHAR, yet how many of those do we have?
I have built some pretty massive datasets, and the only time I have seen a big impact on the database was when doing self joins. Recursive tables can make BigInt joins noticeably slower than Ints.
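To make that concrete, this is the kind of pattern I mean (a hypothetical hierarchy table, not a CDM one), where the key column is carried through every level of recursion, so its width matters more than usual:

```sql
-- Hypothetical self-referencing table: the join key appears at every
-- recursion level, so INT vs BIGINT width is multiplied across the result.
WITH RECURSIVE ancestry AS (
    SELECT concept_id, parent_concept_id, 1 AS depth
    FROM   concept_hierarchy              -- hypothetical table
    WHERE  parent_concept_id IS NULL      -- start from the roots
    UNION ALL
    SELECT c.concept_id, c.parent_concept_id, a.depth + 1
    FROM   concept_hierarchy c
    JOIN   ancestry a ON c.parent_concept_id = a.concept_id
)
SELECT * FROM ancestry;
```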
Really, is it worth the work to keep a separate schema for possibly very little (or no) benefit? Perhaps first focus on tables that have a recursive structure and work out from there whether you need the extra speed.