I would not support this change (to use String).
Storing a string as a key means you might be tempted to embed some sort of information about the record in the ‘key’ itself. But identifiers should not carry information about what the record is…that’s called an ‘antipattern’.
In addition, the identifier would take noticeably more storage than the equivalent 8-byte bigint:
549755813888 = 8 bytes
“549755813888” = 12 characters = 12 bytes in a single-byte encoding, plus any varchar storage overhead. It would be 24 bytes if your varchars are stored as UTF-16.
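A quick way to check those sizes is a small Python sketch. It assumes the bigint is a fixed-width 8-byte signed integer and it ignores the length prefix or other per-value overhead a database adds to a varchar, which differs between engines.

```python
import struct

key = 549755813888  # 2**39, too large for a 4-byte integer

as_bigint = struct.pack("<q", key)       # 8-byte signed integer
as_utf8 = str(key).encode("utf-8")       # one byte per digit
as_utf16 = str(key).encode("utf-16-le")  # two bytes per digit, no BOM

sizes = {
    "bigint": len(as_bigint),
    "utf8": len(as_utf8),
    "utf16": len(as_utf16),
}
print(sizes)
```

The raw payload alone is 1.5 to 3 times the size of the integer, before any varchar overhead is counted.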
Finding the record by the id would also be a more expensive operation:
549755813888
549755813828
These strings are the same length and differ in only one digit, so you’d have to compare them character by character, or calculate a hash of each and compare those values…and if you’re storing the hash of the value anyway, then you’re not gaining anything except a ‘readable’ key, which means you’re storing information in the key.
Indexing on these columns would also suffer from the larger storage size: fewer rows would fit within a page, making your B-tree much larger.
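To put rough numbers on the page-density point, here is a back-of-the-envelope sketch in Python. The 8192-byte page and the 8-byte row pointer per index entry are assumptions for illustration only; real engines have page headers, fill factors, and per-entry overhead that this ignores.

```python
def entries_per_page(key_bytes, page_bytes=8192, pointer_bytes=8):
    """Hypothetical count of index entries that fit in one page.

    Each entry is modelled as the key plus a fixed-size row pointer;
    page headers and per-entry overhead are deliberately ignored.
    """
    return page_bytes // (key_bytes + pointer_bytes)

for label, key_bytes in [("bigint", 8), ("varchar, 1 byte/char", 12), ("varchar, UTF-16", 24)]:
    print(label, entries_per_page(key_bytes))
```

Under these assumptions the UTF-16 string key fits half as many entries per page as the bigint, so the index needs roughly twice as many leaf pages.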
I am not sure I understand:
What does the data type of the PK/FK have to do with the ease of maintaining it?
Edit:
<oops>
I’m pretty sure R handles numbers natively very well.
</oops>
I understand your point now on R and 64 bits…if you wanted to do certain operations on vectors of values (like avg, sum, etc.), but…these are keys, so you won’t be doing that. as.numeric() will handle that datatype: it stores the value as a double, which holds integers exactly up to 2^53.
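R’s numeric type is a 64-bit double, so the question is really whether a double can hold the key exactly. The sketch below uses Python, whose float is the same IEEE 754 double, as a stand-in for R; it is only an illustration of the floating-point limit, not of R’s own behaviour in every case.

```python
key = 549755813888  # the example key, 2**39

# A double represents every integer exactly up to 2**53.
key_is_exact = float(key) == key

limit = 2**53
limit_is_exact = float(limit) == limit

# One past the limit can no longer be represented and gets rounded.
beyond_is_exact = float(limit + 1) == limit + 1

print(key_is_exact, limit_is_exact, beyond_is_exact)
```

So keys of this magnitude survive the conversion, but keys above 2^53 would silently collide.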
You mention a GUID, which is not a string but a 16-byte value. It’s just another numeric value that is formatted as xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx…in order to use it in a query, you’d need to convert the string form into the underlying GUID value, which adds overhead. But a GUID is more ‘distributed network’ friendly, in that any peer that generates a record of data can generate its own GUIDs without a real risk of collision with another data source.
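The difference between the GUID value and its text form can be seen with Python’s standard uuid module. This is only a sketch of the conversion step; how a particular database or R package stores and parses GUIDs is not something it shows.

```python
import uuid

guid = uuid.uuid4()            # random GUID, generated locally by any peer
text_form = str(guid)          # xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
parsed = uuid.UUID(text_form)  # the conversion needed before comparing values

print(len(guid.bytes))   # size of the underlying value in bytes
print(len(text_form))    # size of the formatted string in characters
print(parsed == guid)    # the round trip recovers the same value
```

The underlying value is 16 bytes while the formatted string is 36 characters, which is why storing or comparing the text form costs more.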
I’d be more supportive of a GUID as a key value than a ‘String’, so if you change the title of the post to be specific about GUID, you might get more support.
But I am having trouble finding information about ‘native’ support of GUID/UUID in R. Are you sure it’s natively supported?
-Chris