You are right.
In the era of Next Generation Sequencing (NGS), as opposed to Sanger sequencing, it is hard to describe a ‘variant’ with a ‘concept_id’, because lesser-known variants far outnumber the well-known ones.
Basically, a ‘variant’ is one of a myriad of combinations of 1) the POSITION in the DNA where the sequence change occurs and 2) the NUCLEOTIDEs (A, T, G, C) before and after the change.
It would be better to record ‘gene’ and ‘change’ information separately than to define a concept_id for each of the practically unlimited ‘gene + change’ cases.
The list of about 40,000 human genes can be standardized through the HUGO Gene Nomenclature Committee (HGNC).
For the ‘change’ information, we can set a rule that it is recorded as a string following the Human Genome Variation Society (HGVS) nomenclature.
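To make the ‘gene’ / ‘change’ split concrete, here is a minimal Python sketch. The `VariantRecord` layout, the `GENE:change` input format, and the function names are my own assumptions for illustration, not part of the OMOP CDM or the Genomic-CDM. The regular expression only covers simple coding-DNA substitutions; it is not a full HGVS parser.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


# Hypothetical record layout: NOT an official OMOP or Genomic-CDM table.
@dataclass(frozen=True)
class VariantRecord:
    gene_symbol: str  # HGNC-approved symbol, e.g. "BRAF"
    hgvs_change: str  # HGVS-style description, kept as a plain string


# Matches only simple coding-DNA substitutions such as "c.1799T>A".
_SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")


def split_variant(text: str) -> VariantRecord:
    """Split 'GENE:change' (an assumed input format) into its two parts."""
    gene, sep, change = text.partition(":")
    gene, change = gene.strip(), change.strip()
    if not sep or not gene or not change:
        raise ValueError(f"expected 'GENE:change', got {text!r}")
    return VariantRecord(gene_symbol=gene, hgvs_change=change)


def parse_substitution(change: str) -> Optional[Tuple[int, str, str]]:
    """Return (position, ref, alt) for a simple substitution, else None."""
    match = _SUBSTITUTION.match(change)
    if match is None:
        return None
    return int(match.group(1)), match.group(2), match.group(3)


record = split_variant("BRAF:c.1799T>A")
print(record.gene_symbol, record.hgvs_change)
print(parse_substitution(record.hgvs_change))
```

With this kind of layout the gene can still be mapped to a standard identifier, while the change stays a free string, so a variant nobody has seen before needs no new concept_id.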
At present, many translocation variants do not have a concept_id in Athena.
All you can do is request a new concept_id or temporarily assign your own concept_id.
Alternatively, if you need to, you can store your data using the Genomic-CDM, although it is not an official approach yet.