I am currently building a source_to_concept_map using Usagi, and I am a little confused about the 80/20 rule. For example, we have more than 5500 source codes, and by mapping the 1000 most frequent ones we covered more than 99% of code instances. Should I continue mapping until I reach 80% of the distinct source codes, or does the 80% refer to instances, in which case we should proceed to the next phase?
tl;dr In your experience, what is the approximate percentage threshold above which we should proceed to the next phase?
Are your source codes distinct? Maybe the first 500 of your source codes each have a few thousand counts and the others have only 5-10 counts each. In that case the first 500 codes will cover more than 99% of the instances. But if every source code has only one count, then covering 99% of the instances means mapping practically all of the codes.
If I understand you correctly, your 99% is covered by 1000 codes.
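To make the difference between the two percentages concrete, here is a minimal sketch that computes both for a list of per-code frequencies. The function name `coverage` and the two frequency lists are made up for illustration; they are not your data and not part of Usagi. In practice you would plug in the frequency column from your own source code file.

```python
def coverage(counts, n_mapped):
    """Return (code_coverage, instance_coverage) when the n_mapped
    most frequent source codes have been mapped."""
    ordered = sorted(counts, reverse=True)      # most frequent first
    total_codes = len(ordered)
    total_instances = sum(ordered)
    mapped_instances = sum(ordered[:n_mapped])  # records covered by mapped codes
    return n_mapped / total_codes, mapped_instances / total_instances


# Hypothetical skewed distribution: 500 codes with 5000 records each,
# 5000 codes with 5 records each (5500 distinct codes in total).
skewed = [5000] * 500 + [5] * 5000

# Hypothetical flat distribution: every one of 5500 codes occurs once.
flat = [1] * 5500

skewed_code_cov, skewed_inst_cov = coverage(skewed, 1000)
flat_code_cov, flat_inst_cov = coverage(flat, 1000)

print(f"skewed: {skewed_code_cov:.1%} of codes, {skewed_inst_cov:.1%} of instances")
print(f"flat:   {flat_code_cov:.1%} of codes, {flat_inst_cov:.1%} of instances")
```

With the skewed list, mapping 1000 codes is only about 18% of the distinct codes but more than 99% of the instances. With the flat list, the same 1000 codes cover about 18% of both, so the two percentages only diverge when the frequencies are uneven.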