Confusion about 80/20 rule in source_to_concept_mapping

(Etir) #1

Hello everyone,

I am currently building a source_to_concept_map using Usagi, and I am a bit confused about the 80/20 rule. For example, we have more than 5500 source codes, and by mapping the 1000 most frequent ones we have covered more than 99% of code instances. Should I keep mapping until 80% of the distinct source codes are mapped, or does the 80% refer to instances, in which case we should proceed to the next phase?

tl;dr In your experience, what is the approximate threshold (in %) above which we should proceed to the next phase?

Thank you in advance!

(Mike Nerovnya) #2

Usually we map 99% of codes and then just inspect the remaining ones for any important codes that are still unmapped.

(Etir) #3

To conclude: we should then map approximately 5000 (of 5500 total) codes?

Thank you for your helpful response!

(Mike Nerovnya) #4

Are your source codes distinct? Maybe the first 500 of your codes have a few thousand counts each and the others have only 5-10 counts. Then the first 500 codes will cover more than 99% of the records to be mapped. But if every source code has only one count, 99% is practically all of the codes.
If I understand you correctly, your 99% is in 1000 codes.
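
To make the difference between "99% of codes" and "99% of counts" concrete, here is a rough Python sketch of how the coverage could be checked. It assumes you already have a count of records per source code; the function names and the example counts are made up for illustration and are not part of Usagi.

```python
from typing import Dict, List, Tuple


def coverage_by_rank(counts: Dict[str, int]) -> List[Tuple[str, int, float]]:
    """Return (code, count, cumulative share of records), most frequent first."""
    total = sum(counts.values())
    if total == 0:
        return []
    rows = []
    running = 0
    for code, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += n
        rows.append((code, n, running / total))
    return rows


def codes_needed(counts: Dict[str, int], target: float) -> int:
    """Smallest number of top-ranked codes whose records reach `target` coverage."""
    for i, (_, _, share) in enumerate(coverage_by_rank(counts), start=1):
        if share >= target:
            return i
    return len(counts)


# Hypothetical counts: two very frequent codes and a long tail of rare ones.
example = {"A": 5000, "B": 4000, "C": 5, "D": 3, "E": 2}
top = codes_needed(example, 0.99)
print(top, "of", len(example), "codes cover at least 99% of records")
```

With skewed counts like these, a small share of the codes reaches the 99% target; if every code had the same count, nearly all of them would be needed.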

(Etir) #5

Yeah, exactly: 99.6% is in 1000 codes, and the other 4500 account for only 0.4%.

(Mike Nerovnya) #6

Just inspect the other 4500; there may be something important among them, even with a low count.

(Etir) #7

Thank you so much!