Infrequently-used main spoken languages

wibeasley · November 11, 2020, 9:51pm

Question 1: When mapping our EMR’s primary_language field to standard OMOP concepts, we encountered the following 6 entries. They do not appear in the list of 208 “main spoken languages” or in the OHDSI forums (except for one thread about ethnicity). We’ve already mapped entries like “persian” to “farsi”. The first two entries are definitely legit, and we guess that the latter four aren’t data entry errors but instead reflect refugee populations.

What concept do you advise we map each to?

deaf (~100) (Is “Uses sign language” 4232738 a possibility?)
cherokee (<100)
chuukese (<100)
miao,hmong (<100)
mon-khmer,cambodian (<100)
zomi (<100)

Question 2: We have ~30 entries of “indian”. Do you advise that we guess and map it to “hindi”, even though the patient’s primary language could be something like Tamil? What is typically more useful to someone analyzing OMOP datasets? Is the analyst likely to coarsen these concepts anyway, so an incorrect value of “hindi” is more valuable than a value of 0/missing?

(For context, the current effort is for the N3C project.)
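
To make the two options in Question 2 concrete, here is a minimal sketch of the "don't guess" alternative: normalize the source value, apply known synonyms such as “persian” to “farsi”, and fall back to concept_id 0 for anything ambiguous or unlisted while keeping the original source value. The function name, the `AMBIGUOUS` set, and the negative IDs in `lookup` are hypothetical placeholders, not real OMOP concept IDs; real IDs would come from the vocabulary tables.

```python
# Sketch only: names and placeholder IDs are illustrative assumptions.
UNMAPPED_CONCEPT_ID = 0  # the 0/missing value mentioned in Question 2

# Source-value synonyms applied before lookup (from the post: "persian" -> "farsi").
SYNONYMS = {"persian": "farsi"}

# Values considered too ambiguous to map without guessing.
AMBIGUOUS = {"indian"}


def map_language(raw_value, name_to_concept_id):
    """Return (concept_id, source_value) for one primary_language entry."""
    source_value = raw_value.strip().lower()
    name = SYNONYMS.get(source_value, source_value)
    if name in AMBIGUOUS:
        # Keep the raw value so an analyst can still see "indian".
        return UNMAPPED_CONCEPT_ID, source_value
    return name_to_concept_id.get(name, UNMAPPED_CONCEPT_ID), source_value


# Placeholder lookup; negative numbers are deliberately not valid concept IDs.
lookup = {"farsi": -1, "hindi": -2}

for value in [" Persian ", "indian", "zomi"]:
    print(value, "->", map_language(value, lookup))
```

Guessing “hindi” instead would amount to moving “indian” from `AMBIGUOUS` into `SYNONYMS`; either way the source value is retained, so the choice can be revisited later.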

Christian_Reich · December 2, 2020, 12:56pm

@wibeasley:

Usual questions: Do you actually have data? Like patients who declare their primary language is “Cherokee”? And do you have use cases for using this information in research?