Vocabulary 4.5 July 17 update: experience loading on Redshift

In CONCEPT.csv (note that "csv" normally means comma-separated, whereas this file is tab-separated), rows 1,354,252 and 1,355,671 contain non-printable characters (\o231 and \o260 respectively), and row 1,363,867 (the last row in the table) seems to be truncated. All I see is: ‘42740714 Computed tomographic’.

Row numbers are likely to differ for others, since the included vocabularies may differ. The concept ids with the non-printable characters are 2101900 and 2313647.
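
For anyone who wants to run the same check on their own download, here is a rough Python sketch of how rows like these could be found. It is only an illustration, not how the file was actually inspected: the function name `scan_tsv` and the sample rows are made up, and it assumes the first line is a header whose field count every other row should match.

```python
import io

def scan_tsv(stream, expected_fields=None):
    """Yield (row_number, first_field, problem) for suspicious rows.

    `stream` is a binary file-like object holding tab-separated data.
    Row numbers are 1-based and count the header line as row 1.
    """
    for row_number, raw in enumerate(stream, start=1):
        line = raw.rstrip(b"\r\n")
        fields = line.split(b"\t")
        if expected_fields is None:
            # Assumption: the first line is a header with the full set of columns.
            expected_fields = len(fields)
        first = fields[0].decode("ascii", errors="replace")
        # Flag bytes outside 7-bit ASCII and control bytes other than tab.
        bad = sorted({b for b in line if b >= 0x80 or (b < 0x20 and b != 0x09)})
        if bad:
            yield row_number, first, "bytes " + " ".join("\\o%03o" % b for b in bad)
        if len(fields) != expected_fields:
            yield row_number, first, "%d fields, expected %d" % (len(fields), expected_fields)

# Made-up sample data, not taken from the real CONCEPT.csv.
sample = io.BytesIO(
    b"concept_id\tconcept_name\tvocabulary_id\n"
    b"1\tclean name\t4\n"
    b"2\tbad \x99 name\t4\n"
    b"3\tComputed tomographic\n"
)
for problem in scan_tsv(sample):
    print(problem)
```

On the real file, something like `scan_tsv(open("CONCEPT.csv", "rb"))` should work. Reporting the first field (the concept id) along with the row number helps because, as noted, row numbers depend on which vocabularies were included.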

In addition, neither HL7 Administrative Sex (12) nor Ethnicity (44) is in the Vocabulary table, and neither has any concepts in the Concept table.
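
A quick way to confirm which vocabularies have no concepts, before loading anything into the database, is to count rows per vocabulary straight from the file. This is a minimal sketch: the function names are hypothetical, and the column name `vocabulary_id` is an assumption about the file's header, so adjust it if your header differs.

```python
import csv
import io
from collections import Counter

def concepts_per_vocabulary(text_stream, vocabulary_column="vocabulary_id"):
    """Count concept rows per vocabulary id in a tab-separated file with a header."""
    # QUOTE_NONE: treat quote characters in concept names as ordinary text.
    reader = csv.DictReader(text_stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row[vocabulary_column] for row in reader)

def missing_vocabularies(counts, required_ids):
    """Return the required vocabulary ids that have no concept rows."""
    return [v for v in required_ids if counts.get(str(v), 0) == 0]

# Made-up sample data, not taken from the real CONCEPT.csv.
sample = io.StringIO(
    "concept_id\tconcept_name\tvocabulary_id\n"
    "1\tfirst concept\t4\n"
    "2\tsecond concept\t12\n"
)
counts = concepts_per_vocabulary(sample)
print(missing_vocabularies(counts, [12, 44]))
```

The ids are compared as strings because that is how they come out of the file.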


Will take a look. This shouldn’t be the case; we de-UTFed all concept_name records.

Issue is fixed.
CONCEPT.csv will now contain all required vocabularies.
CPT4 will not be truncated during the import.
All records are stripped of non-ASCII symbols.
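
For reference, stripping non-ASCII symbols from a name can be sketched in Python as below. This is only an illustration of the idea, not the procedure actually used to produce the vocabulary files, and the helper name `strip_non_ascii` is made up.

```python
def strip_non_ascii(text):
    """Drop every character outside the 7-bit ASCII range."""
    return text.encode("ascii", errors="ignore").decode("ascii")

# Made-up example string; the accented letter and the degree sign are dropped.
print(strip_non_ascii("Caf\u00e9 30\u00b0"))
```

Note that this removes characters rather than transliterating them, so an accented letter disappears instead of becoming its unaccented equivalent.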