I am testing my full ETL scripts for a new data refresh, and it seems that some concepts are missing from the Vocabulary v5.0 17-Jul-15 download from Athena.
My specific example: a query where domain_id = 'Gender' returns only 1 term from the latest concept.csv file, whereas the older version I was using (v5.0 2014-10-15) returns 10 different rows.
Exploring further, the concept_name 'Female' was assigned concept_id = 8532, and this is not found in the latest vocabulary. Searching by string literal, I find three matching concepts of concept_class_id 'clinical finding' and one of concept_class_id 'answer'.
I went as far as grepping for the string Female TAB Gender (grep -P 'Female\tGender' CONCEPT.csv), and it does not return anything. But for Female TAB Condition (grep -P 'Female\tCondition' CONCEPT.csv) I get two results (the same found via the SQL query).
Did all the Gender concept_ids get deprecated or deleted?
P.S. The Ethnicity concepts also seem to be missing (I am checking using both domain_id = 'Ethnicity' and concept_class_id = 'Ethnicity').
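A quick way to check which domains are present in a download is to count rows per domain_id. The following is only a minimal sketch: it assumes CONCEPT.csv is tab-delimited, unquoted, and has a lowercase header row that includes domain_id. The helper name count_by_domain and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

def count_by_domain(handle):
    """Count rows per domain_id in a tab-delimited file with a header row."""
    # Assumption: fields are tab-separated and not quoted.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row["domain_id"] for row in reader)

# Tiny in-memory stand-in for CONCEPT.csv; the rows are illustrative only.
sample = io.StringIO(
    "concept_id\tconcept_name\tdomain_id\n"
    "8532\tFEMALE\tGender\n"
    "8507\tMALE\tGender\n"
    "1\tExample finding\tCondition\n"
)
counts = count_by_domain(sample)
for domain in ("Gender", "Ethnicity"):
    # A count of 0 means the domain is absent from the file.
    print(domain, counts.get(domain, 0))
```

For the real file, the same function could be called on open("CONCEPT.csv", newline="", encoding="utf-8"), assuming that is the file's encoding.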
Get another copy from the Athena website. We had a bug through which the Gender and Ethnicity vocabularies were omitted from the zip file. It’s all there now. Plus, we never delete concepts. At worst, we deprecate them, but you can always see them.
Thanks! I have downloaded the file again and it now contains the Gender vocabulary.
As a small ‘glitch’, the casing is inconsistent with the rest. This is what I get:
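concept_id  concept_name  domain_id  vocabulary_id  concept_class_id  standard_concept  concept_code  valid_start_date  valid_end_date  invalid_reason
8532        FEMALE        Gender     Gender         Gender            S                 F             19700101          20991231
8507        MALE          Gender     Gender         Gender            S                 M             19700101          20991231
8521        OTHER         Gender     Gender         Gender                              O             19700101          20140731        D
8570        AMBIGUOUS     Gender     Gender         Gender                              A             19700101          20140731        D
8551        UNKNOWN       Gender     Gender         Gender                              U             19700101          20140731        D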
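If an ETL step matches source values against concept_name, the casing difference can be sidestepped by comparing case-folded strings. A minimal sketch follows, using only the two standard Gender rows from the listing above. The names build_name_index and lookup are hypothetical helpers, not part of any OHDSI tool.

```python
def build_name_index(rows):
    """Map case-folded concept_name to concept_id."""
    return {name.casefold(): concept_id for concept_id, name in rows}

# (concept_id, concept_name) pairs taken from the listing above.
gender_rows = [(8532, "FEMALE"), (8507, "MALE")]
index = build_name_index(gender_rows)

def lookup(name):
    """Return the concept_id for a name, ignoring case and outer whitespace."""
    return index.get(name.strip().casefold())

print(lookup("Female"))  # matches the upper-case FEMALE row
```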
I apologize if I am missing something obvious, as I am just starting to get into OMOP and the vocabularies, but I just downloaded the vocabulary and cannot see the 8507, 8532, etc. values. I tried grep as well, but nothing was returned when I grepped for "8507 ". Should these values actually be in the concept.csv file (the lines mentioned by Juan in his last post)?
My apologies. I am not sure what I was looking at before. I do see them now. I was grepping for "8507 " (with a trailing space) because there were 1256 instances of "8507" in the file. However, the fields are separated by tabs, not spaces, so I should have grepped for "^8507" or "8507\t". I have found the data now.
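For anyone hitting the same substring problem, comparing the first column exactly avoids matching longer IDs that merely contain the digits. This is a rough sketch, assuming a tab-delimited, unquoted file with concept_id in the first column. The function name find_concept and the two "Hypothetical" rows are invented for illustration.

```python
import csv
import io

def find_concept(handle, concept_id):
    """Return rows whose first column equals concept_id exactly."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    wanted = str(concept_id)
    # Exact comparison on the first field, so 48507 and 85070 do not match.
    return [row for row in reader if row and row[0] == wanted]

# In-memory stand-in for CONCEPT.csv; the two longer IDs are made up.
sample = io.StringIO(
    "concept_id\tconcept_name\tdomain_id\n"
    "8507\tMALE\tGender\n"
    "48507\tHypothetical row A\tCondition\n"
    "85070\tHypothetical row B\tCondition\n"
)
matches = find_concept(sample, 8507)
print(matches)
```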
Hello,
I downloaded vocabularies from Athena on Thursday, and it seems as though Gender may be missing. I believe I used the default set when submitting the request.
Name of zip - vocab_download_v5_{C5C5608A-6267-E15A-ED4A-7E4FE22D753A}
Please advise.
Thank you
Rick