Hello all,
I am checking RxNorm and RxNorm Extension vocabularies to compare with my list of drugs.
I downloaded both vocabularies from ATHENA, where the site lists 300,300 concepts for RxNorm and 2,091,918 for RxNorm Extension.
I tried to import each CONCEPT.csv file into R, but R reports a different number of rows.
> fileName <- "./CONCEPT.csv"
> RxNorm1 <- read.csv(fileName, header=TRUE, sep="\t")
Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
EOF within quoted string
> RxNorm2 <- read.csv(fileName, header=TRUE)
Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
EOF within quoted string
> str(RxNorm1)
'data.frame': 86565 obs. of 10 variables:
$ concept_id : int 1146945 1146954 1147044 756315 756316 756317 756318 756319 756320 756321 ...
$ concept_name : chr "concept.concept_id" "concept.invalid_reason" "observation_period.observation_period_id" "metadata.metadata_type_concept_id" ...
$ domain_id : chr "Metadata" "Metadata" "Metadata" "Metadata" ...
$ vocabulary_id : chr "CDM" "CDM" "CDM" "CDM" ...
$ concept_class_id: chr "Field" "Field" "Field" "Field" ...
$ standard_concept: chr "S" "S" "S" "S" ...
$ concept_code : chr "CDM1" "CDM10" "CDM100" "CDM1000" ...
$ valid_start_date: int 20141111 20141111 20141111 20210925 20210925 20210925 20210925 20210925 20210925 20210925 ...
$ valid_end_date : int 20991231 20991231 20991231 20991231 20991231 20991231 20991231 20991231 20991231 20991231 ...
$ invalid_reason : chr "" "" "" "" ...
> str(RxNorm2)
'data.frame': 86637 obs. of 1 variable:
$ concept_id.concept_name.domain_id.vocabulary_id.concept_class_id.standard_concept.concept_code.valid_start_date.valid_end_date.invalid_reason: chr "1146945\tconcept.concept_id\tMetadata\tCDM\tField\tS\tCDM1\t20141111\t20991231\t" "1146954\tconcept.invalid_reason\tMetadata\tCDM\tField\tS\tCDM10\t20141111\t20991231\t" "1147044\tobservation_period.observation_period_id\tMetadata\tCDM\tField\tS\tCDM100\t20141111\t20991231\t" "756315\tmetadata.metadata_type_concept_id\tMetadata\tCDM\tField\tS\tCDM1000\t20210925\t20991231\t" ...
RxNorm1 was read with sep="\t", but RxNorm2 was not.
The two data frames have different numbers of observations:
RxNorm1 : 86,565
RxNorm2 : 86,637
My questions are:
- Why does the number on the ATHENA site differ from the R result?
- Why do I get different numbers from the same raw data in R depending on the sep option?
I would also like to know the correct R code to import the RxNorm data in full.
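For reference, here is a minimal sketch of the kind of import I am after. It rests on an assumption I have not verified against the real file: that the "EOF within quoted string" warning comes from quote characters inside concept_name, which R's default quote setting treats as the start of a quoted field. The rows below are made up, and the temporary file only stands in for CONCEPT.csv.

```r
# Build a small tab-delimited file shaped like CONCEPT.csv (made-up rows).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "concept_id\tconcept_name\tvocabulary_id",
  "1\taspirin 81 MG\tRxNorm",
  "2\tneedle 1\" long\tRxNorm Extension",  # lone double quote in the name
  "3\tibuprofen 200 MG\tRxNorm",
  "4\tacetaminophen 500 MG\tRxNorm"
), tmp)

# quote = "" switches off quote handling, so a stray " or ' in a name
# is kept as ordinary text instead of opening a quoted string.
concepts <- read.delim(tmp, header = TRUE, sep = "\t", quote = "",
                       comment.char = "", stringsAsFactors = FALSE)

# Sanity check: one row per physical line, minus the header line.
nrow(concepts) == length(readLines(tmp)) - 1

# Count rows per vocabulary to compare against the numbers shown on ATHENA.
table(concepts$vocabulary_id)
```

If the assumption holds, the same read.delim call with the real path in place of tmp should give one row per line of the file, and the per-vocabulary counts can then be checked against the ATHENA site.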
Thank you in advance!