I am checking the RxNorm and RxNorm Extension vocabularies to compare them with my list of drugs.
I got those lists from ATHENA, and the ATHENA site shows that RxNorm and RxNorm Extension contain 300,300 and 2,091,918 concepts, respectively.
I tried to import each CONCEPT.csv file into R, but the output shows a different number of rows.
Hello! The warning message “EOF within quoted string” hints at the problem. It is possible that a quote character inside one of the values is treated as the start of a quoted string, so the rows that follow are swallowed and the imported file is cut short. Could you please share the last entry in the data frame?
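To make that check concrete, here is a minimal sketch of how one might compare the number of lines in the file with the number of rows R actually imported. The toy file, its drug names, and the variable names `path`, `n_expected` and `n_read` are made up for illustration; with the real data, `path` would point to the downloaded CONCEPT.csv, which is assumed here to be tab-delimited.

```r
# Hypothetical toy file standing in for CONCEPT.csv (tab-delimited).
# Row 6 contains a lone double quote, which read.delim() treats as the
# start of a quoted string by default.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "concept_id\tconcept_name\tvocabulary_id",
  "1\tAspirin 100 MG\tRxNorm",
  "2\tIbuprofen 200 MG\tRxNorm",
  "3\tMetformin 500 MG\tRxNorm",
  "4\tWarfarin 5 MG\tRxNorm",
  "5\tAtenolol 50 MG\tRxNorm",
  "6\tNeedle 5\" long\tRxNorm",
  "7\tInsulin 100 UNT\tRxNorm Extension",
  "8\tHeparin 5000 UNT\tRxNorm Extension"
), path)

# Data rows in the raw file: all lines minus the header.
n_expected <- length(readLines(path)) - 1L

# Default import. R is expected to warn about "EOF within quoted string";
# the warning is silenced here only to keep the output short.
concept <- suppressWarnings(read.delim(path, stringsAsFactors = FALSE))
n_read <- nrow(concept)

print(c(expected = n_expected, read = n_read))
print(tail(concept, 1))  # the last entry shows where the import went wrong
```

If `n_read` is smaller than `n_expected`, rows were lost during the import, and the last entry may contain several rows merged into a single value.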
Thank you for your answer.
I am a beginner in R, so I did not understand the meaning of the warning message.
Following your answer, I searched for how to modify my original code, and I found a solution.
I added the option quote="", and the warning message no longer appeared.
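For reference, a minimal sketch of that change, assuming a tab-delimited file. The toy file and the names `path` and `concept` are hypothetical; the real call would point at the downloaded CONCEPT.csv instead.

```r
# Hypothetical toy file: the second data row contains a lone double quote.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "concept_id\tconcept_name\tvocabulary_id",
  "1\tAspirin 100 MG\tRxNorm",
  "2\tNeedle 5\" long\tRxNorm",
  "3\tInsulin 100 UNT\tRxNorm Extension"
), path)

# quote = "" switches quote handling off, so a double quote is kept as an
# ordinary character; sep = "\t" states the separator explicitly.
concept <- read.delim(path, sep = "\t", quote = "", stringsAsFactors = FALSE)

print(nrow(concept))            # number of data rows imported
print(concept$concept_name[2])  # the quote character stays in the value
```

With quoting disabled, every line of the file becomes exactly one row, which is why the warning disappears.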
But I am still curious why the numbers of observations in the two data frames are different.
I attached the tail of the data frame.
With no separator option (RxNorm2), the tab delimiter (\t) stays in the values, yet the data frame has more observations than the one read with the tab option.
RxNorm1 looks correct, but I cannot move on to the next step because of the RxNorm2 problem.
Please let me know if I have misunderstood the vocabulary data.
According to the R documentation for read.table, if no value is provided for the sep parameter, R treats any run of whitespace (spaces, tabs, newlines or carriage returns) as the separator, which by the way is really weird behaviour. It is quite possible that concept names contain various whitespace characters naturally, as OMOP takes in vocabularies from many different sources. Best practice is to always specify the separator.
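A small sketch of the difference, using count.fields() on a made-up tab-delimited file (the file contents and the names `path`, `n_ws` and `n_tab` are assumptions for illustration only):

```r
# Hypothetical toy file: values contain spaces, columns are tab-separated.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "concept_id\tconcept_name\tvocabulary_id",
  "1\tAspirin 100 MG\tRxNorm",
  "4\tParacetamol 500 MG\tRxNorm Extension"
), path)

# sep = "" means "any whitespace": spaces inside a value also split it.
n_ws <- count.fields(path, sep = "", quote = "", comment.char = "")

# sep = "\t" splits on tabs only, so each line has the same field count.
n_tab <- count.fields(path, sep = "\t", quote = "", comment.char = "")

print(n_ws)   # fields per line with whitespace splitting
print(n_tab)  # fields per line with tab splitting
```

When the per-line field counts are not all equal, the chosen separator does not match the file, and the resulting data frame cannot be trusted.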
In any case, there are still too few rows to contain RxNorm or RxNorm Extension. How many rows are in the CONCEPT.CSV file? Did you include the RxNorm and RxNorm Extension vocabularies in the download request on this page?
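One way to answer both questions at once is to tabulate the imported rows by vocabulary_id. The sketch below uses a made-up file with only four of the CONCEPT columns. The split by invalid_reason is just one thing worth looking at when comparing against the site: whether the figure on the Athena page covers only a subset of rows is an assumption, not something confirmed here.

```r
# Hypothetical toy file with a subset of the CONCEPT columns.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "concept_id\tconcept_name\tvocabulary_id\tinvalid_reason",
  "1\tAspirin 100 MG\tRxNorm\t",
  "2\tOld Aspirin entry\tRxNorm\tD",
  "3\tIbuprofen 200 MG\tRxNorm Extension\t",
  "4\tParacetamol 500 MG\tRxNorm Extension\tU"
), path)

concept <- read.delim(path, sep = "\t", quote = "", stringsAsFactors = FALSE)

# Rows per vocabulary: shows whether both vocabularies are in the download.
by_vocab <- table(concept$vocabulary_id)

# Rows per vocabulary with an empty invalid_reason (assumed to mark
# concepts that have not been deprecated or replaced).
is_valid <- is.na(concept$invalid_reason) | concept$invalid_reason == ""
by_vocab_valid <- table(concept$vocabulary_id[is_valid])

print(by_vocab)
print(by_vocab_valid)
```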
With the columns separated correctly, the import gives 508,112 observations for RxNorm and 2,299,730 observations for RxNorm Extension.
I downloaded these datasets from the page you referred to. However, as I mentioned in my first post, the Athena page shows totals of 300,300 and 2,091,918 observations, respectively.
I still do not know the true number of observations in RxNorm and RxNorm Extension, but I have checked that all of the drugs in my datasets are included in the RxNorm and RxNorm Extension vocabularies.
Some errors have been cleared, but the problem has not been fully resolved.