
Vocabulary changes between versions

Continuing the discussion from v5.0 08-JUN-16:

Once again, either I am missing something obvious, or we are doing something wrong loading vocabularies, or both :frowning:
When I compare the latest version of the vocabularies we downloaded (v5.0 17-AUG-16) with the old version (v5.0 2014-10-15), I see the following:

As far as I understand, ICD-10 codes are now divided between WHO and NCHS, whereas previously they were all lumped under WHO (even though there was an NCHS vocabulary). (NB: where are these reorganizations described?)
Similarly, LOINC is now all in one vocabulary, whereas before it was split between LOINC Multiaxial Hierarchy and LOINC.

What concerns me is that several FDB vocabularies are now missing, and that there are fewer concepts for the FDB generic sequence codes (now called “Clinical Formulation ID”) and for MeSH headings. We are having problems getting CPT4 codes, but it seems the error we were having has been resolved, and we’ll get them soon.
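Spotting this kind of change by eye is error-prone. A minimal Python sketch, assuming per-vocabulary concept counts have already been exported from each release into dictionaries (`compare_releases` is a hypothetical helper, and the counts below are made up for illustration), flags vocabularies that vanished, shrank, or newly appeared:

```python
def compare_releases(old_counts, new_counts):
    """Compare {vocabulary_id: concept_count} snapshots of two releases."""
    # In the old release but absent (or empty) in the new one
    missing = sorted(v for v in old_counts if new_counts.get(v, 0) == 0)
    # Still present, but with fewer concepts than before
    shrunk = {
        v: (old_counts[v], new_counts[v])
        for v in sorted(old_counts)
        if 0 < new_counts.get(v, 0) < old_counts[v]
    }
    # Only in the new release (e.g. a vocabulary that was split out)
    added = sorted(v for v in new_counts if v not in old_counts)
    return missing, shrunk, added


# Made-up counts, for illustration only
old = {"GCN_SEQNO": 1000, "ETC": 400, "MeSH": 800}
new = {"GCN_SEQNO": 600, "MeSH": 800, "ICD10CM": 90}
missing, shrunk, added = compare_releases(old, new)
print(missing)  # ['ETC']
print(shrunk)   # {'GCN_SEQNO': (1000, 600)}
print(added)    # ['ICD10CM']
```

A renamed or split vocabulary (such as ICD-10 codes moving to a separate NCHS vocabulary) shows up as one "missing" or "shrunk" entry plus one "added" entry, so those pairs are worth reading together before assuming data was lost.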

@apeshansky:

I have no idea how you got the numbers you got. I have:

ICD-10 codes are issued by the WHO. The NCHS, in cooperation with the CMS, extends that to the American modification ICD-10-CM. Other countries have other modifications, Germany for example has ICD-10-GM.

Yes, LOINC is now together. The split is preserved in the concept_class_id.
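To see that split inside the merged vocabulary, one can group LOINC concepts by concept_class_id. The sketch below runs against a tiny in-memory SQLite table standing in for CONCEPT; the rows and the class names 'Lab Test' and 'LOINC Hierarchy' are assumptions for illustration, so check the CONCEPT_CLASS table of your own release for the actual values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept (concept_id INTEGER, vocabulary_id TEXT, concept_class_id TEXT)"
)
# Toy rows; the class names are assumptions, not taken from a real release
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?)",
    [
        (1, "LOINC", "Lab Test"),
        (2, "LOINC", "Lab Test"),
        (3, "LOINC", "LOINC Hierarchy"),
        (4, "SNOMED", "Clinical Finding"),
    ],
)
# Count LOINC concepts per concept class
rows = conn.execute(
    """
    SELECT concept_class_id, COUNT(*)
    FROM concept
    WHERE vocabulary_id = 'LOINC'
    GROUP BY concept_class_id
    ORDER BY concept_class_id
    """
).fetchall()
for concept_class_id, n in rows:
    print(concept_class_id, n)
```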

FDB is right there, alive and kicking.

@Christian_Reich: Thank you for your reply.
The numbers I quoted are from

select a.vocabulary_name, a.vocabulary_id,
       count(b.concept_id) as "v5.0 17-AUG-16"   -- count() of a column already returns 0 when nothing matches, so NVL is not needed
from omopv4.vocabulary a
     left outer join omopv4.concept b            -- outer join keeps vocabularies that have no concepts
       on b.vocabulary_id = a.vocabulary_id
where a.vocabulary_id != 'None'
group by a.vocabulary_name, a.vocabulary_id
order by 1;
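The LEFT OUTER JOIN is what makes this query useful for spotting gaps: a vocabulary that is registered in VOCABULARY but has no rows in CONCEPT still appears, with a count of 0, whereas a vocabulary absent from VOCABULARY does not appear at all. A small SQLite reproduction with made-up rows (SQLite has no NVL, and COUNT of a column already returns 0 for unmatched groups, so it is omitted):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE vocabulary (vocabulary_id TEXT, vocabulary_name TEXT);
    CREATE TABLE concept (concept_id INTEGER, vocabulary_id TEXT);
    -- Made-up rows: ETC is registered but has no concepts loaded
    INSERT INTO vocabulary VALUES
        ('GCN_SEQNO', 'Clinical Formulation ID (FDB)'),
        ('ETC', 'ETC (FDB)'),
        ('None', 'None');
    INSERT INTO concept VALUES (1, 'GCN_SEQNO'), (2, 'GCN_SEQNO');
    """
)
rows = conn.execute(
    """
    SELECT a.vocabulary_name, a.vocabulary_id, COUNT(b.concept_id) AS n_concepts
    FROM vocabulary a
    LEFT OUTER JOIN concept b ON b.vocabulary_id = a.vocabulary_id
    WHERE a.vocabulary_id != 'None'
    GROUP BY a.vocabulary_name, a.vocabulary_id
    ORDER BY 1
    """
).fetchall()
for name, vocab_id, n in rows:
    print(f"{name:35} {vocab_id:10} {n}")
```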

(Because of local idiosyncrasies, the latest OMOP tables are in schema omopv4. Don’t ask :wink:)
So apparently we are not loading the vocabularies properly: the above query returns 64 rows, and somehow the numbers changed after we uploaded CPT4. I now see 29659 concepts in GCN_SEQNO, 14579 in CPT4, etc.
But the only FDB vocabulary is still

Clinical Formulation ID (FDB)   GCN_SEQNO

There is no trace of the other three, or of the Medi-Span one.

@apeshansky:

That explains some of it. In V4 we don’t load all the codes into the CONCEPT table. Essentially, the V4 CONCEPT table contains only the Standard Concepts (as we call them in V5), which are the ones that can go into the *_concept_id fields of the CDM tables, and the Classification Concepts, which are hierarchical ancestors of these. The other source codes are relegated to the SOURCE_TO_CONCEPT_MAP table; you’ll find them all there. For some vocabularies we make an exception and add their codes to CONCEPT as well, but with concept_level = 0. GCN_SEQNO and GPI codes do not enjoy that privilege.
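In a V4 database, a rough way to confirm that such source codes are present is to count distinct source codes per source vocabulary in SOURCE_TO_CONCEPT_MAP. The sketch uses a simplified in-memory SQLite table with only three of its columns; the numeric vocabulary ID 999 and the codes are placeholders, not real V4 values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the V4 SOURCE_TO_CONCEPT_MAP (only three columns)
conn.execute(
    "CREATE TABLE source_to_concept_map "
    "(source_code TEXT, source_vocabulary_id INTEGER, target_concept_id INTEGER)"
)
conn.executemany(
    "INSERT INTO source_to_concept_map VALUES (?, ?, ?)",
    [
        ("code-a", 999, 101),
        ("code-b", 999, 102),
        ("code-b", 999, 103),  # one source code can map to several targets
    ],
)
# DISTINCT so that multi-target mappings are not double-counted
rows = conn.execute(
    """
    SELECT source_vocabulary_id, COUNT(DISTINCT source_code)
    FROM source_to_concept_map
    GROUP BY source_vocabulary_id
    """
).fetchall()
print(rows)  # [(999, 2)]
```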

I know it’s messy. Use V5 if you are trying to figure things out. It’s a lot cleaner. We made it for that reason. Everything is in CONCEPT.
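In V5 the distinction described above is carried by CONCEPT.standard_concept: 'S' marks Standard Concepts, 'C' Classification Concepts, and NULL the remaining source codes. A sketch of breaking each vocabulary down along those lines, again on a toy SQLite table with made-up rows (treating GCN_SEQNO as non-standard is an assumption based on the explanation above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept (concept_id INTEGER, vocabulary_id TEXT, standard_concept TEXT)"
)
# Made-up rows for illustration
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?)",
    [
        (1, "RxNorm", "S"),
        (2, "ATC", "C"),
        (3, "GCN_SEQNO", None),
        (4, "GCN_SEQNO", None),
    ],
)
# NULL never equals 'S' or 'C', so source codes fall through to ELSE
rows = conn.execute(
    """
    SELECT vocabulary_id,
           CASE standard_concept
               WHEN 'S' THEN 'Standard'
               WHEN 'C' THEN 'Classification'
               ELSE 'Source (non-standard)'
           END AS kind,
           COUNT(*)
    FROM concept
    GROUP BY 1, 2
    ORDER BY 1, 2
    """
).fetchall()
for vocab_id, kind, n in rows:
    print(vocab_id, kind, n)
```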

@Christian_Reich: Sorry, I was unclear.
While the schema name in Oracle is OMOPV4, the tables and data it contains all follow the OMOPV5 data model; OMOPV4 is just the name of the user that owns the schema.
And it turns out the person who loaded the latest release skipped the ETC, GPI, Indication, and Multilex vocabularies, erroneously believing that we did not have a license for them.
So it all turns out to be our local weirdness :wink:
