OHDSI Home | Forums | Wiki | Github

Incorporating the Swedish drug vocabulary

Hi all (including @Christian_Reich),

@dmyers3 is working on the Swedish registry data, and is following the process for incorporating drug vocabularies. We do have two questions we need help with:

  1. Concept codes: On the Wiki it says concept codes should be generated, and should be unique across vocabularies. It also says the generated concept codes should start with ‘OMOP_’. I checked, and can’t find any concept codes in the current release of the vocabulary starting with ‘OMOP_’. Does that mean we should start with ‘OMOP_1’? If we want the codes to be unique across vocabularies, why not have a more unique (and informative) prefix, like ‘SE_DRUG_’ or ‘OMOP_SE_DRUG_’?

  2. Processing script: The provided script appears to be in Oracle SQL. Any chance you have it in SQL Server SQL (or OHDSISql)?

Hi @schuemie (including @schuemie):

  1. It’s ‘OMOP%’, no underscore. So,

    select * from concept where concept_code like ‘OMOP%’;
    You can start with OMOP1, but it probably is better if you start with the last OMOP%+1. Just to be on the safe side and not duplicate codes, even though they are in different namespaces.

  2. Oh no. No such chance. The stuff is heavily Oracled, for good reasons and for bad reasons. Just give us the input files and we put it in for you. The scripts are still tweaked with anyway.

Hi @Christian_Reich,

I would be happy to give you the input files, but I need a little help finishing the tables. I think I understand how to do the last 3 of the 4 required, but I am a little twisted up on DRUG_CONCEPT_STAGE.

The attached file has some specific examples of how I think I should do it. Blanks mean I am definitely unsure. Non-blanks mean I would have been bold enough to try.

The table at the top is a cleaned up version (and sample subset) of the full list of approximately 140,000 package IDs registered in Sweden (and pretty close to the whole Nordics too). For each package ID, I have already mapped to as many concepts as I could individually (brand name, ingredient, dose form, strength). Each package ID then has 1-4 concept_IDs assigned to it. (Those mappings probably aren’t 100% perfect but they are quite good thanks to Usagi on the individual concepts and some translation.)

I tried running them through Usagi collectively with those concepts separated by semicolons in the automatic mapping input field, but it just wanted to leave them all at the ingredient concept even though quite a few had dose and form concepts that I could match to more detailed concepts (e.g. clinical drug). I know that match won’t be 100% due to different pack sizes or other unique aspects by country, but I manually found several that do match.

Can you help me finish the logic? My end goal is to have each NPLPackID mapped as precisely as it can be. Then I can create the tables without too much trouble and offer many thanks if you could run them in the Oracle SQL.

Thanks for always being so helpful. I think we’re close now!

vocab_mapping.xlsx (13.0 KB)

@dmyers3:

Not too bad, David. Getting close. Couple questions.

  1. What’s the official name of the drug registry, and is there a website (even closed to the public) that has it? “SE_LMED_DRUGS” sounds a little lame. Also, you will need version information, because they tend to be in motion over time.
  2. Do you have a concept name for DRUG_CONCEPT_STAGE.concept_name? Could be in Swedish. Of is that the “name” field?
  3. For DRUG_CONCEPT_STAGE.concept_class_id, you need to put in one of the following: Device, Unit, Ingredient, Drug Product, Supplier, Brand Name, Dose Form. For “Packages” put in Drug Products. List all Units you use, Suppliers, Brand Names and Dose Forms. If you don’t have them coded (no special identifier for them), you have to make up a code, use either OMOP% or the string. For Units, we usually use the string. Make sure you list all syringes, bandaids, cartridges and other products sold in pharmacies and very often included in these lists as “Device”. Otherwise the algorithm will try to figure out their indgredients and strengths and match them, throwing a ton of erros.
  4. The DRUG_CONCEPT_STAGE.concept_code should contain the NPLPackID and whatever you use for the other entities
  5. In the RELATIONSHIP_TO_CONCEPT.concept_code_2 field, you should use NPLPackID for the drugs, and strings for the Units
  6. In the RELATIONSHIP_TO_CONCEPT.concept_id_2 it could also be an RxNorm Extension Integer
  7. In the RELATIONSHIP_TO_CONCEPT.precedence field, you need to fill in values >1 if you have more than one possible mapping
  8. In the RELATIONSHIP_TO_CONCEPT.conversion_factor you need to put in the conversion if the mapping of units isn’t exact, (e.g. liters to milliliters = 0.001)
  9. The INTERNAL_RELATIONSHIP_STAGE looks good.
  10. The DS_STAGE needs work. For solid drugs (tablets, capsules, suppositories, etc.) you are good. For liquid ones you need to use the numerator and denominator fields.

Let me know.

I’ve tried to generate the appropriate entries for @dmyers3’s first drug (Cabergoline Teva).See the notes for additional comments.

@Christian_Reich: could you check if I did it correctly?

vocab_mapping_2.xlsx (14.5 KB)

  1. The official name is Läkemedelsregistret. It is all publicly available, although some pieces of information have to be requested. We have a master list of the NPL Pack IDs from another related agency called MPA.They are in a state of transition with their website, so some things that used to be posted now have to be requested. They are very cooperative though.

  2. It is kind of the “name” field but I might have slightly more descriptive ones coming from a recent request. One problem with the name field is that it is not unique.

  3. Got it. That makes sense.

  4. OK.

  5. I’ll see what you say to Martijn.

  6. Understood.

  7. Do you have a recommended precedence for the available fields?

  8. I have units and amounts so that shouldn’t be a problem. I just picked simple ones here.

  9. Yes! :smiley:

  10. I think I have that for most of them. There’s a reason the dose field in the sample is dose_number_1. I have dose_number_2 in the full table and it will generally be a numerator/denominator situation like you describe. It probably needs a little cleaning because I was focused on simplest cases first, but I did set up the text extraction to find things like “4 g/0.5g” or “49.2 mg/ml.”

I’ll start the cleanup on what I know for sure and then wait for some additional information both from MPA here in Sweden and you guys.

Thank you!

t