[2022 US Symposium] #65 - Smoking Conventions

Please use this spot to document:

  1. Describe the issue/topic?
  2. What do we know about this topic? What has been discussed?
  3. What are recommendations for how to handle this issue/topic?
  4. What next steps should be taken?

Related posts:

@Christian_Reich has a very good proposal linked above. It covers smokeless tobacco consumption and nicotine smoking, not marijuana or other substances. Since we don’t have a lot of time during our Themis workshop, we need to be focused on a solution for nicotine.

The proposal:

  • Creates a hierarchy with smoking at the top

  • Covers e-cigarettes/vape pens, pipe, passive/2nd hand smoke, smokeless, etc.

  • Records smoking intensity in categories ranging from trivial to very heavy.

  • Previous or ex-smoker is relegated to the “History of” category, like all other “history of” concepts

One item not covered is the “pack years” concept. There is no standard concept_id for this idea; UK Biobank has a non-standard concept_id = 35811050. Do we make this concept standard? Then have observation.value_as_number contain the number of pack years? We will need to keep it outside the hierarchy because pack years can’t be classified as trivial versus heavy without taking the person’s age into account. Example: a 20 pack-year record means something very different for a 30-year-old than for an 80-year-old. And as Christian and Oleg point out, pack years measure cumulative damage, whereas cigarettes per day measure current usage.
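As a rough illustration of the idea above, the sketch below computes pack years with the usual clinical definition (one pack year = 20 cigarettes a day for one year) and places the result in value_as_number. The function names and the dictionary layout are assumptions made for illustration, only a subset of OBSERVATION columns is shown, and 35811050 is the non-standard UK Biobank concept mentioned above, used purely as a placeholder until a standard concept is chosen.

```python
CIGARETTES_PER_PACK = 20  # conventional pack size in the pack-year definition

# Non-standard UK Biobank concept mentioned in the post; placeholder only.
PACK_YEARS_CONCEPT_ID = 35811050


def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Return pack years = (cigarettes per day / 20) * years smoked."""
    if cigarettes_per_day < 0 or years_smoked < 0:
        raise ValueError("inputs must be non-negative")
    return cigarettes_per_day / CIGARETTES_PER_PACK * years_smoked


def pack_years_observation(person_id: int, observation_date: str,
                           cigarettes_per_day: float,
                           years_smoked: float) -> dict:
    """Build a hypothetical, partial OBSERVATION row with the pack-year count."""
    return {
        "person_id": person_id,
        "observation_concept_id": PACK_YEARS_CONCEPT_ID,
        "observation_date": observation_date,
        "value_as_number": pack_years(cigarettes_per_day, years_smoked),
    }


row = pack_years_observation(1, "2022-10-16",
                             cigarettes_per_day=20, years_smoked=20)
print(row["value_as_number"])  # 20.0
```

The same 20.0 could belong to a 30-year-old or an 80-year-old, which is why the value stays a plain number here rather than being pushed into a severity category.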

This proposal also does not cover:

Survey data. Survey data has a working group and should continue to be modeled and debated there.

Tobacco and nicotine data are not easy for a number of reasons. They are not always recorded, they are patient reported, they are not always reported accurately, use can begin years or decades before our Observation Period starts, there is a lot of free text out there, etc. Our goal is solid conventions covering the majority of use cases. And as our data, vocabularies, recording methods, and use cases evolve, so will the conventions and the CDM.

Thoughts?

On Sunday, October 16th at the CDM workshop we discussed the above issue. We came to the following conclusions:

1. Describe the issue/topic?

Currently, the conventions lack guidance on how tobacco usage should be stored in the CDM. There are multiple standard concept_ids with the same/similar meaning in different domains.

2. What do we know about this topic? What has been discussed?

This has been discussed on the forums for many years and was also discussed at the original Themis meeting in 2018. See the forum post above for the most recent and comprehensive discussion.

3. What are recommendations for how to handle this issue/topic?

Proposed Solution: Tobacco and nicotine data are not easy for a number of reasons. They are not always recorded, they are patient reported, they are not always reported accurately, use can begin years or decades before our Observation Period starts, there is a lot of free text out there, etc. Our goal is solid conventions covering the majority of use cases. And as our data, vocabularies, recording methods, and use cases evolve, so will the conventions and the CDM.

  1. Limit the scope of the solution.
    a. Only tobacco (no marijuana, smoke from fires, etc.)
  2. Treat synonyms the same
    a. Tobacco, cigarette, hand rolled cigarettes, etc. are the same
    b. 2nd hand smoke, passive smoking are the same
    c. Hookahs, water pipes, etc. are the same
  3. Family history, history of …, and questionnaires & survey data are out.
    a. Family history and “history of” have their own conventions
    b. The Survey WG are working on survey data
  4. Only cigarette frequency is measured in the existing concepts; cigar, pipe, and hookah frequency almost never is. The cigarette frequency per day definition is:
    a. trivial = 0-1 cigarettes per day
    b. light = 1-9
    c. moderate = 10-19
    d. heavy = 20-39
    e. very heavy or aggressive = 40 or more

*We don’t need the exact number of cigarettes (it is probably false precision anyway)
*Cigarette frequency is for current usage

  5. Pack years is needed for lifetime usage. Because pack-year severity depends on a combination of age and lifetime usage, the concept will be a child of “cigarette” and is not required to be put into a severity category at this point in time. As use cases arise, we can further hone these conventions.
  6. Here is what we are not modeling, mostly because we didn’t actually find that many concepts:
    a. Time since cessation
    b. Episodic smoking
    c. Age of start of smoking
    d. Duration of unsuccessful cessation
    e. Overall time of smoking (irrespective of strength)
  7. We have thought about, but have not modeled at this time: nicotine patches and gum, nicotine dependency
  8. New hierarchy:

You map your data in at the granularity you have. So, if the record is Heavy Cigarette smoker, you pick the concept_id representing this idea. If you have a flag for Cigarette smoker: yes, then you pick the concept_id representing this idea.
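The granularity rule and the cigarette frequency bands from the recommendations can be sketched together as below. This is only an illustration: the labels are descriptive placeholders rather than actual concept names or concept_ids (the Vocabulary team has yet to announce those), the function names are hypothetical, and because the listed bands overlap at 1 cigarette per day the sketch assumes that anything below 1 is trivial and that 40 or more is very heavy.

```python
from typing import Optional


def cigarette_intensity(cigarettes_per_day: float) -> str:
    """Map a current cigarettes-per-day count to a placeholder intensity label.

    Assumed band edges: <1 trivial, 1-9 light, 10-19 moderate,
    20-39 heavy, 40 or more very heavy. Intended only for people
    already recorded as current cigarette smokers.
    """
    if cigarettes_per_day < 0:
        raise ValueError("cigarettes_per_day must be non-negative")
    if cigarettes_per_day < 1:
        return "Trivial cigarette smoker"
    if cigarettes_per_day < 10:
        return "Light cigarette smoker"
    if cigarettes_per_day < 20:
        return "Moderate cigarette smoker"
    if cigarettes_per_day < 40:
        return "Heavy cigarette smoker"
    return "Very heavy cigarette smoker"


def pick_label(is_cigarette_smoker: bool,
               cigarettes_per_day: Optional[float] = None) -> Optional[str]:
    """Pick the most specific label the source record supports."""
    if not is_cigarette_smoker:
        return None  # nothing to map in this sketch
    if cigarettes_per_day is None:
        return "Cigarette smoker"  # only a yes/no flag is available
    return cigarette_intensity(cigarettes_per_day)


print(pick_label(True))      # Cigarette smoker
print(pick_label(True, 25))  # Heavy cigarette smoker
```

A source that only has a yes/no flag lands on the general node, while a source with a usable daily count lands on one of its more specific children.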

4. What next steps should be taken?

  • The Vocabulary team will notify us which concept_ids will be standard for each idea above, and we can then put this information into the CDM conventions.

  • The Vocab team will make some concepts non-standard, map them to standard concepts, and create the new hierarchy.

@Christian_Reich @Alexdavv @zhuk

Do we have a concept for each of these, or will we need to create some de novo?

We’ll recreate the whole story de novo in OMOP Extension so that it is clean and not confused by the old hierarchy, other relationships, and different flavors of meaning.

Like we did in Specialty and Visit?

There we mostly preserved the source semantics, structure, and entire donor vocabularies.
More like Cancer Modifier…

Do we have a convention agreed on already?

If yes, when can we expect the proper concepts?

We’re planning to release them in December

@Alexdavv what are your thoughts about working with the SDOs to get the content added to the appropriate vocabulary (LOINC or SNOMED) rather than creating the concepts in the OMOP Extension?

We, and probably other organizations, currently store concept codes for tobacco smoking information natively in our EHR (SNOMED concept ids when captured in social history, LOINC codes when captured in flowsheets).

Only concepts from industry standard vocabularies - not OMOP Extension concepts of course - can be mapped natively in the EHR.

Best,
Piper
