
[2022 US Symposium] #65 - Smoking Conventions

Please use this spot to document:

  1. Describe the issue/topic?
  2. What do we know about this topic? What has been discussed?
  3. What are recommendations for how to handle this issue/topic?
  4. What next steps should be taken?

Related posts:

@Christian_Reich has a very good proposal linked above. It covers smokeless tobacco consumption and nicotine smoking, not marijuana or other substances. Since we don’t have a lot of time during our Themis workshop, we need to be focused on a solution for nicotine.

The proposal:

  • Creates a hierarchy with smoking at the top

  • Covers e-cigarettes/vape pens, pipe, passive/2nd hand smoke, smokeless, etc.

  • Classifies smoking intensity into categories ranging from trivial to very heavy.

  • Previous or ex-smoker gets relegated to the “History of” category, like all “history of” concepts

One item not covered is the “pack years” concept. There is no standard concept_id for this idea; UK Biobank has a non-standard concept_id = 35811050. Do we make this concept standard and then have observation.value_as_number contain the number of pack years? We will need to keep it outside the hierarchy, because pack years can’t be classified as trivial versus heavy: the age of the person matters. For example, a 20 pack-year record means something very different for a 30-year-old than for an 80-year-old. And as Christian and Oleg point out, pack years is a measure of cumulative damage, whereas cigarettes per day reflects current usage.
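
To make the pack-years idea concrete, here is a minimal sketch. It uses the usual definition of a pack-year (20 cigarettes per day for one year) and puts the result into value_as_number as suggested above. The helper names and the `pack_years_concept_id` parameter are hypothetical; no standard concept_id is assumed, since none had been agreed at this point.

```python
def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Pack-years = packs per day (20 cigarettes per pack) x years smoked."""
    if cigarettes_per_day < 0 or years_smoked < 0:
        raise ValueError("inputs must be non-negative")
    return (cigarettes_per_day / 20.0) * years_smoked


def pack_year_observation(person_id, observation_date, value, pack_years_concept_id):
    """Build an OBSERVATION-style row holding the pack-year count.

    pack_years_concept_id is supplied by the caller (hypothetical parameter):
    this sketch does not assume which concept ends up being standard.
    """
    return {
        "person_id": person_id,
        "observation_concept_id": pack_years_concept_id,
        "observation_date": observation_date,
        "value_as_number": value,
    }


# The same 20 pack-years can come from very different histories, which is
# why the value alone is not placed into a severity category.
same_value = pack_years(40, 10) == pack_years(10, 40)
```

Note that the value says nothing about when the smoking happened, so it complements rather than replaces the cigarettes-per-day categories.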

This proposal also does not cover:

Survey data. Survey data has a working group and should continue to be modeled and debated there.

Tobacco and nicotine data are not easy for a number of reasons. It’s not always recorded, it’s patient reported, it’s not always reported accurately, it can begin years or decades before our Observation Period starts, there’s lots of free text out there, etc. Our goal is solid conventions covering the majority of use cases. And as our data, vocabularies, recording methods, and use cases evolve, so will the conventions and the CDM.

Thoughts?

On Sunday, October 16th at the CDM workshop we discussed the above issue. We came to the following conclusions:

1. Describe the issue/topic?

Currently, the conventions lack guidance on how tobacco usage should be stored in the CDM. There are multiple standard concept_ids with the same/similar meaning in different domains.

2. What do we know about this topic? What has been discussed?

This has been discussed on the forums for many years and was also discussed at the original Themis meeting in 2018. See the forum post above for the most recent and comprehensive discussion.

3. What are recommendations for how to handle this issue/topic?

Proposed Solution: Tobacco and nicotine data are not easy for a number of reasons. They are not always recorded, they are patient reported, they are not always reported accurately, usage can begin years or decades before our Observation Period starts, there’s lots of free text out there, etc. Our goal is solid conventions covering the majority of use cases. And as our data, vocabularies, recording methods, and use cases evolve, so will the conventions and the CDM.

  1. Limit the scope of the solution.
    a. Only tobacco (no marijuana, smoke from fires, etc.)
  2. Treat synonyms the same
    a. Tobacco, cigarette, hand rolled cigarettes, etc. are the same
    b. 2nd hand smoke, passive smoking are the same
    c. Hookahs, water pipes, etc. are the same
  3. Family history, history of …, and questionnaires & survey data are out.
    a. Family history and “history of” have their own conventions
    b. The Survey WG is working on survey data
  4. Only cigarette frequency is measured in the existing Concepts; frequency for cigars, pipes and hookahs almost never is. The cigarette frequency per day definition is:
    a. trivial = less than 1 cigarette per day
    b. light = 1-9
    c. moderate = 10-19
    d. heavy = 20-39
    e. very heavy or aggressive = 40 or more

*We don’t need the exact number of cigarettes (it is probably false precision anyway)
*Cigarette frequency is for current usage

  5. Pack years is needed for lifetime usage. Because the severity of a pack-year value depends on a combination of age and lifetime usage, the concept will be a child of “cigarette” and will not be required to fall into a severity category at this point in time. As the use cases arise, we can further hone these conventions.
  6. Here is what we are not modeling, mostly because we didn’t actually find that many concepts:
    a. Time since cessation
    b. Episodic smoking
    c. Age of start of smoking
    d. Duration of unsuccessful cessation
    e. Overall time of smoking (irrespective of strength)
  7. We have thought about, but have not modeled at this time: nicotine patches and gum, nicotine dependency
  8. New hierarchy:

You map your data in at the granularity you have. So, if the record is Heavy Cigarette smoker, you pick the concept_id representing this idea. If you have a flag for Cigarette smoker: yes, then you pick the concept_id representing this idea.
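
Where a source records a cigarettes-per-day count rather than a severity label, the frequency bands listed above could be applied as in this minimal sketch. The function name is hypothetical, it returns category names rather than concept_ids (which had not been assigned yet), and it assumes that a count of exactly 1 belongs to “light” and anything below 1 to “trivial”.

```python
def cigarette_severity(cigarettes_per_day: float) -> str:
    """Map a current cigarettes-per-day count to a severity band.

    Assumption: a count below 1 is "trivial" and exactly 1 is "light".
    """
    if cigarettes_per_day < 0:
        raise ValueError("cigarettes_per_day must be non-negative")
    if cigarettes_per_day < 1:
        return "trivial"
    if cigarettes_per_day < 10:
        return "light"
    if cigarettes_per_day < 20:
        return "moderate"
    if cigarettes_per_day < 40:
        return "heavy"
    return "very heavy"


# A source row that only says "about a pack a day" lands in the heavy band.
band = cigarette_severity(20)
```

Since the bands describe current usage only, this would not be applied to lifetime totals such as pack years.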

4. What next steps should be taken?

  • The Vocabulary team will notify us which concept_ids will be standard for each idea above, and we can put this information into the CDM conventions.

  • The Vocab team will make some concepts non-standard, map them to standard concepts, and create the new hierarchy.

@Christian_Reich @Alexdavv @zhuk

Do we have a concept for each of these, or will we need to create some de-novo?

We’ll recreate the whole story de novo in OMOP Extension to keep it clean and free of confusion from the old hierarchy, other relationships, and different flavors of meaning.

Like we did in Specialty and Visit?

There we mostly preserved the source semantics, structure, and entire donor vocabularies.
It will be more like Cancer Modifier…

Do we have a convention agreed already?

If yes, when can we expect the proper concepts?

We’re planning to release them in December


@Alexdavv what are your thoughts about working with the SDOs to get the content added to the appropriate vocabulary (LOINC or SNOMED) rather than creating the concepts in the OMOP Extension?

We, and probably other organizations, currently store concept codes for tobacco smoking information natively in our EHR (SNOMED concept ids when captured in social history, LOINC codes when captured in flowsheets).

Only concepts from industry standard vocabularies - not OMOP Extension concepts of course - can be mapped natively in the EHR.

Best,
Piper

Because it would take an enormous amount of time. Even within OMOP, it took us years to agree, author the content, and release (hopefully, this week)?! Besides that, in most cases it’s the other way around: they have much more content than we need. And the content is modeled and organized in a hierarchy differently from what we can easily adopt in OMOP.

Dear Community, I want to share with you the updates we made on 16 January and tell you about our future plans.

We introduced a set of concepts related to tobacco or its derivatives in the OMOP Extension vocabulary, organized along a few axes, to accompany these ETL Smoking conventions. The top concept of the hierarchy is Findings of tobacco or its derivatives use or exposure. Tobacco users are now defined according to the type of product they use (Smokeless, Electronic, Cigarettes, Cigars, etc.), while cigarette smokers are also classified according to the severity of smoking (Trivial, Light, Moderate, Heavy, Very heavy). Cigarettes pack-years smoked during life is intended to capture the cumulative consumption of cigarettes.

The newly created concepts are Standard and should be used during ETL processes and mapping. We are going to destandardize concepts from other vocabularies in further releases. For now, our priority is SNOMED Standard concepts, as many Non-Standard concepts from other vocabularies have “Maps to” links to them.

We would like to share an update on our progress with the Smoking hierarchy: in this v20230531 release, we have remapped smoking-related SNOMED concepts to new OMOP Extension concepts. As a result of these changes, certain concepts from source vocabularies such as ICD9CM, Read, CIEL, etc., have lost their mappings.

Explanation and details

Each vocabulary has its own load_stage, meaning that to extend these missing mappings to new OMOP Extension concepts, we must execute the load_stage of each vocabulary. This process often exposes numerous pitfalls due to variations in vocabulary versions, interdependencies, relationships, and domain assignments. Consequently, running each vocabulary and achieving satisfactory results consumes a considerable amount of time. We will address this issue in future releases.

We identified some issues that need to be discussed with the community:

  1. In the OMOP Extension, we have various types of smokers, including Cigar smoker, Cigarette smoker, Electronic cigarette smoker, etc. However, we lack a common concept that encompasses all types of smokers. The parent concept for these types is Tobacco or its derivatives user, which also subsumes the concept Smokeless tobacco user. Consequently, we are currently assuming that concepts such as Smoker, Smokes tobacco daily, and Smoker in home refer to cigarette smokers and mapping them accordingly to the concept of Cigarette smoker.

We believe it would be beneficial to have a new concept that serves as a parent for all types of smokers. This would provide a more comprehensive and inclusive representation of smoking behavior within the OMOP Extension.

What do you think about it? Do we really need a new concept that would encompass all smokers?

  2. Another question pertains to the concept Currently doesn’t use tobacco or its derivatives. At present, we have mapped all “ex-smoker” concepts to this concept. However, it is important to note that a person can be an ex-cigarette smoker but still use other tobacco products such as smokeless tobacco. Mapping concepts like Ex-user of moist powdered tobacco to Currently doesn’t use tobacco or its derivatives is not only logically incorrect but also problematic because Currently doesn’t use tobacco or its derivatives has the synonym “Current non-smoker.”
Examples of current mapping:

| concept_id_1 | concept_name_1 | relationship_id | concept_id_2 | concept_name_2 |
| --- | --- | --- | --- | --- |
| 4052949 | Ex-cigar smoker | Maps to | 903651 | Currently doesn’t use tobacco or its derivative |
| 4052949 | Ex-cigar smoker | Maps to | 1340204 | History of event |
| 4052949 | Ex-cigar smoker | Maps to value | 903664 | Cigar smoker |
| 4092281 | Ex-cigarette smoker | Maps to | 903651 | Currently doesn’t use tobacco or its derivative |
| 4092281 | Ex-cigarette smoker | Maps to | 1340204 | History of event |
| 4092281 | Ex-cigarette smoker | Maps to value | 903657 | Cigarette smoker |
| 44811943 | Ex user of electronic cigarette | Maps to | 1340204 | History of event |
| 44811943 | Ex user of electronic cigarette | Maps to value | 903655 | Electronic cigarette smoker |
| 4052465 | Ex-pipe smoker | Maps to | 1340204 | History of event |
| 4052465 | Ex-pipe smoker | Maps to value | 903663 | Pipe smoker |
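
To see which of the listed ex-smoker concepts currently carry the extra “Maps to” link to concept 903651, the rows above can be checked with a small sketch like the one below. The tuples simply restate the mapping rows shown in this post; the helper names are hypothetical and this is not how the Vocabulary team stores or queries mappings.

```python
# (concept_id_1, concept_name_1, relationship_id, concept_id_2), copied from
# the mapping examples above.
MAPPINGS = [
    (4052949, "Ex-cigar smoker", "Maps to", 903651),
    (4052949, "Ex-cigar smoker", "Maps to", 1340204),
    (4052949, "Ex-cigar smoker", "Maps to value", 903664),
    (4092281, "Ex-cigarette smoker", "Maps to", 903651),
    (4092281, "Ex-cigarette smoker", "Maps to", 1340204),
    (4092281, "Ex-cigarette smoker", "Maps to value", 903657),
    (44811943, "Ex user of electronic cigarette", "Maps to", 1340204),
    (44811943, "Ex user of electronic cigarette", "Maps to value", 903655),
    (4052465, "Ex-pipe smoker", "Maps to", 1340204),
    (4052465, "Ex-pipe smoker", "Maps to value", 903663),
]

CURRENT_NON_USER = 903651  # "Currently doesn't use tobacco or its derivative"


def sources_with_target(mappings, relationship_id, target_id):
    """Names of source concepts that have the given link to target_id."""
    return sorted({row[1] for row in mappings
                   if row[2] == relationship_id and row[3] == target_id})


def sources_without_target(mappings, relationship_id, target_id):
    """Names of source concepts that lack the given link to target_id."""
    everyone = {row[1] for row in mappings}
    linked = set(sources_with_target(mappings, relationship_id, target_id))
    return sorted(everyone - linked)


mapped = sources_with_target(MAPPINGS, "Maps to", CURRENT_NON_USER)
unmapped = sources_without_target(MAPPINGS, "Maps to", CURRENT_NON_USER)
```

With these rows, `mapped` holds the ex-cigar and ex-cigarette concepts, while the ex-electronic-cigarette and ex-pipe concepts end up in `unmapped`.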

A similar situation arises with the concept Never used tobacco or its derivatives. This concept should not be equated with “Never smoked tobacco.” There may be individuals who have never smoked tobacco but have used other forms of tobacco or derivatives.

We need to establish clear rules for when this concept is appropriate and consider the possibility of adding new concepts or modifying the existing one.

  3. We had extensive discussions within the team regarding concepts such as Maternal tobacco use in pregnancy, Stopped smoking before pregnancy, Stopped smoking during pregnancy, and Smoked before confirmation of pregnancy. We remapped them to smoking status and preserved the pregnancy information in only one case.
Examples of current mapping

It is important to acknowledge that this mapping decision compromises the granularity and functionality of SNOMED. We would like to ensure that everyone agrees that this is a suitable decision.

  4. Earlier, the community decided to treat the concepts related to nicotine dependency as Standard. However, we encountered concepts that simultaneously indicate nicotine dependency and reflect smoking status. In such cases, we made them Non-Standard and remapped them.
Examples of current mapping

What is your opinion on this matter?

  5. Within SNOMED, there are certain complex concepts for which it is hard to decide whether to keep them Standard or not. Furthermore, even where we concluded that certain concepts should be destandardized, we faced the question of whether a universal replacement link (to Findings of tobacco or its derivatives use or exposure) was necessary. To see how we implemented this, please follow the provided link. We would appreciate your insights and suggestions if you spot any opportunities for improvement.

@Christian_Reich @MPhilofsky @Alexdavv @zhuk @Vlad_Korsik

Good questions

  1. I don’t think we really need one common ancestor for the various types of smokers. We proceeded with an oversimplification of the whole model.

So if your data is granular enough, use specific type of smokers. If not, use Cigarette smoker. If you need all smokers, just pick all of them while building your concept set / cohort.
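
As a rough illustration of “just pick all of them”, a concept set of smoker types can be built as a plain union and applied to records. The four concept_ids are the ones shown in the mapping examples earlier in this thread; the list may well be incomplete. The function name, the record layout, and the choice of observation_concept_id as the field holding the smoker concept are assumptions made for this sketch.

```python
# Smoker-type concepts taken from the mapping examples earlier in the thread.
# The set may be incomplete; extend it for your own cohort definition.
SMOKER_CONCEPT_IDS = {
    903664,  # Cigar smoker
    903657,  # Cigarette smoker
    903655,  # Electronic cigarette smoker
    903663,  # Pipe smoker
}


def persons_in_concept_set(observations, concept_ids):
    """Return person_ids having at least one row whose concept is in the set.

    Assumes each row is a dict with person_id and observation_concept_id.
    """
    return {row["person_id"] for row in observations
            if row["observation_concept_id"] in concept_ids}


# Hypothetical rows, for illustration only.
example_rows = [
    {"person_id": 1, "observation_concept_id": 903657},
    {"person_id": 2, "observation_concept_id": 903651},
    {"person_id": 3, "observation_concept_id": 903663},
]
all_smokers = persons_in_concept_set(example_rows, SMOKER_CONCEPT_IDS)
```

The point of the design is that “all smokers” becomes a cohort-building choice rather than a single vocabulary concept.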

  2. Do we have data granular enough?

Let’s assume a patient smoked cigarettes for 5 years, then switched to moist tobacco for 2 years, then smoked 2 packs in a single day for some reason, and then switched to something else. Would it be accurately reflected in the data? How should we treat this patient? Yes, Tobacco or its derivatives user is the right concept.

Now never users. We can name it a feature of the model. You no longer need to care about these small differences.

If the patient never smoked, but used other types of tobacco, he/she is a Tobacco or its derivatives user, period.

“Never smoked” was included in the synonyms list because it is a very common term for discussing tobacco behaviour in clinics and because, logically, every smoker used tobacco. So
IF Never used tobacco THEN Never smoked.
It is a small change. We can remove the synonym if it feels confusing.

  3. Personally, I think that we lose too much here. We have already spotted some analyses where these concepts could have been helpful. Let’s hear from the Community.