
Phenotype Phebruary Day 14 - Hypertension (emphasis on clinical description)

On Valentine's week, let's start the cardiovascular phenotypes. This week - we will work on

Happy Valentine's Day @AzzaShoaibi

I decided to change the pattern of how we have been posting on the forums, based on feedback at the OHDSI Phenotype Development and Evaluation workgroup (please join here). The feedback was: this is intimidating, too long, and I don't know if I (a new contributor) can do this.

So - instead of one long post per day, I am going to break it down into multiple posts per day. Hopefully this will make it easier to read.

I am going to try to use this cardiovascular week's posts to assert/express some of the best practice opinions that we have developed at the OHDSI Phenotype Development and Evaluation workgroup, with a focus on all the steps that need to be done PRIOR to touching Atlas or creating code sets, and on the importance of such pre-work.


Step 1: Develop a shared clinical understanding of the phenotype

1. A) Is there an authoritative consensus document that describes the phenotype?

Yes - this is a common phenotype - that every clinician should know and most would agree that a source like

is authoritative. It is also aimed at an appropriate target audience, e.g. the material in this resource is targeted at a fourth-year medical student or above, including mid-level providers such as practicing nurses and physician assistants.

If a material is considered patient education material - it is likely at a lower level than what a phenotyper needs.
If a material is designed for specialists and their peers who are conducting specialized research on this phenotype - it is likely more detailed than what a phenotyper needs.

1. B) I then write up my short version of the reading - call it G-notes


	Chronic elevation of blood pressure - defined in 2017 Hypertension guidelines (normal, elevated, stage 1 hypertension, stage 2 hypertension)
	Most people are asymptomatic - so a true biological index date is not possible? When symptoms occur: headache, dizziness or blurred vision.
	Etiology unknown in 90% - essential hypertension, but rule out secondary correctable form of hypertension especially if <= 30 or >= 55 years.
	Terms associated with Hypertension:
		- Normal, Elevated, Stage 1, Stage 2
		- Isolated Systolic hypertension
		- Controlled vs uncontrolled hypertension
		- Incident vs Prevalent hypertension
		- Malignant Hypertension - medical emergency
		- SPRINT hypertension definition - resting blood pressure.
	Secondary causes:
		- Renal artery stenosis (renovascular hypertension)
		- Renal Parenchymal Disease (Chronic Kidney Disease)
		- Coarctation of Aorta - children or young adults
		- Pheochromocytoma - sudden episodes, elevated plasma metanephrine
		- Hyperaldosteronism
		- Drugs (oral contraceptives, erythropoietin, decongestants, NSAIDs, glucocorticoids, cyclosporine)
		- Others: obstructive sleep apnea, Cushing's disease, thyroid disease, hypercalcemia, acromegaly
Differential Diagnosis:
	Secondary causes - Hyperaldosteronism, coarctation of the aorta, renal artery stenosis, chronic kidney disease, and aortic valve disease 
		- Kidney related: Serum creatinine, BUN, urinalysis, 
		- Cardiovascular evaluation: CXR, ECG
		- Thyroid: TSH
		- Control blood pressure with minimal side effect
		- First line treatment: Diuretics, ACE inhibitors, ARB, CCB, beta blockers
	Except for secondary hypertension, is chronic. If untreated, will most likely progress
		Cardiovascular diseases and complications - acute myocardial infarction, congestive heart failure, stroke etc.
		Normal blood pressure – Systolic <120 mmHg and diastolic <80 mmHg
		Elevated blood pressure – Systolic 120 to 129 mmHg and diastolic <80 mmHg
			Stage 1 – Systolic 130 to 139 mmHg or diastolic 80 to 89 mmHg
			Stage 2 – Systolic at least 140 mmHg or diastolic at least 90 mmHg
			If there is a disparity in category between the systolic and diastolic pressures, the higher value determines the stage.
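
The staging thresholds in these notes can be written down as a small function. This is only a sketch in Python, assuming a single reading in whole mmHg; the name `classify_bp` is made up for illustration, and a real diagnosis relies on repeated measurements rather than one reading.

```python
def classify_bp(systolic: int, diastolic: int) -> str:
    """Return the category for one blood pressure reading (mmHg)."""
    # Check the highest stage first, so that a disparity between the
    # systolic and diastolic category resolves to the higher stage.
    if systolic >= 140 or diastolic >= 90:
        return "stage 2"
    if systolic >= 130 or diastolic >= 80:
        return "stage 1"
    if systolic >= 120:  # diastolic is already known to be < 80 here
        return "elevated"
    return "normal"
```

For example, a reading of 118/85 is "stage 1" because the diastolic value falls in the higher category.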

		High prevalence
		Hypertension treatment is the most common reason for office visits + for dispensing of antihypertensive medication
		Rate of adequate blood pressure control is low

1. C. Look at clinical vignettes - to understand the perspective of a treating clinician

Source: https://www.nice.org.uk/guidance/qs28/resources/clinical-case-scenarios-pdf-247324717
Clinical case scenarios: Hypertension (2013)


1. D. Write a clinical description
This is definitely the most important step - and the place where I generally spend the most time. This is the “target” we are trying to phenotype - and if we don't all have a shared understanding of what we are trying to phenotype - we cannot “evaluate” whether our cohort definitions truly identify the people we want to study.

I describe this as a clinician's description of the current, externally observable clinical state of a group of persons (patients). It is expected to be a verbose, semi-structured description written in natural human language; it is not a computer algorithm or a diagnosis code. In fact, at this stage the phenotyper should NOT even be thinking of implementing the description as either a concept set or a cohort definition. If a phenotyper is thinking of code sets NOW - stop - reset - restart. You don't know what you are phenotyping yet - so don't think of code sets!

My way of introducing structure to this step is to follow the same structure that has been used for years in medical training, requiring the following components: Overview, Presentation, Assessment, Plan and Prognosis. Each component is expected to contain at least one or two lines of clinical knowledge synthesized by clinicians (preferably by a panel) with medical knowledge, harvested from authoritative sources like medical textbooks.

Here are some examples I wrote for phenotyping purposes:


Vitiligo:

Overview: Vitiligo is the most frequent cause of acquired depigmentation of the skin, characterized by the development of well-defined white macules on the skin.

Presentation: Vitiligo typically presents with asymptomatic patches of skin that are milky/chalky white in color without signs of inflammation. Although it can appear at any age or anywhere on the body, it has a predilection for the face, around orifices, the genitals, and the hands. Depigmentation of hair may also occur. Most commonly vitiligo is non-segmental, but it can be segmental too, especially around the trigeminal nerve distribution.

Assessment: Clinical diagnosis is straightforward based on history and clinical examination. Routine skin biopsy is not needed.

Plan: If rapid progression is suspected, low-dose oral corticosteroids. Phototherapy may cause stabilization.

Prognosis: Clinical course is variable, may remain stable or slowly progress – with extent and distribution changing over lifetime.

Chronic lymphoid leukemia:

Overview: Mature B cell neoplasm characterized by a progressive accumulation of monoclonal B lymphocytes. Similar to the non-Hodgkin lymphoma SLL, but the key difference is that this disease manifests primarily in the blood, while SLL manifests primarily in the lymph nodes.
Presentation: Most commonly asymptomatic and detected on a routine blood test with abnormal lymphocytosis. Occasionally there may be constitutional symptoms like unexplained weight loss, fevers, night sweats, fatigue. CLL does not have lymphadenopathy while non-HL/SLL does.
Assessment: Suspected when there is absolute lymphocytosis on peripheral smear. Peripheral blood/bone marrow cell counts, immunophenotypic analysis (flow cytometry), bone marrow biopsy/aspirate, lymph node biopsy/aspirate, spleen.
Plan: Depends on activity of disease/stage; the disease is extremely heterogeneous. Nodal disease may involve radiation therapy. There is no single agreed first-line therapy.
Prognosis: Chronic disease but may be asymptomatic with minimal progression, or may progress quickly.

Lichen planus:

Overview: A rare disease affecting middle-aged adults, most commonly affecting the skin as pruritic, purple, polygonal and papular lesions measuring a few millimeters each, which may coalesce to form larger lesions over time. Common areas are the extremities, scalp and genitalia. It may also involve the mucous membranes of the oral cavity and esophagus. It is commonly associated with hepatitis C.
Presentation: development of pruritic, purple, polygonal, papular lesions on the extremities that are a few millimeters in size.
Assessment: clinical examination, no specific testing
Plan: no recommended treatment. Options include corticosteroids, phototherapy.
Prognosis: commonly remits over a few years

The “we cannot phenotype” decision:
For me the main indication is - if clinicians are unable to clearly write the clinical idea in this structure - then more than likely this is something we cannot phenotype. Also - if a (panel of) clinicians is unable to agree on the clinical description - then we cannot phenotype.


1. D 2 - formalize the structure of clinical description

What should be in the clinical description

Minimum content:
Overview: is a key summary of the phenotype that highlights its salient features.
Presentation: is how the person or their clinician would first observe their phenotype.
Assessment: Once a person presents, what would the clinician like to do to confirm the diagnosis?
Plan: Assessment is likely to be followed by a management plan - that is based on the patient's preferences and established evidence.
Prognosis: Describes the expected future state, or the natural history of the phenotype – specifically how long will it last, what is expected to make it better or go away, what is expected to make it worse or become something else?
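
One way to keep a panel honest about the minimum content is to hold the five components in a small record and check that none is empty. This is a sketch only; the class name `ClinicalDescription` and the method `missing_components` are hypothetical, not part of any OHDSI tool.

```python
from dataclasses import dataclass, fields


@dataclass
class ClinicalDescription:
    """The five minimum components of a clinical description (free text)."""
    phenotype: str
    overview: str = ""
    presentation: str = ""
    assessment: str = ""
    plan: str = ""
    prognosis: str = ""

    def missing_components(self) -> list[str]:
        # A component counts as missing when it is empty or whitespace only.
        return [
            f.name
            for f in fields(self)
            if f.name != "phenotype" and not getattr(self, f.name).strip()
        ]


# Draft built from the lichen planus example above, with Prognosis left out.
draft = ClinicalDescription(
    phenotype="Lichen planus",
    overview="Rare disease of middle-aged adults with pruritic papular lesions.",
    presentation="Pruritic, purple, polygonal, papular lesions on the extremities.",
    assessment="Clinical examination, no specific testing.",
    plan="No recommended treatment. Options include corticosteroids, phototherapy.",
)
```

Here `draft.missing_components()` reports that the prognosis is still to be written. A non-empty result does not by itself mean "we cannot phenotype", only that the description is not finished.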

Additional content (recommended content - I am still struggling with how to formalize it):

Similar outcomes: Are there phenotypes that are considered similar to the main phenotype of interest - that may co-exist with or exclude the main phenotype? In medical practice, this is similar in idea to ‘differential diagnoses’: conditions that may closely resemble the primary outcome but may not be the primary outcome of interest (overlapping conditions are acceptable, they do not have to be mutually exclusive). Note: this list does not have to be comprehensive; we only need the top 2 or 3 key common differential diagnoses.

  • Is it normally considered typical for the phenotype of interest to co-exist (overlapping) with the conditions listed in the differential diagnosis? For each of the items in the similar list - would the presence of the condition in the differential diagnosis make the primary outcome
    • less likely?
    • more likely?
    • No difference?


Chronic Lymphoid Leukemia:

  • Non-Hodgkin lymphoma
  • Mantle cell lymphoma


Vitiligo:

  • Tinea versicolor
  • Scleroderma

Patient factors that are NOT expected to occur with the phenotype of interest:

  • Are there any patient attributes that, when present, would make the co-occurrence of the primary phenotype improbable? E.g., prostate cancer and female gender.
  • Are there any patient attributes that, when present, make the co-occurrence of the primary phenotype less likely? E.g., prostate cancer in a 6-year-old boy, or a new onset autism diagnosis in a 90-year-old person.
  • For treatments, are there treatments that, when present, would make the presence of the outcome unlikely? E.g., an MRI of the head on the date of diagnosis of knee arthritis.

Patient factors that are expected to occur with the outcome of interest:

  • These are commonly the diagnostics tests, procedures or even treatment that are seen commonly associated with the outcome e.g., bone marrow biopsy and CLL
  • It may also be patient factors such as increasing age and the diagnosis of prostate cancer.

What else do we know about the outcome of interest: This is a catch all section, try to capture any other relevant known information.

  • Literature review: have others performed epidemiological/observational research-based studies on the outcome of interest? Is there published literature that has studied the outcome of interest in observational data? Have the authors described how they developed an operational definition for the outcome of interest?
  • Are there any prior expectations of prevalence/incidence of the outcome of interest in (any) population? E.g., about 1.1% of adults above the age of 65 in USA have CLL (cross sectional prevalence).
  • Is there an established relationship between (any) exposure and the outcome (e.g. azithromycin and cardiac arrhythmia)? If so, can we obtain rates of the outcome (e.g. % of patients with the exposure who get the outcome, preferably with time-to-event, e.g. the 5-day risk of cardiac arrhythmia among patients starting azithromycin is 0.001%)? For this established exposure-outcome relationship, are there patient risk factors that are considered to make the outcome more or less likely?

Acceptance criteria:
I have struggled here - and would love the community's input on this topic. Can we define “acceptance criteria” here - upfront, even before we start building concept sets?

1. E. Clinical Description

For this phenotype - after much deliberation - the panel of clinicians decided to call this “Essential hypertension” and clarified that it is not secondary hypertension, not malignant hypertension or hypertensive emergency, and not treated hypertension.

Essential Hypertension:
Overview: Persons newly (first time) diagnosed with chronic hypertension not explained by secondary/medically correctable causes, identified in an ambulatory/office setting as part of routine (asymptomatic)/incidental primary care visit in persons who have had no significant past medical history. No evidence of end organ damage.
Assessment: Ambulatory/home blood pressure measurements. Screen for secondary causes.
Plan: Treatment is almost always expected to be started AFTER initial diagnosis. Routine follow-up - more frequent initially as treatment is titrated, less frequent if stable, responsive to treatment and well tolerated.
Prognosis: Variable - chronic disease, no end date.

Similar outcomes:
Should not co-exist: Hyperaldosteronism, coarctation of the aorta, renal artery stenosis, chronic kidney disease, and aortic valve disease

Should not occur with the phenotype:
Patient should not be worked up as an emergency (including inpatient hospital) - this is not a life threatening emergency.
Should not have been previously treated with blood pressure medication - treatment should ideally start AFTER diagnosis
No treatment in the past - because this is not treated hypertension.
Not Malignant hypertension, resistant hypertension, hypertensive encephalopathy

Please read this clinical description together with the clinical notes above

Step 2: Perform a literature review.

The purpose of this section is to understand how others have tackled the problem of phenotyping in their work for observational research. This would be the first time we start thinking of how to phenotype (i.e. build concept sets, build cohort definitions).

If you thought of or worked on concept sets/code sets/cohort definitions prior to this step: STOP - RESET - REDO. :)

Ideally - the panel who worked on the clinical description should also have provided the literature - but sometimes that is not the case. Also - this obviously may not be comprehensive - so if others know of high-quality work that needs to be part of this review - please post.

  • Literature review: have others performed epidemiological/observational research-based studies on the outcome of interest? Is there published literature that has studied the outcome of interest in observational data? Have the authors described how they developed an operational definition for the outcome of interest?

Capturing some key literature and insights below:

  1. Hypertension. 2015 May;65(5):1002-7.
     • includes malignant hypertension and hypertensive encephalopathy - focused on in patient admission
     • essential hypertension (ICD 9 code: 401.9)

  2. Am J Hypertens. 2017 Jul 1;30(7):700-706.

  3. Am J Cardiol. 2011 Nov 1;108(9):1277-82.
     • focus on hospitalization in hypertensive emergency

  4. BMC Health Serv Res. 2016; 16: 303.

  5. J Stroke Cerebrovasc Dis. 2016 Jul;25(7):1683-1687.
     • aged 18 years or older who were discharged from an ED with a primary discharge diagnosis of hypertension as defined by ICD-9-CM (International Classification of Diseases, 9th Revision, Clinical Modification) codes 401-405.

  6. Am J Emerg Med. 2011 Oct;29(8):855-62.
     • presenting to the hospital with acute, severe hypertension and receiving treatment in a nonoperative, critical care setting. BP documented greater than 180 mm Hg systolic and/or greater than 110 mm Hg diastolic
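
To make the "ICD-9-CM codes 401-405 as primary discharge diagnosis" style of definition concrete, here is a rough Python sketch. The record layout (a list of codes per visit with the primary diagnosis first) and the function names are assumptions for illustration; this is not the code used in any of the papers above.

```python
def is_hypertension_icd9(code: str) -> bool:
    """True if an ICD-9-CM diagnosis code falls in categories 401 through 405."""
    # Works for dotted ("401.9") and undotted ("4019") codes; V and E codes
    # fail the digit check because their category starts with a letter.
    category = code.strip()[:3]
    return category.isdigit() and 401 <= int(category) <= 405


def has_primary_hypertension(diagnosis_codes: list[str]) -> bool:
    """Assumes the first code in the list is the primary discharge diagnosis."""
    return bool(diagnosis_codes) and is_hypertension_icd9(diagnosis_codes[0])
```

A visit coded `["786.50", "401.9"]` would not qualify under this rule, because hypertension is only a secondary diagnosis there.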

Step 2: Perform a literature review.

I found the literature search pretty interesting - and it raised some questions/concerns:

  • why did I not find a definition for ‘essential hypertension’ that I can just use?
  • has no one truly studied essential hypertension in observational research?
  • what have I done wrong!

Some key insights:

  1. Use of AHRQ CCS - reference to codes here
  2. Use of primary vs secondary diagnosis - something that has popularity (but questionable utility) in US data sources.


My next step was to review the code set here - and review found

I found the review of AHRQ CCS codes interesting because “Hypertension and hypertensive-related conditions complicating pregnancy; childbirth; and the puerperium” was considered part of “Essential Hypertension”.

What does the community think? My position would be:

  • [Include] Yes, this is possible: long-standing chronic hypertension is commonly diagnosed for the first time (incidentally) during pregnancy-related care. So - it would meet our clinical description of essential hypertension.
  • [Exclude] But if a person who is pregnant was found to have eclampsia, pre-eclampsia or a related medical condition - exclude.

If you are reading till here - you probably are beginning to relate to a journey

By developing the clinical description - we now have a clearer understanding of what we are phenotyping. We have described a structure for developing such a clinical description:
a) start with an authoritative source that is at the right level - make your notes
b) review clinical vignettes - to better understand what it is, what it is not.
c) write the clinical description - preferably with a panel of clinicians - get consensus. If no consensus - then you cannot phenotype. If consensus - understand what it is, and what, if present, would make it less likely/exclude it
d) develop acceptance criteria
e) look at and learn from others' work in observational research - clarify any surprises (e.g. the pregnancy, primary vs secondary)

If you have looked at Atlas or talked about codes PRIOR to reaching here - you are NOT following best practice :) STOP - RESET - RESTART


Now to build a concept set expression - we have a process that appears to work. I wrote a paper on the Science Of Phenotyping here - I don't know yet whether we can publish it. It's part of the OHDSI Phenotype Development and Evaluation workgroup - if you want to contribute/collaborate on that work, please ping me (it describes/defines the phenotype development process). I consider most of the introduction content in that paper obsolete (it represents our thinking in 2020) because our thinking has evolved - but other sections are still current.

Some excerpts from that paper

This flow chart is the concept set recommender system (PHOEBE) pioneered by @aostropolets

Step 3: Building Concept Set expression
The process to build a concept set expression is described above (it's not perfect, and there is an opportunity to document it better) - and was demonstrated by others during Phenotype Phebruary: @Patrick_Ryan and @AzzaShoaibi

But the key ideas are:

  1. Don’t start before you have a finalized clinical description
  2. Create a starter concept set expression
     • Create a lexical search criteria - based on the key words in the clinical description.
     • Import any starter concept id’s (maybe non standard codes) that were recommended/previously used in the work of others,
     • Optimize the concept set expression
     • Review the included concepts
     • Iterate till satisfied
  3. Use PHOEBE - look at recommender - iterate -
  4. Finalize
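
The lexical search step can be pictured with a toy example. This is a sketch in Python over a made-up five-concept vocabulary; the ids are placeholders, not real OMOP concept ids, and `lexical_search` is a hypothetical helper, not an Atlas or PHOEBE function.

```python
def lexical_search(vocabulary: dict[int, str], keywords: list[str]) -> dict[int, str]:
    """Return concepts whose name contains any keyword (case-insensitive)."""
    lowered = [keyword.lower() for keyword in keywords]
    return {
        concept_id: name
        for concept_id, name in vocabulary.items()
        if any(keyword in name.lower() for keyword in lowered)
    }


# Toy vocabulary: placeholder ids, names taken from this thread.
toy_vocabulary = {
    1: "Essential hypertension",
    2: "Hypertensive crisis",
    3: "Renal artery stenosis",
    4: "Systolic hypertension",
    5: "Orthostatic hypotension",
}

starter = lexical_search(toy_vocabulary, ["hypertens"])
```

Searching on the stem "hypertens" also picks up "Hypertensive crisis", which the full word "hypertension" would miss. That is exactly the kind of candidate you then review against the clinical description before keeping or excluding it.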

A few gotchas

  • Because design diagnostics involves decision making based on a recommender system, a good practice is to document the key reasons/insights as notes. These notes should have sufficient detail to tell a new user why a certain concept was included in (or excluded from) the concept set expression.
  • If a decision has been made to incorporate a non-standard concept into the concept set expression, then please report the insights leading to the addition to the OMOP vocabulary team. This reporting would help improve the mapping logic in a future version of the OMOP vocabulary.
  • Research readiness: It is important to remember that at this stage, the resultant concept set expression has only undergone design diagnostics based on lexical/semantic understanding. It may not have been applied to a cohort definition, and if applied, the performance of such a cohort definition has not been evaluated, i.e. the cohort definition built using the concept set expression output of design diagnostics has not been reviewed using data diagnostics. It is thus NOT considered a research-ready cohort definition.

Got an example?

Thanks @Gowtham_Rao for initiating the discussion on hypertension. Glad that we are talking about heart diseases on Valentine's Day :)

One thing I am unclear about in this clinical description (and indeed most others), which I presume would fit under ‘prognosis’, is how to think about potential resolution of a condition, and whether this is expected or observable. In this example, we could consider hypertension as a chronic disease and assume it's therefore a condition that continues indefinitely (so cohort end is end of observation period). But that may be overly simplistic: generally, clinicians will advise lifestyle changes (diet and exercise), and some patients who control blood pressure in this manner may get to a point where they can stop treatment (or potentially never start). The same would go for T2DM, even though we generally think of that as a chronic disease with no end date. The first question in my mind, before looking in the data, is what is the clinical truth of the disease: is it truly chronic and unresolvable, or can it start and stop/recur? Then, the follow-up based on the data is: how do we model this clinical reality, given what is observable? I could imagine that we reach a conclusion that we can't find the proper cohort episodes with what data we have, but I don't want to confuse that with what should be a patient's true time-varying disease status.


Step 3 Building concept set expression

  • using lexical search, concept recommender, my re-reading of clinical description - i ended up with this concept set expression

  • Briefly - the way i got here

  1. I did a lexical search for ‘hypertension’ and picked up

  2. Upon looking at the resolved concept set - I found concepts that also had Hypertensive crisis as a parent, so I decided to exclude that parent.
    Labile essential hypertension is mapped to Intermittent hypertension and is usually a type of Secondary hypertension - so based on the clinical description - these are not part of the phenotype of interest.

  3. Now to PHOEBE - and notice these recommendations

  • Hypertension in the obstetric context was recommended, and based on the clinical description this is to be included. But I couldn't tell if eclampsia and pre-eclampsia are part of it. After some vocabulary acrobatics I decided to include it, because my earlier choice to exclude Hypertensive crisis, along with excluding Pregnancy-induced hypertension, would take care of hypertension that may be considered a complication of pregnancy (rather than complicating pregnancy).
  • Systolic hypertension and Diastolic hypertension came up in PHOEBE. I had to go back to a medical textbook for this - and finally decided to include them - but evaluate the impact later using Cohort Diagnostics. Is Systolic/Diastolic hypertension a type of essential hypertension - what do you all think?
  • Finally Hypertensive disorder - this was really interesting because its descendants include concepts that semantically do not represent the clinical description. But there are Read codes mapped to it, with significant counts as shown here
    - should we include it or not include it? It was ambiguous - so I decided to keep the ambiguity and revisit using Cohort Diagnostics
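
The include/exclude decisions above amount to resolving a concept set expression against a hierarchy: take each included concept (plus descendants if asked), then remove each excluded concept (plus its descendants). Below is a toy Python sketch of that idea. The hierarchy is invented for illustration and is not the real OMOP vocabulary hierarchy, and `resolve` is a hypothetical helper, not the Atlas implementation.

```python
def descendants(hierarchy: dict[str, list[str]], concept: str) -> set[str]:
    """All concepts below `concept` in a parent -> children mapping."""
    found: set[str] = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for child in hierarchy.get(current, []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def resolve(expression: list[dict], hierarchy: dict[str, list[str]]) -> set[str]:
    """Resolve include/exclude items into the final set of concepts."""
    included: set[str] = set()
    excluded: set[str] = set()
    for item in expression:
        members = {item["concept"]}
        if item.get("include_descendants"):
            members |= descendants(hierarchy, item["concept"])
        (excluded if item.get("is_excluded") else included).update(members)
    return included - excluded


# Toy hierarchy (assumed for illustration only).
toy_hierarchy = {
    "Hypertensive disorder": ["Essential hypertension", "Hypertensive crisis", "Systolic hypertension"],
    "Hypertensive crisis": ["Malignant hypertension", "Hypertensive emergency"],
}
toy_expression = [
    {"concept": "Hypertensive disorder", "include_descendants": True},
    {"concept": "Hypertensive crisis", "include_descendants": True, "is_excluded": True},
]
resolved = resolve(toy_expression, toy_hierarchy)
```

In this toy case excluding the Hypertensive crisis parent drops its children too, while Essential hypertension and Systolic hypertension stay in the resolved set.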

@Patrick_Ryan - let me change the question from what is a disease to - what is “Health”. WHO’s constitution preamble is

By that definition - a person with disease may still be “Healthy”. The goal of the medical profession is to help a person achieve the state of “well-being”.

Now - let's visit your question - when does a disease end? It ends when the person has achieved the state of well-being.

We don’t observe “well being” in our data. We observe presence of/treatment for disease.

So - it is fair to say that

  • if a person is being observed in our data and during that observation ALL disease related care is captured in our data
  • and the person has not sought care (despite being offered/availed that care i.e. has access to medical care)
  • and the access to care is especially evidenced by the person receiving other care (e.g. fracture, or cold, pain in joint)
  • and despite all this the person is not receiving care for the disease we are interested (hypertension)

then I would say that the person, although they still have the chronic disease, has reached the best state of well-being with respect to hypertension (i.e. well controlled or normotensive) - and is now in good health.

Maybe that signals the cohort end date! So how do we operationalize this in our data?

  • we need a rubric that says a person who has access to care, but does not seek care for the condition of interest for more than, say, 12 months (for chronic disease) or 1 month (for acute disease), is now free of that disease.
  • any future occurrence is a recurrence, i.e. a new cohort start date
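
To make the rubric testable, here is one possible reading of it as a Python sketch. The assumptions are mine: an episode ends at the last date of care for the condition, a gap only counts if the person received some other care during it (the evidence of access to care), and the default gap is 365 days. The function name `build_episodes` is hypothetical.

```python
from datetime import date


def build_episodes(
    condition_dates: list[date],
    other_care_dates: list[date],
    max_gap_days: int = 365,
) -> list[tuple[date, date]]:
    """Split care for one condition into (start, end) cohort episodes."""
    dates = sorted(condition_dates)
    if not dates:
        return []
    episodes = []
    start = last = dates[0]
    for current in dates[1:]:
        long_gap = (current - last).days > max_gap_days
        # Access to care is evidenced by other care received during the gap.
        had_access = any(last < d < current for d in other_care_dates)
        if long_gap and had_access:
            episodes.append((start, last))  # previous episode ends
            start = current                 # recurrence: new cohort start date
        last = current
    episodes.append((start, last))
    return episodes
```

With hypertension care on 2015-01-10, 2015-06-01 and 2018-03-01, and a fracture visit on 2016-09-15, this gives two episodes. Without the fracture visit we cannot tell whether the person had access to care, so the sketch keeps a single episode. Whether the end date should be the last care date or that date plus the gap is one of the things we would have to test.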

This is of course an opinion - and we have to test this. Cohort end date is a complex topic - but an important one for us all to tackle.

The atlas-phenotype.ohdsi.org has many such examples of cohort end date

See for example the Bronchitis cohort definition ATLAS

See the use of event persistence: all Bronchitis events within the window are collapsed, with a 14-day offset.
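
As a rough illustration of what collapsing events with an offset means, here is a Python sketch. It assumes each event persists for `offset_days` after its date and that overlapping persistence windows are merged into one era; this is my simplified reading, not the actual ATLAS implementation, and `collapse_events` is a made-up name.

```python
from datetime import date, timedelta


def collapse_events(event_dates: list[date], offset_days: int = 14) -> list[tuple[date, date]]:
    """Merge events whose persistence windows overlap into (start, end) eras."""
    eras: list[tuple[date, date]] = []
    for start in sorted(event_dates):
        end = start + timedelta(days=offset_days)
        if eras and start <= eras[-1][1]:
            # Event falls inside the current era: extend the era's end date.
            eras[-1] = (eras[-1][0], max(eras[-1][1], end))
        else:
            eras.append((start, end))
    return eras
```

Bronchitis events on 2020-01-01 and 2020-01-10 would collapse into one era ending 2020-01-24, while an event on 2020-03-01 would start a new era.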