
What is Table 1, and what should go into it?

This is the continuation of a debate that started in another Forum post about the categorization of diseases in a population.

What is Table 1? That is actually not an easy question. The various guidelines (STROBE, CONSORT and TREND) state, more or less in unison, that it should contain “baseline demographic and clinical characteristics” of participants in each study and “information on potential confounders”. (They also mention exposure, missing data and follow-up time, but that is outside this discussion.)

There even is, or was, a Table 1 Project at Duke.

So, let’s talk about “clinical characteristics”. This could be one of two things:

  1. Certain specific diseases that are relevant to the question, including the confounders.
  2. General categories summarizing co-morbidities or medical history.

@Patrick_Ryan’s list, which is identical to the list in the Legend paper, seems to be a mixture of both: there are very specific conditions such as COPD, and generic categories like “Malignant neoplastic disease”. Most of them are somewhere in between.

I would claim that the more general or category-like the conditions are, the more you could use a mechanism like the summary script discussed in that other Forum discussion. The more fine-grained they are, the more this is the domain of @Gowtham_Rao’s phenotypes. A library of these seems a good thing, so that wheels don’t get reinvented all the time.

However, I am a little uneasy about two aspects of that:

  • How do we define “relevant” conditions? Studies, including Legend, do not explain how they arrive at that choice. Why, for example, are “Malignant neoplasm of anorectum” or “Primary malignant neoplasm of prostate” relevant to antihypertensive drugs, but skin and kidney cancer, which are more common, are not? Usually, the OHDSI ideology tells us to stay away from hand-picked expert choices and recommends a systematic approach. Technically, it would be easy to report on thousands of conditions, but who is going to phenotype them all, and how should they be reported meaningfully? The Charybdis study solved that problem by simply reporting on Condition concepts, dumping incidence numbers in the thousands.

  • How can we standardize conditions if they are specific to a study? Is that Patrick list going to work for all studies? In, say, ophthalmology, obstetrics or immunology?

And then there are the “potential confounders”. Do we have an idea how to tackle those?


While the two things might inform each other, I would keep them distinct. Creating a grouping that organizes all conditions into reasonable categories ensures that no diseases are missed.

Table 1 is not about showing every condition, but about what most people are interested in: demographics, common serious illnesses, other common confounders (e.g., perhaps smoking, if we had it). But there is no need to keep track of a broken finger for Table 1.

Usually Table 1 is steered by the hypothesis at hand, but much of it is common across studies.

I would want conditions of sufficient prevalence, sufficient cross-disease impact, and sufficient severity. So sickle cell is too rare. The effects of drinking coffee are too mild. Physical injuries to limbs are too local. Diabetes, HTN, CKD seem like good candidates.


Is anyone interested in including procedures in Table 1?

Makes total sense. And having a library of those is nice. But we should make sure people understand that their Table 1 should also contain study-specific “interesting” conditions; it won’t be enough to just run the standard ones Patrick listed. “Sickle cell disease” could be very relevant if you, say, research malaria, while HTN may not be.

Whom are you asking? The community? Or folks doing specific research, where procedures are relevant?

Agreed with you, George. Lots of common threads.

Also agree with this.

With regard to what to characterize, there are certainly hypotheses that require characterizing the baseline of more than just conditions.

Yes, but it’s study-specific. For example, I’m working on a study evaluating incidence of persistent opioid use following specific procedures (total hip, knee and shoulder arthroplasty) in opioid naïve individuals. In that study, we’re making a Table 1 that summarizes the surgery type across our initial cohorts of interest. We also are interested in summarizing specific types of prior drug exposures.

Rather than boiling the ocean, the current FeatureExtraction approach of choosing from the domains of interest and then producing covariates is actually quite handy. There are lots of reasons you may want to summarize other prior things besides conditions (e.g., prior drugs, prior labs, baseline demographics, comorbidity indices, prior procedures, etc.).

@Daniel_Prieto, maybe we should show @Christian_Reich Marti’s package? :wink: (I can’t seem to find him to tag him on this forum post.)

Oh forgot to comment on this…

Yes, BUT we had an unforeseen issue in Charybdis where we didn’t compute enough intersections of covariates. Outside of conditions, we had a missing part of the Venn diagram.

We cut everything down from the initial target cohorts (e.g., “people who had COVID”). The problem was that when people wanted to know “all people using XYZ [e.g., drug, procedure, etc.]” and then stratify by the target cohort definition, we didn’t have that number. So if we wanted to know how many people got a specific drug and then had COVID, we didn’t have that count.
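To make the missing Venn-diagram piece concrete, here is a toy Python sketch (all person IDs and dates are made up, not Charybdis data). The first count, covariates within the target cohort, was available; the second, exposure followed by cohort entry, was the one that was missing:

```python
from datetime import date

# Toy person-level index dates (hypothetical ids and dates)
covid_start = {1: date(2020, 4, 1), 2: date(2020, 5, 1), 3: date(2020, 6, 1)}
drug_start = {2: date(2020, 3, 1), 3: date(2020, 7, 1), 4: date(2020, 2, 1)}

# Available count: drug exposure among COVID patients (any time)
drug_in_covid = sum(1 for p in covid_start if p in drug_start)

# Missing count: drug exposure FOLLOWED BY COVID
drug_then_covid = sum(
    1 for p in drug_start
    if p in covid_start and drug_start[p] <= covid_start[p]
)

print(drug_in_covid, drug_then_covid)  # 2 1
```

Person 3 is exposed after COVID onset, so the two counts differ; having only the first makes temporally ordered comparisons impossible after the fact.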

It’s stupid, but man, hindsight is 20/20. I would love a time machine to go back and change that so we could have better comparisons for Table 1 baseline characteristics.

Invoking form follows function: what is the purpose of table 1? Is it to describe the patients in the study in a standard way such that you can look at the same set of characteristics between studies to understand similarities between populations, or is it to be study-specific and describe the “most important” characteristics in the context of the given study?

In the former case, you have a standardized report that can ignore study nuance; you can recognize/understand the information once, and it will apply to all OHDSI studies. In the latter case, you avoid ‘worthless’ characteristics that don’t hold any meaning in the given study context.

But it comes back to what table 1 is trying to accomplish. If we can agree on the function, the form will present itself.

What is Table 1:

Table 1: I think it should be a succinct table that is easily interpretable. It shouldn’t be a dump of all the data; it should be a ‘parsimonious’ version. The inter-reader variation in interpretation should be minimal.

Standard Table 1: I think we can do a ‘standard Table 1’. We should define it, and I think the current Table 1 output of FeatureExtraction is good, but we have identified an opportunity to improve it, as described here. This Table 1 could be considered the OHDSI default, and I think it’s reasonable to make it modifiable based on study-specific needs.

What should go into it:

A. Counting up codes:

  1. Counting up codes/conceptIds: e.g., the count/percent of conceptId 320128 (essential hypertension). This could be a count/percent report of occurrences of a conceptId in some time window relative to cohortStartDate (e.g., short-term 30 days, long-term 365 days, all time). There are interpretation problems with this approach: a) what does the conceptId represent (all we have is a phrase like ‘essential hypertension’, which is probably fine for essential hypertension, but what about ‘urticaria and/or angioedema’)? b) Is a single occurrence of a conceptId sufficient to declare that the person has the clinical idea? c) What about using descendants of this conceptId, or its parent conceptId? d) What about orphan concepts that are not captured in the ontology grouping? e) How do we interpret the count (is it an estimate of a lower bound, because it does not include descendants?). From a technical perspective, this is done all the time using FeatureExtraction and is being used in our output. It’s ready to go.
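The counting logic of approach 1 can be sketched in a few lines of Python over OMOP-shaped toy records (table and column names follow the OMOP CDM; the data, and the 30/365-day windows, mirror the short-term/long-term defaults mentioned above — this is an illustration, not the FeatureExtraction implementation):

```python
from datetime import date, timedelta

# Toy condition_occurrence rows: (person_id, condition_concept_id, condition_start_date)
condition_occurrence = [
    (1, 320128, date(2019, 12, 1)),   # essential hypertension
    (2, 320128, date(2018, 1, 15)),   # hypertension, but long before cohort start
    (3, 201826, date(2019, 12, 20)),  # type 2 diabetes mellitus
]

# Toy cohort rows: (person_id, cohort_start_date)
cohort = [(1, date(2020, 1, 1)), (2, date(2020, 1, 1)), (3, date(2020, 1, 1))]

def covariate_prevalence(concept_id, window_days):
    """Count/percent of cohort members with at least one occurrence of
    concept_id in the window_days before (and including) cohort start."""
    hits = 0
    for person_id, start in cohort:
        window_start = start - timedelta(days=window_days)
        if any(p == person_id and c == concept_id and window_start <= d <= start
               for p, c, d in condition_occurrence):
            hits += 1
    return hits, hits / len(cohort)

print(covariate_prevalence(320128, 365))  # long-term window: (1, 0.333...)
print(covariate_prevalence(320128, 30))   # short-term window: (0, 0.0)
```

Note how person 2’s hypertension record falls outside even the long-term window, illustrating interpretation problem e): the count is window-dependent and a lower bound on lifetime prevalence.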

  2. Counting up a group of codes/concept set/ontology grouper: Maybe better than the above, because we are including descendants. We can even support custom groupings, like the Charlson co-morbidity grouping of codes. There are many proposed ontology groupers, some supported. This overcomes the issues in 1 around descendants, parents and orphans, and it probably yields a more precise numerical estimate. FeatureExtraction supports many groupers. The challenge is having a shared understanding of the grouping and what codes are included in it and why, i.e., what the conceptSet/grouper represents and why it was made; if it is custom, how and by whom was it evaluated/validated, and what known issues could be sources of error?
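Approach 2 changes only the membership test: instead of matching one conceptId, match any concept in the expanded group, typically resolved via a concept_ancestor-style table. A minimal sketch (the grouper conceptId and its descendant set here are hypothetical stand-ins, not the real OMOP vocabulary):

```python
# Toy concept_ancestor expansion: grouper concept -> descendants (incl. itself).
# 99990001 is a made-up id for a "hypertensive disorder" grouper.
concept_ancestor = {
    99990001: {99990001, 320128, 312648},
}

# Baseline concepts observed per person within the lookback window
observed = {1: {320128}, 2: {312648}, 3: {201826}}

def group_prevalence(ancestor_id):
    """Count/percent of persons with ANY concept from the grouper's set."""
    members = concept_ancestor[ancestor_id]
    hits = sum(1 for concepts in observed.values() if concepts & members)
    return hits, hits / len(observed)

print(group_prevalence(99990001))  # (2, 0.666...): persons 1 and 2 both match
```

Person 2 would be missed by a single-concept count on 320128 but is captured by the grouper, which is exactly the descendant/orphan point above; the trade-off is that the reader now needs to know what is inside the set and why.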

B. Counting up Cohort Definitions
I have a dog in this fight as the person responsible for the OHDSI Phenotype Library. My vision is that we have an R package with OHDSI peer-reviewed cohort definitions that are recommended for use in OHDSI studies and are part of the standard OHDSI stack. We have that R package, and it is HADES-certified software here; we also have an Atlas version at atlas-phenotype.ohdsi.org. Both of these stacks are now mature, and I plan to release and publicize them. Regarding peer review: we have done that through the OHDSI Phenotype Development and Evaluation Workgroup, and we are on track to peer review a large volume of cohorts. The tool stack is ready to use, as described here: https://github.com/OHDSI/FeatureExtraction/issues/169 . There are more than 100 cohort definitions to choose from, as listed here: Cohort Definitions in OHDSI Phenotype Library • PhenotypeLibrary (note: peer review is not complete yet).

The advantage of using validated cohort definitions is that we overcome the issue of not having a shared understanding. Because this is a peer-reviewed process, we will have a documented, shared understanding of what the target is that is referenceable and trackable, and we have attempted to understand known measurement errors across a network of data sources. It makes the results more trustworthy.

We have attempted to list out the cohorts/phenotypes that can go into a standard Table 1.

The Phenotype Library project is equivalent to boiling the ocean, but that’s what we do in OHDSI. We boil the ocean and act as change agents. The water in the ocean is not boiling yet, but it is hot. I would say we can get it to the boiling point by focusing our energy on using cohort definitions/phenotypes. :slight_smile:

FYI - I am hoping to release a version of the OHDSI Phenotype Library by the end of this year.