
Proposal to add a new table - Dimension Attribute table into CDM

(Qi Yang) #1

I would like to bring this topic to the community to discuss.

The source data for my recent OMOP conversion is a data warehouse from one of the largest EHR vendors in the US. It contains care site attributes such as hospital bed size range (200-299, 300-499, 500+, etc.), teaching hospital indicator, and rural-urban indicator. I asked the vendor how these data elements are used, and they provided me with use cases. I also think attributes describing care site, location, and provider can be very useful in healthcare research. Some example attributes are listed below; many of them have been used in the research literature:


  • Years in practice

  • Years practicing in current setting

  • Doctor’s rating (WebMD, Healthgrades etc.)

  • Language spoken

  • Job title (director, department chairman etc.)

  • NPI equivalent in other countries


  • Hospital bed size range (200-299, 300-499, 500+ etc.)

  • Teaching hospital indicator

  • Rural-urban indicator

  • Practice type (solo practice, episodic care practice/walk-in clinic etc.)

  • Total number of clinical and administrative staff

  • Number of patient visits per week in primary practice

  • Focused practice scope (yes/no)


  • Rurality (metropolitan, suburban, rural etc.)

  • Accessibility to health care

  • Population density

  • Air Quality Index (AQI)

  • Average temperature

  • Annual days of sunlight exposure (ASE)

  • Region

  • International address format (District, Province, Region etc.)

  • Latitude

  • Longitude

  • Altitude (height above sea level)

And the list goes on and on. Currently, our person-centric OMOP CDM does not support these data elements. One option is to expand the current dimension tables by adding all these attributes to their respective tables, i.e., the Provider, Care_site and Location tables. But that is disruptive and also not economical, since in many cases the source data do not provide this information. So I am proposing to add a new row-based table, called Dimension_Attribute (or any name you want to call it), as a comprehensive way to include these attributes when the source data do provide such information. It also preserves the original dimension table structure and is backward compatible. The table will have the following columns:

  • Domain_id - This will have one of the 3 values: Provider, Care_Site or Location

  • Dimension_id - This is the same id number as in their respective dimension tables, i.e., Provider_id, Care_Site_id or Location_id

  • Attribute_concept_id - Concept_id for the attribute, e.g., 55556666 for Air Quality Index (AQI)

  • Attribute_source_value - Source value for the attribute, e.g., Air Quality Index

  • Value_concept_id - Concept_id for the value of the attribute, e.g., 22223333 for AQI 0-50; 33334444 for AQI 51-100; 44445555 for AQI 201-300 etc.

  • Value_source_value - Source data value for the attribute, e.g., AQI 35, AQI 95, AQI 268 etc.

  • Attribute_dt - The date when the attribute value was reported. It is optional.
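
To make the column list above concrete, here is a minimal sketch of the proposed table using Python's built-in sqlite3 module. It is only an illustration under assumptions: the column types, the CHECK constraint, the stripped-down location table, the helper name attributes_for, and the sample date are not part of the proposal. The concept ids are the placeholder numbers from the examples above, not real vocabulary concepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Stripped-down stand-in for the CDM Location table (assumption).
CREATE TABLE location (
    location_id INTEGER PRIMARY KEY,
    city        TEXT
);

-- Row-based attribute table; column types are assumptions.
CREATE TABLE dimension_attribute (
    domain_id              TEXT    NOT NULL
        CHECK (domain_id IN ('Provider', 'Care_Site', 'Location')),
    dimension_id           INTEGER NOT NULL,  -- provider_id, care_site_id or location_id
    attribute_concept_id   INTEGER NOT NULL,
    attribute_source_value TEXT,
    value_concept_id       INTEGER,
    value_source_value     TEXT,
    attribute_dt           TEXT               -- optional reporting date
);
""")

conn.executemany("INSERT INTO location VALUES (?, ?)", [(1, "City A"), (2, "City B")])

# Placeholder concept ids taken from the examples in the column list.
conn.executemany(
    "INSERT INTO dimension_attribute VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("Location", 1, 55556666, "Air Quality Index", 22223333, "AQI 35", "2018-11-01"),
        ("Location", 2, 55556666, "Air Quality Index", 33334444, "AQI 95", None),
    ],
)


def attributes_for(conn, domain_id, dimension_id):
    """Return (attribute, value_concept_id, source value) rows for one dimension record."""
    cur = conn.execute(
        """
        SELECT attribute_source_value, value_concept_id, value_source_value
        FROM dimension_attribute
        WHERE domain_id = ? AND dimension_id = ?
        ORDER BY attribute_concept_id
        """,
        (domain_id, dimension_id),
    )
    return cur.fetchall()


print(attributes_for(conn, "Location", 1))
```

Because domain_id decides which dimension table dimension_id points to, a plain foreign key cannot enforce the link; any loader would have to check it separately.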

For a detailed table structure and examples, please see the spreadsheet attached.

Dimension attribute.xlsx (16.7 KB)

Since the table is row-based rather than column-based, it can take as many attributes as needed without disrupting the table structure. As a general rule, the table should mostly contain values provided by the source data asset, not derived values. However, if absolutely needed, such as in the case of spatial epidemiology (https://github.com/OHDSI/WebAPI/issues/649#issuecomment-440757591), this table can also accommodate derived values.

I am not sure if this has been proposed before. If so, I apologize for not acknowledging you; I was not able to find it. Please feel free to give your opinion on this.


(Christian Reich) #2


Good stuff. We should think about these. But let’s tidy things up a little bit.

Can you share, please?

Obviously, in longitudinal data this won’t work, because it will change. Every year it will change by a year. So, if anything we would have date_of_graduation or something.

Not sure. Those tend to be the things we call “evidence”, and we would want to produce. What WebMD etc. have is called “hearsay” or “rumor”.

Can you conceptualize that? Do you have a good list?

Yeah, this is a bad term. What would you call it?

Sounds like something that belongs to Location. Plus, it’s not data, it’s reference. Because it doesn’t change in the time frames we are talking here.

That’s Visit in our lingo now. Used to be Place of Service. And it is not an attribute of the Care Site, because one Care Site can have many different constellations.

Sounds like a read-out, not reference data.

Not sure what that is. Focused on what? Opposed to what? “All over the place”?

Not sure what that is, and probably read-out, rather than input.

Not sure if that has anything to do with longitudinal patient data.

What’s that?

We got that.

Again, not sure it belongs here.

Well, bring it on. We tend to not like EAV structures like that. Instead, you might just make a proposal to add:


  • valid_start_date
  • job_title_concept_id


  • size_concept_id
  • teaching_indicator


  • rurality_concept_id
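
To make the difference between the two designs concrete, here is a rough Python sketch that pivots row-based attribute records into the column-per-attribute shape listed above. It is an illustration only: the function pivot_attributes, the ATTRIBUTE_TO_COLUMN mapping, every concept id, and the assumption that size_concept_id and teaching_indicator would sit on the care site record are made up for the example.

```python
from collections import defaultdict

# Hypothetical mapping from attribute concept id to a dedicated column name.
ATTRIBUTE_TO_COLUMN = {
    9000001: "size_concept_id",
    9000002: "teaching_indicator",
}


def pivot_attributes(rows, domain_id):
    """Collapse (domain_id, dimension_id, attribute_concept_id, value) rows
    into one dict of column values per dimension_id."""
    wide = defaultdict(dict)
    for dom, dim_id, attr_concept_id, value in rows:
        if dom != domain_id:
            continue
        column = ATTRIBUTE_TO_COLUMN.get(attr_concept_id)
        if column is None:
            continue  # no dedicated column exists for this attribute
        wide[dim_id][column] = value
    return dict(wide)


# Made-up row-based records; all ids are placeholders.
eav_rows = [
    ("Care_Site", 10, 9000001, 9100300),
    ("Care_Site", 10, 9000002, "Y"),
    ("Care_Site", 11, 9000001, 9100500),
    ("Location", 5, 9000003, 9200001),
]

wide = pivot_attributes(eav_rows, "Care_Site")
print(wide)
```

In the column design each attribute is a named, typed field that queries can use directly, but an attribute without a dedicated column has nowhere to go until the table definition changes. The row-based design accepts any attribute at the cost of a pivot like this one at query time.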