I would like to bring this topic to the community to discuss.
The source data for my recent OMOP conversion is a data warehouse from one of the largest EHR vendors in the US. It includes care site attributes such as hospital bed size range (200-299, 300-499, 500+, etc.), a teaching hospital indicator, and a rural-urban indicator. I asked the vendor how these data elements are used, and they provided use cases. I also think attributes describing care sites, locations, and providers can be very useful in healthcare research. Some example attributes are listed below; many of them have been used in the research literature:
PROVIDER
- Years in practice
- Years practicing in current setting
- Doctor’s rating (WebMD, Healthgrades, etc.)
- Language spoken
- Job title (director, department chairman, etc.)
- NPI equivalent in other countries
CARE_SITE
- Hospital bed size range (200-299, 300-499, 500+, etc.)
- Teaching hospital indicator
- Rural-urban indicator
- Practice type (solo practice, episodic care practice/walk-in clinic, etc.)
- Total number of clinical and administrative staff
- Number of patient visits per week in primary practice
- Focused practice scope (yes/no)
LOCATION
- Rurality (metropolitan, suburban, rural, etc.)
- Accessibility to health care
- Population density
- Air Quality Index (AQI)
- Average temperature
- Annual days of sunlight exposure (ASE)
- Region
- International address format (District, Province, Region, etc.)
- Latitude
- Longitude
- Altitude (height above sea level)
And the list goes on. Currently, our person-centric OMOP CDM does not support these data elements. One option is to expand the existing dimension tables by adding all of these attributes as columns to their respective tables, i.e., the Provider, Care_site, and Location tables. But that is disruptive and also uneconomical, since in many cases the source data does not provide this information. So I am proposing a new row-based table, called Dimension_Attribute (or whatever name you prefer), as a comprehensive way to include these attributes when the source data does provide them. It preserves the original dimension table structure and is backward compatible. The table would have the following columns:
- Domain_id - One of three values: Provider, Care_Site, or Location
- Dimension_id - The same id number as in the respective dimension table, i.e., Provider_id, Care_Site_id, or Location_id
- Attribute_concept_id - Concept_id for the attribute, e.g., 55556666 for Air Quality Index (AQI)
- Attribute_source_value - Source value for the attribute, e.g., Air Quality Index
- Value_concept_id - Concept_id for the value of the attribute, e.g., 22223333 for AQI 0-50; 33334444 for AQI 51-100; 44445555 for AQI 201-300, etc.
- Value_source_value - Source data value for the attribute, e.g., AQI 35, AQI 95, AQI 268, etc.
- Attribute_dt - The date when the attribute value was reported. It is optional.
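To make the column list above concrete, here is a minimal sketch of the proposed table using Python's built-in sqlite3 module. The column types, the CHECK constraint on Domain_id, the cut-down location table, and the sample rows are all assumptions made for illustration; the concept ids are the placeholder numbers used in the list above, not real vocabulary concepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-in for the CDM Location table (most columns omitted).
CREATE TABLE location (
    location_id INTEGER PRIMARY KEY,
    city        TEXT,
    state       TEXT
);

-- Proposed row-based attribute table; types are assumed, not specified.
CREATE TABLE dimension_attribute (
    domain_id              TEXT    NOT NULL
        CHECK (domain_id IN ('Provider', 'Care_Site', 'Location')),
    dimension_id           INTEGER NOT NULL,
    attribute_concept_id   INTEGER NOT NULL,
    attribute_source_value TEXT,
    value_concept_id       INTEGER,
    value_source_value     TEXT,
    attribute_dt           TEXT    -- optional, stored as an ISO date string
);
""")

# Made-up sample data: one location with two AQI readings.
conn.execute("INSERT INTO location VALUES (1, 'Springfield', 'IL')")
conn.executemany(
    "INSERT INTO dimension_attribute VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("Location", 1, 55556666, "Air Quality Index", 22223333, "AQI 35", "2019-06-01"),
        ("Location", 1, 55556666, "Air Quality Index", 33334444, "AQI 95", "2019-07-01"),
    ],
)

# Domain_id plus Dimension_id together identify the row in the dimension table.
rows = conn.execute("""
    SELECT l.location_id, a.value_concept_id, a.value_source_value, a.attribute_dt
    FROM location AS l
    JOIN dimension_attribute AS a
      ON a.domain_id = 'Location' AND a.dimension_id = l.location_id
    WHERE a.attribute_concept_id = 55556666
    ORDER BY a.attribute_dt
""").fetchall()

for row in rows:
    print(row)
```

Because Dimension_id can point at any of three tables, it cannot be declared as an ordinary foreign key; in this sketch the join condition on Domain_id does that job instead.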
For a detailed table structure and examples, please see the spreadsheet attached.
Dimension attribute.xlsx (16.7 KB)
Since the table is row-based rather than column-based, any number of attributes can be added without disrupting the table structure. As a general rule, the table should mostly contain values provided by the source data asset, not derived values. However, if absolutely needed, such as in the case of spatial epidemiology (Geospaital functionality in Atlas: integration of AEGIS · Issue #649 · OHDSI/WebAPI · GitHub), this table can accommodate derived values as well.
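As a rough illustration of the row-based point, the sketch below rebuilds a wide, one-record-per-care-site view from attribute rows. The function name `pivot_attributes`, the three-field row shape, and the sample values are hypothetical and only meant to show that a new attribute adds rows, not columns.

```python
from collections import defaultdict


def pivot_attributes(rows):
    """Group (dimension_id, attribute, value) rows into one dict per dimension_id."""
    wide = defaultdict(dict)
    for dimension_id, attribute, value in rows:
        wide[dimension_id][attribute] = value
    return dict(wide)


# Made-up care site rows; each site carries only the attributes its source provides.
care_site_rows = [
    (10, "Hospital bed size range", "200-299"),
    (10, "Teaching hospital indicator", "Y"),
    (11, "Hospital bed size range", "500+"),
    (11, "Rural-urban indicator", "Rural"),
]

wide = pivot_attributes(care_site_rows)
for care_site_id, attributes in wide.items():
    print(care_site_id, attributes)
```

A site with no value for an attribute simply has no row for it, so sparse sources do not leave empty columns behind.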
I am not sure whether this has been proposed before. If it has, I apologize for not acknowledging the earlier proposal; I was not able to find it. Please feel free to share your opinion.
Thanks,