Why Achilles Analysis 106,107,108 only looks the first observation period?

djogopatrao · July 10, 2023, 7:20pm

Hi folks. We’re using CDM v 5.3, atlas 2.10.1 and webapi 2.11.1. I’m not sure how to check Achilles version but anyway.

I was inspecting why in our database some of the observation period graphs were so flat (see attached image). In particular, the Observation Length graph shows that 95% of patients in our hospital have 0-30 days on their first observation period.

These graphs are generated by a series of queries that end up getting data from achilles analysis 106, 107 and 108. And all those queries retrieve only the first observation period from each patient (see the row_number() columns and where clause )

I’m not sure what those analysis intend to show. We expect that, in a hospital, the minority of first-time patients will be both an inpatient AND have a prolonged 30+ days stay. Also, from the graph captions, we expected to see the sum of all observation periods.

So my questions are:

are those analysis supposed to consider only the first observation period? if so, what are they showing?

Thanks a lot!

-- 106	Length of observation (days) of first observation period by gender
--HINT DISTRIBUTE_ON_KEY(gender_concept_id)
CREATE TABLE sandbox_gov_hdsb_omop_tmp.jeablzyvrawData_106
 STORED AS PARQUET AS
SELECT
p.gender_concept_id, op.count_value
FROM
(
  select person_id, DATEDIFF(CASE TYPEOF(op.observation_period_end_date ) WHEN 'TIMESTAMP' THEN CAST(op.observation_period_end_date  AS TIMESTAMP) ELSE TO_UTC_TIMESTAMP(CONCAT_WS('-', SUBSTR(CAST(op.observation_period_end_date  AS STRING), 1, 4), SUBSTR(CAST(op.observation_period_end_date  AS STRING), 5, 2), SUBSTR(CAST(op.observation_period_end_date  AS STRING), 7, 2)), 'UTC') END, CASE TYPEOF(op.observation_period_start_date ) WHEN 'TIMESTAMP' THEN CAST(op.observation_period_start_date  AS TIMESTAMP) ELSE TO_UTC_TIMESTAMP(CONCAT_WS('-', SUBSTR(CAST(op.observation_period_start_date  AS STRING), 1, 4), SUBSTR(CAST(op.observation_period_start_date  AS STRING), 5, 2), SUBSTR(CAST(op.observation_period_start_date  AS STRING), 7, 2)), 'UTC') END) as count_value,
    ROW_NUMBER() over (PARTITION by op.person_id order by op.observation_period_start_date asc) as rn
  from sandbox_gov_hdsb_omop_dev.observation_period op
) op
JOIN sandbox_gov_hdsb_omop_dev.person p on op.person_id = p.person_id
where op.rn = 1
;

MaximMoinat · July 17, 2023, 6:16pm

The analysis indeed shows the distribution of the length (in days) of the first observation period. Note here that an observation period is not the same as an encounter/visit. This is a great article that elaborates for hospital data how to derive the observation period: Observation Period Considerations for EHR Data

Look into how observation periods are defined for your data. Another analysis in Achilles shows the number of observations periods person. That might be interesting as well.

djogopatrao · July 18, 2023, 4:46pm

Hi Maxim, Thanks for your reply and for the article.

Our algorithm merges continuous or intersecting visits into larger periods, and it is warranted that no clinical event is falling outside one of those periods. However, there is a lot of sparse outpatient visits what yielded a great number of 1 day long periods of observation.

The rules stated in the article certainly would mitigate this problem by joining a lot of those sparse visits together.

I’m not sure if I get worried or not about overestimating the observation periods length; in one hand, even patients which only visited our hospital once will have 60 days of observation period, on the other, this table observation_period doesn’t hold clinical data per se, by its defition [1] it seems like it is derived from other data.

So… what is it good for? =) I’m guessing it may help filter out periods of time and speed up some other queries.

Thanks!!

1: OMOP CDM v5.4