
Readmission detection


Hi @Michael_Shamberger, welcome to the OHDSI community! I hadn't heard of
the Surgeon Scorecard before, so thanks for that! Here's a link for those
who may be equally naive as me: https://projects.propublica.org/surgeons/.

In the article from that page that describes their methodology (
https://www.propublica.org/article/surgeon-level-risk-short-methodology),
it seems you need to perform a couple of different maneuvers to find a
qualifying inpatient stay, which is then followed by a qualifying
'readmission'. You need to define the procedure concepts that make up the 8
elective surgeries (knee replacements, hip replacements, three types of
spinal fusions, one in the neck and two in the lower back, gall bladder
removals, prostate removals, and prostate resections), then you need to
make sure these surgeries took place during an inpatient stay. You also
need to exclude a surgery if it was preceded by an ER visit. To make sure
you are looking at a new surgery, and not just another billed code or a
revision, you might also want logic that says "can't have a surgery in the
last 30 days".
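The index-event logic above (inpatient surgery, no preceding ER visit, no other surgery in the prior 30 days) could be sketched in plain Python. This is a minimal sketch, not OHDSI code: the record layout and field names (`person_id`, `date`, `inpatient`) are illustrative, and the "ER visit on the same day or the day before" rule is one possible interpretation of "preceded by an ER visit":

```python
from datetime import date, timedelta

def qualifying_surgeries(surgeries, er_visits, washout_days=30):
    """Keep surgeries that occurred during an inpatient stay, were not
    immediately preceded by an ER visit, and had no other surgery for the
    same person in the prior `washout_days` days."""
    er_dates = {(v["person_id"], v["visit_date"]) for v in er_visits}
    kept = []
    for s in sorted(surgeries, key=lambda r: (r["person_id"], r["date"])):
        if not s["inpatient"]:
            continue
        # Exclude if an ER visit fell on the same day or the day before.
        if any((s["person_id"], s["date"] - timedelta(days=d)) in er_dates
               for d in (0, 1)):
            continue
        # Exclude if the same person had any surgery in the prior 30 days
        # (likely a revision or another billed code, not a new surgery).
        had_recent = any(
            p["person_id"] == s["person_id"]
            and timedelta(0) < s["date"] - p["date"] <= timedelta(days=washout_days)
            for p in surgeries)
        if had_recent:
            continue
        kept.append(s)
    return kept
```

In a real implementation this filtering would be expressed in ATLAS cohort logic or SQL rather than row-by-row Python, but the temporal conditions are the same.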

Amongst these qualifying surgeries, you are then looking for new inpatient
admissions that start within 30 days of the surgery and that have a
qualifying concept for one of the complications. In the article, they
cite "problems like infections, blood clots, uncontrolled bleeding and
misaligned orthopedic devices". The first three are likely to be
recorded as conditions. However, 'misaligned devices', or more
specifically codes that reflect revisions to surgery, are likely to
be procedures. The article also specifically notes that they looked at
'primary discharge' diagnosis fields, so you'll want to limit on that.
It's not clear how the ProPublica people did it, but for the conditions you
probably want to think about whether those codes need to be 'new', meaning
they have not occurred in the past, since prior occurrences would suggest
pre-existing conditions rather than complications.
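The readmission side of the logic could be sketched the same way. Again a hedged illustration, not the ProPublica or OHDSI implementation: field names are hypothetical, and `complication_concepts` stands in for whatever concept set you define for the complications:

```python
from datetime import date

def flag_complications(surgeries, admissions, complication_concepts,
                       window_days=30):
    """For each qualifying surgery, look for a new inpatient admission that
    starts within `window_days` of the surgery and whose *primary* discharge
    diagnosis is in the complication concept set."""
    flagged = []
    for s in surgeries:
        for a in admissions:
            if a["person_id"] != s["person_id"]:
                continue
            gap = (a["admit_date"] - s["date"]).days
            if 0 < gap <= window_days and a["primary_dx"] in complication_concepts:
                flagged.append((s, a))
                break  # one flagged readmission per surgery is enough
    return flagged
```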

So all told, what domains do you need to use to replicate this type of
analysis? By my reading, you need to use the VISIT_OCCURRENCE,
CONDITION_OCCURRENCE, and PROCEDURE_OCCURRENCE tables.

Now the good news is that, insofar as I could glean it from the article,
figuring out how many qualifying surgeries you have and what proportion of
them have a complication is a breeze using the OMOP CDM and the OHDSI tech
stack. I created the following cohort definition on the OHDSI public
install of ATLAS for you:
http://www.ohdsi.org/web/atlas/#/cohortdefinition/19101. I didn't
populate the concept set expressions, since I don't know the particulars,
but you can see all the temporal logic and which elements are required. And
in case you don't yet have ATLAS installed against your CDM instance, you
could always finish the concept sets on the OHDSI site, then just copy/paste
the SQL code into your local environment, and away you go!

Happy hacking…


Thank you for the reply. It should get me a lot further, and I will study the tables you mentioned. I also have to study the features of the Atlas tool. Now that I am looking at the Atlas cohort link you sent me, it could be that Atlas would serve as a front end for the tool I am developing. I am also interested in the scalability of Atlas. For example, it could likely handle a million patients, but what about 10 or 100 million?

One goal of the project I am doing is building code that can scale across large data sizes and multiple machines. The Spark framework is able to do this (http://spark.apache.org/). It can use the same SQL logic that is used in the current OMOP tools, so there is no need to reinvent those. I am more comfortable coding in Python, but R extensions for Spark also exist.
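The point about reusing the same SQL logic can be shown with a tiny self-contained example. Here stdlib sqlite3 stands in for the execution engine (so the snippet runs anywhere); on Spark the same pattern applies, registering the tables and handing the query string to `spark.sql(...)`, modulo dialect differences such as date arithmetic (`julianday` below is SQLite-specific). The pared-down schema and data are hypothetical, with table and column names in OMOP style:

```python
import sqlite3

# A 30-day readmission join expressed as one SQL string. On Spark the same
# string (with Spark-dialect date math) would go to spark.sql(READMIT_SQL).
READMIT_SQL = """
SELECT s.person_id, s.surgery_date, a.visit_start_date
FROM surgery s
JOIN visit_occurrence a
  ON a.person_id = s.person_id
 AND julianday(a.visit_start_date) - julianday(s.surgery_date)
     BETWEEN 1 AND 30
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE surgery (person_id INT, surgery_date TEXT)")
conn.execute("CREATE TABLE visit_occurrence (person_id INT, visit_start_date TEXT)")
conn.execute("INSERT INTO surgery VALUES (1, '2010-01-01')")
conn.execute("INSERT INTO visit_occurrence VALUES (1, '2010-01-15')")  # day 14
conn.execute("INSERT INTO visit_occurrence VALUES (1, '2010-06-01')")  # too late
rows = conn.execute(READMIT_SQL).fetchall()
```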

The ProPublica group did publish the ICD-9 codes that they used to detect the surgical procedures, as well as the ones that were designated as complications. That is here: https://static.propublica.org/projects/patient-safety/methodology/surgeon-level-risk-appendices.pdf. They have a detailed methodology paper here: https://static.propublica.org/projects/patient-safety/methodology/surgeon-level-risk-methodology.pdf. Some aspects are not completely defined, though, like their patient risk adjustment algorithms. In any case, the first version of the code is not going to perform any risk adjustment. Some of the design choices made by ProPublica in their analysis will be turned into flags that can be switched on/off in the property file.

The tool I am developing will allow a semi-technical user to design a patient cohort, select procedures of interest, and find readmissions by editing property files. The run-time environment for a single machine is not difficult to set up, but running across multiple machines would take some technical skill.

year_of_birth_min = 1900
year_of_birth_max = 1990
events_min = 1
events_max =
filter_dead = False
filter_alive = False
filter_male = False
filter_female = False
filter_care_sites = 2345,78345
include_care_sites =
filter_events_not_in_cohort = False
readmission_time = 30
readmission_code_file = readmission.properties
diagnostic_code_file = diagnosis.properties
comorbidities_code_file = comorbidities.properties
icd_diagnosis=icd9
icd_readmission=icd9
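A property file in this style can be read with Python's stdlib configparser; since Java-style property files have no section header, one can be prepended before parsing. A minimal sketch, assuming the key names above (only a few keys are shown):

```python
import configparser

# Inline stand-in for the property file shown above.
PROPS = """\
year_of_birth_min = 1900
year_of_birth_max = 1990
filter_dead = False
readmission_time = 30
filter_care_sites = 2345,78345
events_max =
"""

cp = configparser.ConfigParser()
cp.read_string("[run]\n" + PROPS)  # property files lack a section header
cfg = cp["run"]

year_min = cfg.getint("year_of_birth_min")
filter_dead = cfg.getboolean("filter_dead")
readmission_time = cfg.getint("readmission_time")
care_sites = [int(x) for x in cfg["filter_care_sites"].split(",")]
events_max = cfg["events_max"] or None  # empty value means "no limit"
```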

I can publish my initial code in a few days and hope to get some design review.

Example icd9 codes for a procedure:

Total knee replacement
81.54 = 71536,71596,71516,71696,7140,71589,71526

Example icd9 codes to check for readmission for this procedure:

Total Knee Replacement
81.54 = 99859,99666,6826,41519,0389,99677,99812,41511,00845,99832,9972,2851,45341,99831,45342,33818,99644,45340,99642,99667,71916,99811,78060,71946,9974,9971,2800,71906,99889,82123,43411,78062,82021,99739,71106,99830,03849,8220,82101,99647,72981,45829,99883,8208,82009,99678,92411,96509,82120,0383,99813,0380,9975,71856,48283,7295,99851,27650,72766,99649,7907,44422,99643,82300,4829,03843,8363,0388,9642,99641,99833,71956,03812,03840,99799,48242,99702,03819,4536,9654,03811,45119,8910,99939,99931,71846,99662,9658,92400,8912,9986,4512,4538,71945,99669,45389,8911,2875,71836,7179,48241,99679,73342,03810,82302,44421,99646,4519,5121,99591,83650,99989,99670,99659,99985,45383,45111,28749,8221,99640,44481,45352,99709,0382,2874,71789,99791,9580,7388,71908,71886,99660,9581
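The `<procedure> = <codes>` lines above are simple to parse into a lookup table. A sketch, assuming the format shown (description lines without an '=' are treated as comments and skipped):

```python
def parse_code_map(text):
    """Parse lines of the form '<icd9 procedure> = <comma-separated codes>'
    into a dict mapping procedure code -> set of diagnosis/procedure codes.
    Lines without '=' (e.g. human-readable titles) are skipped."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        proc, _, codes = line.partition("=")
        mapping[proc.strip()] = {c.strip() for c in codes.split(",") if c.strip()}
    return mapping
```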

I have put out the initial Surgeon Scorecard Application in a GitHub repository.

It can calculate the scorecard for 8 different procedures on 2.4 million patients with 289 million condition_occurrence records and 279 million procedure_occurrence records in about 80 minutes. This is on a single 8-core server with 64GB of RAM. Run time increases with less than 32GB of RAM, as it needs to spill data to disk.

One question I had is about the "'primary discharge' diagnosis fields". How do I find that in OMOP data?

You can use the CONDITION_TYPE_CONCEPT_ID to delineate ‘primary’ diagnosis
codes.
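That filter is straightforward once you know which type concept IDs your ETL populates. A hedged sketch: the row layout is hypothetical, and the ID used here (38000200, 'Inpatient header - 1st position') is taken from the counts later in this thread, so verify the set against your own vocabulary and ETL before relying on it:

```python
# Type concept IDs treated as 'primary' diagnoses; extend this set (e.g.
# with an inpatient primary-position ID) to match what your ETL populates.
PRIMARY_TYPE_IDS = {38000200}

def primary_conditions(rows, primary_ids=PRIMARY_TYPE_IDS):
    """Restrict condition_occurrence rows to primary diagnoses, using
    condition_type_concept_id to delineate them."""
    return [r for r in rows if r["condition_type_concept_id"] in primary_ids]
```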

Thanks again. I calculated the counts of the CONDITION_TYPE_CONCEPT_ID and the PROCEDURE_TYPE_CONCEPT_ID for the full SynPuf data set.

CONDITION_TYPE_CONCEPT_ID,COUNT,DESCRIPTION
38000230,280864910,Outpatient header - 1st position
38000200,8317475,Inpatient header - 1st position

PROCEDURE_TYPE_CONCEPT_ID,COUNT,DESCRIPTION
38000269,275176949,Outpatient header - 1st position
38000251,3592580,Inpatient header - 1st position

The existing SynPuf ETL does not convert any data to the inpatient primary diagnosis field, which I think would be 38000250. It only converts data to '1st position'. I wonder if some information is lost in translation here? I think I will need to proceed assuming '1st position' is the same as 'primary'.
