Tricky phenotyping in ATLAS

I want to report our recent experience with phenotyping in ATLAS.

I’m working on developing a cohort definition for cancer patients in claims data with @Jaehyeong_Cho.

Kim et al. reported that the most accurate way to define cancer patients in Korean claims data is to use two constraints: hospitalization and primary diagnosis.
So we’ve made several cohort definitions in ATLAS:
First definition and Second definition.
(Because SynPUF data doesn’t seem to have information on primary diagnosis, you cannot see the results in the public ATLAS.)

In our experience, the cohort size almost doubled with the second definition, which I didn’t expect. The count from the first definition was much closer to the known numbers than the count from the second definition.

The reason might be the ‘first time’ constraint in the first cohort definition.

Anyway, this demonstrates again why we need a golden standard phenotype library in OHDSI.

The first definition is saying ‘The first time any of the codes in Liver Cancer (C22) was recorded must be between 2008-12-31 and 2013-12-30’. So anyone whose first diagnosis in their medical history occurred before 2008-12-31 will not be identified.

The second definition is saying ‘Find the earliest diagnosis for any of the codes in Liver Cancer (C22) that occurred between 2008-12-31 and 2013-12-30’. The earliest within the time window may not be the first diagnosis ever in their medical history. The second definition allows more people to come in, because someone whose first diagnosis in their medical history occurred before 2008-12-31 will still be identified if there is another diagnosis that does occur between 2008-12-31 and 2013-12-30.

These are different statements about the types of people you are trying to identify (New Occurrence vs. some earliest occurrence within a time window). It makes me wonder: if there is this sort of variance in how you want to describe the population, can there really ever be the ‘one true golden standard’?
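To make the difference concrete, here is a minimal sketch of the two rules in plain Python. It is not the SQL that ATLAS generates; the toy data, the person IDs, and the function names are all hypothetical and exist only to illustrate the logic described above.

```python
from datetime import date

# Observation window taken from the two cohort definitions discussed above.
WINDOW_START = date(2008, 12, 31)
WINDOW_END = date(2013, 12, 30)

# Hypothetical toy data: person_id -> dates on which a C22 code was recorded.
diagnoses = {
    1: [date(2010, 3, 1), date(2011, 5, 2)],    # first-ever record is inside the window
    2: [date(2005, 6, 15), date(2009, 7, 20)],  # first-ever record is before the window
    3: [date(2004, 1, 10)],                     # only record is before the window
}


def in_window(d):
    """True if the date falls inside the window (both ends inclusive)."""
    return WINDOW_START <= d <= WINDOW_END


def first_ever_in_window(dx):
    """Definition 1: the first record in the whole history must be in the window."""
    return {
        pid: min(dates)
        for pid, dates in dx.items()
        if dates and in_window(min(dates))
    }


def earliest_within_window(dx):
    """Definition 2: take the earliest record among those inside the window."""
    cohort = {}
    for pid, dates in dx.items():
        hits = [d for d in dates if in_window(d)]
        if hits:
            cohort[pid] = min(hits)
    return cohort


cohort_1 = first_ever_in_window(diagnoses)
cohort_2 = earliest_within_window(diagnoses)

print(sorted(cohort_1))  # [1]
print(sorted(cohort_2))  # [1, 2]
```

Person 2 is the case that inflates the second cohort: the first diagnosis is in 2005, but a later record falls inside the window, so only the second rule picks that person up.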

Yes @Chris_Knoll, you’re right.
Still, we need a library for phenotyping.

We can validate the phenotype for Korea by comparing against known incidence or using manual chart… But I don’t know how to set up a golden standard phenotype for both the US and Korea.

I agree fully, and I don’t want my comments to be interpreted to mean that I don’t believe in a phenotype library, only that the ‘golden’-ness of a phenotype will most likely be context-specific.

@Chris_Knoll Yes, I understand your concern. It’s hard to say we can build a ‘golden’ phenotype, though we can suggest several ‘silver’ phenotypes.

Besides, what is interesting is that, in medicine, a true golden standard usually does not exist. For example, liver biopsy is the golden standard for diagnosing hepatic steatosis or fibrosis, yet a liver biopsy extracts only a tiny sample from the whole liver. I’m sure it would be less than 0.1% of the whole liver.

Friends: This is probably a discussion for @apotvien’s WG debate, but we should research this: How often are phenotypes (golden or other metal) transferable, and to what degree?

@Christian_Reich Yes, you’re right. We’ll see it through various validation processes from multiple institutions.
