FeatureExtraction 2.0

Gowtham_Rao · July 2, 2018, 10:19am

Thank you @schuemie, that sounds like a good solution, especially row_number() over (order by subject_id, cohort_start_date)

Shouldn’t we make the row_number() default? It seems like this is a potential gotcha situation?

schuemie · July 2, 2018, 11:49am

Default how? The cohort table is the input to FeatureExtraction. Whether or not it contains a row_id field (created using ROW_NUMBER) is not within FeatureExtraction’s control.

Gowtham_Rao · July 2, 2018, 12:28pm

Thank you @schuemie, it sounds like we have a known problem when there is more than one record per subject_id. The default behavior of FeatureExtraction is to use subject_id as row_id. So, given a situation like this

we have a problem using FeatureExtraction’s default behavior of rowId = ‘subject_id’, because we cannot readily differentiate between features generated for the same subject_id with different cohort_start_date values, 5/1/2016 and 2/15/2017.

In this case, we need a structure something like this, with a new column that uniquely identifies every record within the same cohort_definition_id

where rowId = ‘cohort_row_id’. Current standard tools don’t do this by default, and the cohort table does not have a cohort_row_id column. So we have to do it outside, by creating a new rowId field using row_number() over (partition by cohort_definition_id order by subject_id, cohort_start_date)
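To make the idea concrete, here is a minimal sketch of that row_number() step, run against an in-memory SQLite database from Python so it can be tried without a CDM. The sample rows, the subject and cohort IDs, and the function name number_cohort_rows are made up for illustration; only the column names and the two dates 5/1/2016 and 2/15/2017 come from the discussion above. On a real database the same SELECT would presumably be used to build the table or view handed to FeatureExtraction, but that part is not shown here.

```python
import sqlite3

# Hypothetical cohort rows: (cohort_definition_id, subject_id, cohort_start_date).
# Subject 101 enters cohort 1 twice, which is the case subject_id cannot distinguish.
SAMPLE_ROWS = [
    (1, 101, "2016-05-01"),
    (1, 101, "2017-02-15"),
    (1, 102, "2016-08-20"),
    (2, 101, "2015-03-10"),
]


def number_cohort_rows(rows):
    """Return the cohort rows with an added cohort_row_id column.

    cohort_row_id restarts at 1 for each cohort_definition_id and is
    assigned in (subject_id, cohort_start_date) order.
    Requires SQLite 3.25 or later for window function support.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cohort ("
        "cohort_definition_id INTEGER, subject_id INTEGER, cohort_start_date TEXT)"
    )
    conn.executemany("INSERT INTO cohort VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT cohort_definition_id,
               subject_id,
               cohort_start_date,
               ROW_NUMBER() OVER (
                   PARTITION BY cohort_definition_id
                   ORDER BY subject_id, cohort_start_date
               ) AS cohort_row_id
        FROM cohort
        ORDER BY cohort_definition_id, cohort_row_id
        """
    ).fetchall()
    conn.close()
    return result


numbered = number_cohort_rows(SAMPLE_ROWS)
for row in numbered:
    print(row)
```

With this numbering, the pair (cohort_definition_id, cohort_row_id) is unique per record, whereas (cohort_definition_id, subject_id) is not for subject 101.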

Eldar · July 5, 2018, 10:39pm

@schuemie, can you please help me with the logic of creating custom covariates?
Looking at the vignette, I found the following:

cohort_definition_id, A key to link to the cohort table. Note that this will become the covariate ID, so you should take care that these IDs do not overlap with IDs of other covariate builders that may be used as well.

I actually want to take care of assigning cohort_definition_id values so that they do not overlap with the other default covariates I’m going to use. It seems like I can’t just use the cohort_definition_id’s from Atlas and need to reassign them as well.
Is there some range of numbers which is not used for default covariate_id’s?
I tried to obtain it via reverse engineering, but failed =(