
ReproducibilityRayJAMA2021 - Column `stratumId` not found in `.data`

Hello, I am trying to run the R package for the ReproducibilityRayJAMA2021 study on my organization's CDM, but I am getting an error saying that stratumId is not being created when the data sets for the cohorts are produced.

Here are my log and the errorReportR.txt file that the error produces:
errorReportR.txt (1.2 KB)
log.txt (1.2 MB)

This similar post makes me suspect that one of the cohorts being produced has 0 people in it, and that this is why stratumId is missing. Unfortunately, I have not been able to confirm that with either CohortCounts.csv or analysisSummary.csv, shared below (generated as .csv files but saved as .txt files for uploading).
analysisSummary.txt (205.0 KB)
CohortCounts.txt (3.5 KB)

@mdlavallee92, we met virtually this past fall to troubleshoot some issues. Those issues were caused by server timeouts on the platform hosting the CDM at the time (Azure). I am now having a lot more success with our new CDM in Snowflake. I'm happy to meet virtually again if you prefer, but I thought this post might also help people on the forum in the future.

Thanks,

Andrew

Hi @AndrewNute, thanks for the question. Looking at the logs, the error occurs at the computeCovariateBalance step, where it tries to subset the data frame by stratumId. This returns an error because the data frame piped into the function has no data.

Scrolling further up the log, notice that when you match or stratify the population on the propensity scores, many of the resulting population sizes are zero. This is likely why the analysis errors out. Even though there are persons in the cohort, the attrition is severe enough to leave an insufficient population for modelling. The analysis summary confirms this, since all of your results are 0. I don't think this is an issue with the software; it may be that the data are not suitable for this analysis. I am a bit surprised, though, that you produced zero results for some of the outcomes in the study, but again, the attrition may simply be harsh enough that the matched or stratified population is too small to analyze. Maybe @agolozar can add more insight?
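To make the failure mode concrete, here is a minimal Python sketch of a pre-flight check you could run before computing covariate balance. It is only an illustration. It assumes the matched or stratified population is available as a list of row dictionaries, and `check_before_balance` and `attrition_report` are hypothetical helpers, not CohortMethod functions. The step labels in the usage are made up, not taken from your log.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: one dict per person, e.g. {"rowId": 1, "stratumId": 3}.
Row = Dict[str, int]


def check_before_balance(population: List[Row]) -> str:
    """Return a reason to skip the balance step, or '' if it can proceed.

    Illustrative only; this is not part of CohortMethod.
    """
    if not population:
        # Matching/stratification left nobody, so there is nothing to balance.
        return "matched/stratified population is empty; nothing to balance"
    if any("stratumId" not in row for row in population):
        # Subsetting by stratumId would fail here.
        return "population has no stratumId column"
    return ""


def attrition_report(steps: List[Tuple[str, int]]) -> str:
    """steps: ordered (description, remaining persons) pairs.

    Reports the first step at which the population drops to zero.
    """
    for description, remaining in steps:
        if remaining == 0:
            return f"population drops to zero at: {description}"
    return "population never drops to zero"


if __name__ == "__main__":
    matched: List[Row] = []  # e.g. PS matching left no one
    print(check_before_balance(matched))
    print(attrition_report([("study population", 120), ("after PS matching", 0)]))
```

Checking the attrition counts first points you to the step where the population vanishes, which is usually more informative than the downstream `stratumId` error itself.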

Happy to chat more off the forums if you would like.
