An idea has been raised to include a special ‘concept set diagnostic report’ for a cohort definition which shows the set of included concepts across all concept sets in the definition, with a sparkline indicating the prevalence over time in the data. Below is a screenshot of what that might look like:
A few notes on this picture:
- The rows with RC = 0 wouldn’t have a sparkline.
- The X domain of each plot would be normalized across all plots (i.e., 3 plots with data from 2004-2007, 2010-2014, and 2005-2011 would define an overall min domain of 2004 and max domain of 2014). This is why some plots in the example seem to start or end in the middle of the plot.
- The Y axis would be the prevalence of records in the given month: either RC / # rows for the given month, or a value relative to the other values in the result (taking the min/max of all RCs and then converting each value to an overall percentage).
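To make the notes above concrete, here is a minimal sketch of the shared X domain and the two Y-axis options. The function names and input shapes are hypothetical, not part of any existing code, and the second option reads "relative to the other values" as a min/max rescale, which is only one possible interpretation.

```python
from typing import Dict, List, Tuple


def shared_domain(ranges: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Overall X domain: earliest start and latest end across all plots."""
    starts, ends = zip(*ranges)
    return min(starts), max(ends)


def prevalence_by_month(rc: Dict[str, int], rows: Dict[str, int]) -> Dict[str, float]:
    """Option 1: RC for a month divided by the total row count for that month.

    Months with no (or zero) total rows are skipped to avoid dividing by zero.
    """
    return {month: rc[month] / rows[month] for month in rc if rows.get(month)}


def relative_to_result_range(values: List[float]) -> List[float]:
    """Option 2 (assumed reading): rescale values to 0-100% of the min/max range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat series has no range to rescale against.
        return [0.0 for _ in values]
    return [100.0 * (v - lo) / (hi - lo) for v in values]


# The three date ranges from the note above give one domain for every plot.
domain = shared_domain([(2004, 2007), (2010, 2014), (2005, 2011)])
print(domain)
```

With the shared domain, a plot whose data covers only 2010-2014 would be drawn in the right-hand part of the 2004-2014 axis, which matches the plots that seem to start in the middle.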
Some questions:
- Where would a report like this fit into the existing UI?
- How do we handle the case where there’s a row with RC = 3.8M and another with RC = 409, where the 3.8M row will completely obscure the trend of the RC = 409 row? Should the Y axis for each sparkline be independent, while the X axis is normalized across all plots?
- Where should these trends be sourced from? Should a results schema be loaded into the vocabulary datasource, or should the trend lines be fetched from a specific source? (RCs are currently fetched this way, but it leads to timeouts under certain circumstances.)
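On the Y-axis question, a small sketch may help compare the two choices. The helper `scale_series` and the monthly counts below are made up for illustration (only the 3.8M and 409 magnitudes come from the question above), and each series is assumed to be non-empty since rows with RC = 0 would have no sparkline.

```python
from typing import Dict, List


def scale_series(series: Dict[str, List[float]], independent_y: bool) -> Dict[str, List[float]]:
    """Scale each series to the 0..1 range for drawing.

    independent_y=True divides each series by its own maximum;
    independent_y=False divides every series by the global maximum.
    """
    global_max = max(max(values) for values in series.values())
    scaled = {}
    for name, values in series.items():
        top = max(values) if independent_y else global_max
        scaled[name] = [v / top if top else 0.0 for v in values]
    return scaled


# Illustrative monthly counts only, not real data.
monthly = {"common": [3_000_000, 3_800_000], "rare": [409, 200]}

shared = scale_series(monthly, independent_y=False)
independent = scale_series(monthly, independent_y=True)

print(shared["rare"])       # values very close to zero: the trend is flattened
print(independent["rare"])  # the drop from 409 to 200 stays visible
```

The trade-off in this sketch is that an independent Y axis keeps each row's trend readable, but the heights of two sparklines can no longer be compared with each other.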