
CohortDiagnostics: What is the purpose of JSON / can one avoid providing it?

Hello! I’m interested in setting up CohortDiagnostics for a cohort that I’ve created using hand-written SQL, i.e. no ATLAS.

But it seems as if I’m not allowed to: CohortDiagnostics seems to want a JSON file. I don’t understand why that’s needed, and would like to know if there’s some workaround, for instance a “convertSQLtoJSON” function?

That’s a good question.

Starting with version 3, CohortDiagnostics does NOT generate cohorts, so it does not need the JSON for that purpose.

CohortDiagnostics has diagnostics related to concept sets that require the JSON, and the incidence rate computation requires it too. Both rely on SQL that is generated using Circe.

If you are willing to turn those two diagnostics off, then you can provide any “dummy” JSON and you should get the other diagnostics.

I have not tested this out. Please let me know your experience.

Thank you very much for answering, most helpful.

In terms of dummy JSON, we actually put together a JSON file manually, which at least makes it possible to run CohortDiagnostics, but we felt uncertain how it’s being used in relation to the SQL: does the JSON affect anything, in case we messed up constructing it?

Is it correct to say that, as long as we turn off the incidence and concept set tabs, we can provide whatever JSON makes CohortDiagnostics run, and the output will be based on the SQL and not the JSON?

It’s perfectly OK to advise that we should go through the code and turn off the incidence and concept set calculations as well; just asking in case we could save ourselves that little endeavor.

It’s not a simple answer, but in theory you can do what you are trying to do.

Note: We require CohortDiagnostics to take as input an object called ‘cohortDefinitionSet’. This object is defined by the OHDSI/CohortGenerator package. In addition, we require that object to have the fields json, cohortId, cohortName, and sql. The reason is that the cohort JSON, as generated by the OHDSI Circe library, is parsed by CohortDiagnostics and its companion DiagnosticsExplorer Shiny app.

The check is performed here.

We pretty much export the content of this object, including the JSON, as output, here: https://github.com/OHDSI/CohortDiagnostics/blob/5f4d80e9f4210ffaf2ac7d4ab2b26102e4987c58/R/RunDiagnostics.R#L435

So you will have to skip all these processes and provide a dummy JSON to avoid errors.
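For concreteness, a minimal hand-built cohortDefinitionSet might look like the sketch below. This is only an illustration: the sql string and the JSON placeholder are made-up stand-ins (CohortGenerator normally builds this object from real Circe definitions), but the four required fields match what the check looks for.

```r
# Sketch: a minimal cohortDefinitionSet data frame with the fields
# CohortDiagnostics checks for (cohortId, cohortName, sql, json).
# The sql and json values here are placeholders, not a real definition.
cohortDefinitionSet <- data.frame(
  cohortId = 1,
  cohortName = "My hand-written cohort",
  sql = "INSERT INTO @cohort_database_schema.@cohort_table ...",  # your custom OHDSI SQL
  json = '{"ConceptSets": [], "PrimaryCriteria": {}}',            # dummy JSON placeholder
  stringsAsFactors = FALSE
)

# Sanity check: all four required fields are present.
stopifnot(all(c("cohortId", "cohortName", "sql", "json") %in% names(cohortDefinitionSet)))
```

With the concept set and incidence rate diagnostics turned off, the dummy json column should only need to be syntactically valid enough to pass the input check.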


@OskarGauffin you are welcome to make a proposal for this functionality here: Issues · OHDSI/CohortDiagnostics · GitHub

I think it would be useful in many scenarios. For example, it is possible that we take as input an instantiated cohort table, i.e. a table with cohort_definition_id, subject_id, cohort_start_date, cohort_end_date. Can we just point to that cohortTable and runCohortDiagnostics?
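For reference, the instantiated cohort table described above has exactly four columns. A small R sketch (the values are invented for illustration):

```r
# Sketch of the standard OHDSI cohort table layout (four columns).
# Rows are made-up example data, not from any real CDM.
cohortTable <- data.frame(
  cohort_definition_id = c(1L, 1L, 2L),
  subject_id = c(101L, 102L, 101L),
  cohort_start_date = as.Date(c("2020-01-01", "2020-03-15", "2021-02-01")),
  cohort_end_date = as.Date(c("2020-06-30", "2020-09-01", "2021-05-31"))
)
```

The idea would be that diagnostics needing only subjects and dates could run directly against a table like this, with no cohort definition JSON at all.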

I think that would be valuable, as we can get a dashboard (the DiagnosticsExplorer Shiny app) that provides

  • cohort counts
  • cohort overlap
  • visit context
  • cohort characterization/temporal characterization, including cohorts as features (new in version 3)
  • cohort time series (also new in version 3)

@jpegilbert is the new maintainer of CohortDiagnostics, so he will have to consider this issue.

The CohortGenerator package requires JSON for several reasons. First, the cohort definition carries additional metadata, such as inclusion rules, that is used to generate statistics and other baseline information about a cohort. The output exploration in Shiny also currently requires the definition.

As you mentioned before, it is possible to place “dummy JSON” in the cohort definition set parameter passed to CohortDiagnostics. However, though this is possible, the package was never designed to run with custom SQL, because this isn’t seen as good practice for phenotyping well-defined, reusable cohorts.

As an alternative to ATLAS, the Capr package can be used to generate cohort definitions in R if you don’t wish to use ATLAS but want to create cohorts that conform to the OHDSI JSON standard and can be easily exported as OHDSI SQL. This should be useful if, for example, the reason you’re using custom SQL is templating.

Furthermore, if you don’t require the full dashboard of information about a cohort and just want quick characterization results, it is possible (and significantly faster) to run FeatureExtraction.

@jpegilbert I think a use case we should consider supporting is: “given an instantiated cohort table with the structure cohort_id, subject_id, cohort_start_date, cohort_end_date as the input, provide diagnostics on it.” From a technical perspective, even in the absence of cohort JSON and SQL, we can still do

  • visit context
  • index event breakdown
  • cohort counts
  • all the characterization that relies on feature extraction
  • feature cohort characterization that relies on cohort relationship
  • cohort time series
  • cohort overlap

These are valuable diagnostics by themselves. Most projects that have scaled up in OHDSI have used template-based cohort definitions. I also think there is a use case where we can use previously generated cohorts as standard feature cohorts.

I am supportive of a technical solution for this. A simple way to do it is to allow the cohortDefinitionSet object to have an empty cell for json, and potentially for sql. For any cohort with an empty value in these cells, we skip the diagnostics on concept sets, incidence rates, and cohort generation. We would just put this logic in the executeDiagnostics segment.
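As a rough sketch of that proposed logic (hypothetical code, not the package’s actual executeDiagnostics implementation; hasJson and the skipped-diagnostics loop are made-up names for illustration):

```r
# Hypothetical sketch: detect cohorts whose json field is empty in the
# cohortDefinitionSet, so JSON-dependent diagnostics can be skipped for them.
hasJson <- function(cohortDefinitionSet) {
  !is.na(cohortDefinitionSet$json) & nzchar(trimws(cohortDefinitionSet$json))
}

runJsonDependentDiagnostics <- function(cohortDefinitionSet) {
  jsonAvailable <- hasJson(cohortDefinitionSet)
  for (i in seq_len(nrow(cohortDefinitionSet))) {
    if (!jsonAvailable[i]) {
      message("Skipping concept set and incidence rate diagnostics for cohort ",
              cohortDefinitionSet$cohortId[i])
      next
    }
    # ... run the Circe/JSON-based diagnostics for this cohort ...
  }
}
```

The remaining diagnostics (counts, overlap, visit context, characterization, time series) would run for every cohort regardless of whether json is populated.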
