Resources on SQL Query Standards for OMOP Compliance

Hi everyone,

I am currently working on a project focused on improving the explainability of AI-generated SQL queries. One approach we are exploring is the development of a sanitiser/validator that reviews LLM-generated SQL and checks it against a set of rules to ensure the query complies with OMOP conventions.

For example, the sanitiser might validate that:

  • tables referenced belong to the OMOP CDM schema.
  • clinical concepts are identified using concept_id rather than string matching.
  • standard concepts are used, or that non-standard concepts are appropriately mapped to standard concepts.
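
As a rough illustration of the first two rules, a minimal Python sketch might look like the following. It is regex-based and not a definitive implementation: the table list is a deliberately partial subset of the CDM, the function name `check_query` is hypothetical, and the patterns do not understand CTEs, comments, quoted identifiers, or constructs such as `EXTRACT(YEAR FROM ...)`. A real validator would most likely need a proper SQL parser.

```python
import re

# Assumption: a partial list of OMOP CDM v5.x table names, for illustration only.
OMOP_TABLES = {
    "person", "observation_period", "visit_occurrence", "condition_occurrence",
    "drug_exposure", "procedure_occurrence", "measurement", "observation",
    "concept", "concept_relationship", "concept_ancestor",
}

# Identifier that follows FROM or JOIN, optionally schema-qualified.
TABLE_REF = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)

# A *_concept_name or *_source_value column compared against a string literal.
STRING_MATCH = re.compile(
    r"\b(\w*(?:concept_name|source_value))\s*(?:=|like|ilike)\s*'",
    re.IGNORECASE,
)


def check_query(sql):
    """Return a list of human-readable findings; an empty list means no rule fired."""
    findings = []
    for ref in TABLE_REF.findall(sql):
        table = ref.split(".")[-1].lower()  # drop any schema prefix
        if table not in OMOP_TABLES:
            findings.append(f"unknown table: {table}")
    for column in STRING_MATCH.findall(sql):
        findings.append(
            f"string match on {column.lower()}; prefer a concept_id filter"
        )
    return findings


# The concept id below is a made-up placeholder, not a real vocabulary entry.
good_sql = (
    "SELECT person_id FROM cdm.condition_occurrence "
    "WHERE condition_concept_id = 12345"
)
bad_sql = (
    "SELECT * FROM patients p "
    "JOIN condition_occurrence c ON p.id = c.person_id "
    "WHERE c.condition_source_value LIKE '%diabetes%'"
)
```

Calling `check_query(good_sql)` returns an empty list, while `check_query(bad_sql)` reports both the unknown `patients` table and the string match on `condition_source_value`. The third rule (standard concepts) cannot be checked from the SQL text alone, because it needs a lookup against the vocabulary tables.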

I am looking for documentation or resources that describe what makes a SQL query compliant with OMOP standards, particularly anything that formalises best practice or defines rules that could be used in an automated validator.

Does anyone know of existing resources or have suggestions for systematically defining compliance rules for OMOP SQL queries?

Many thanks in advance.
Shihao

@sshenzha:

I don’t think there is such a thing as an OMOP-valid SQL query. Apart from the fact that OMOP does not prescribe any database flavor or version, whether or not your query is valid depends on your use case. However, we have a repo on GitHub with generally useful OMOP queries.

Also, your vibe coder should be able to take our OMOP CDM DDLs and the model description, and create valid SQL. I don’t know whether anybody has tried that; I haven’t. I can imagine, though, that the hard part is that many of the queries rely on the inner logic of the OHDSI Standardized Vocabularies.
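
To give a flavor of that vocabulary logic, here is a minimal sketch of one piece of it: deciding whether a concept is standard and, if not, following its 'Maps to' relationship. It assumes the usual `concept` and `concept_relationship` columns, runs against an in-memory SQLite database, and uses toy rows with made-up ids rather than real vocabulary content. The helper name `resolve_standard` is hypothetical, and details such as `invalid_reason` and validity dates are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept (
        concept_id INTEGER PRIMARY KEY,
        concept_name TEXT,
        standard_concept TEXT
    );
    CREATE TABLE concept_relationship (
        concept_id_1 INTEGER,
        concept_id_2 INTEGER,
        relationship_id TEXT
    );
    -- Toy rows with made-up ids, not real vocabulary content.
    INSERT INTO concept VALUES (1001, 'toy standard concept', 'S');
    INSERT INTO concept VALUES (2001, 'toy source code', NULL);
    INSERT INTO concept_relationship VALUES (2001, 1001, 'Maps to');
    """
)


def resolve_standard(conn, concept_id):
    """Return the standard concept ids for concept_id, or [] if unknown or unmapped."""
    row = conn.execute(
        "SELECT standard_concept FROM concept WHERE concept_id = ?",
        (concept_id,),
    ).fetchone()
    if row is None:
        return []  # id not present in the vocabulary at all
    if row[0] == "S":
        return [concept_id]  # already a standard concept
    # Non-standard: follow 'Maps to' and keep only standard targets.
    rows = conn.execute(
        """
        SELECT cr.concept_id_2
        FROM concept_relationship cr
        JOIN concept c ON c.concept_id = cr.concept_id_2
        WHERE cr.concept_id_1 = ?
          AND cr.relationship_id = 'Maps to'
          AND c.standard_concept = 'S'
        ORDER BY cr.concept_id_2
        """,
        (concept_id,),
    ).fetchall()
    return [r[0] for r in rows]
```

A validator could extract the literal concept ids from a generated query and run each one through a lookup like this, flagging ids that come back empty or that resolve to a different id than the one used in the query.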

Let us know what you find out. This is cool stuff.