I want to push this conversation forward with a focus on the potential for LLMs to help with integration across the major collaboration platforms used by the OHDSI community. The need for this integration, and for authoritative summaries of complex conversations, grows more urgent as the OHDSI community grows and its conversations run longer. Without some solution, I think we are at increasing risk of misunderstanding and of ineffective collaboration caused by parallel, diverging, and disorganized efforts on the same topics.
The focus on Copilot/OpenAI in the example below is a matter of convenience, not an endorsement of this specific LLM summarization approach over other options.
When using Edge as a browser, you can give prompts to summarize OHDSI Forum discussions using Copilot.
For example,
this prompt:
Summarize the discussion on this webpage: Blood culture result in measurement or observation table and vocabulary to be used
generated this summary:
The discussion on the OHDSI forum revolves around the best way to handle blood culture results within the OMOP Common Data Model (CDM). Here are the key points:
1. Main Issue: Users are debating whether blood culture results should be stored in the Measurement or Observation table. The challenge is that the vocabulary used for the Measurement table (LOINC) is incomplete, while the Observation table (SNOMED) offers a more comprehensive vocabulary [1].
2. Proposed Solutions:
If you have any specific questions or need further details, feel free to ask!
Learn more: [1] forums.ohdsi.org, [2] forums.ohdsi.org, [3] forums.ohdsi.org, [4] ohdsi.org, +2 more
AI-generated content may be incorrect
Usefully, the summary provides links to the sources it summarizes.
I won’t comment on the completeness and accuracy of the summary other than to say it is easy to understand and seems to get at least some of the main points from a long complex conversation right.
Using Copilot on GitHub issues and on conversations in Teams channels and meetings is already supported and, based on my experience with Teams meeting summaries, very useful.
An approach to integrating content across the OHDSI Forums, OHDSI’s GitHub repos, and OHDSI Teams might be built by automating a prompted summary, by Copilot or another LLM, of the conversations/issues in all three collaboration platforms. In other words, each summary would combine the points discussed on all three platforms rather than summarize the separate conversations on each.
Identical cross-platform summaries could then be posted, organized by topic, in appropriately designated sections of all three collaboration platforms. For example: in a new Summaries category in the OHDSI Forums; in a new Summaries channel in the main OHDSI team of the OHDSI Teams tenant; and in the Discussions or Wiki of the OHDSI GitHub organization account.
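To make the cross-platform idea a bit more concrete, here is a minimal Python sketch of how messages collected from the three platforms might be combined into a single prompt. Everything in it is an assumption for illustration: the `Message` fields, the `build_cross_platform_prompt` name, and the prompt wording are hypothetical, and it does not fetch anything from the Forums, GitHub, or Teams or call any LLM.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """One contribution to a conversation (hypothetical structure)."""
    platform: str  # e.g. "forum", "github", or "teams"
    url: str       # link back to the original post, issue, or message
    author: str
    text: str


def build_cross_platform_prompt(topic: str, messages: list[Message]) -> str:
    """Put messages from every platform into one prompt so they are summarized together."""
    lines = [
        f"Summarize the discussion of '{topic}' across the sources below.",
        "Combine the points from all platforms into one summary,",
        "and cite the numbered source for each point.",
        "",
    ]
    # Number each source so the summary can link back to it for review.
    for i, m in enumerate(messages, start=1):
        lines.append(f"[{i}] ({m.platform}) {m.author} <{m.url}>: {m.text}")
    return "\n".join(lines)
```

The resulting string could be given to Copilot or any other LLM. Numbering the sources is one way to keep the links that made the example summary above easy to check.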
Though useful, summarization by LLMs isn’t perfect. So human-in-the-loop review for accuracy before posting to each platform seems like a requirement to me. If so, that might mean setting up a new Conversation Reviewer role in the community with defined responsibilities such as periodically generating, reviewing, correcting, and posting identical copies of LLM-assisted cross-platform conversation summaries to each OHDSI collaboration platform. As in the example above, the summaries would include links to the sources in each platform, which would simplify the review task.
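As a rough illustration of the review step, the Python sketch below refuses to produce anything for posting until a reviewer has approved the draft, and then returns the same text for every platform. The names (`SummaryDraft`, `approve`, `posts_for_all_platforms`) and the platform labels are hypothetical, and the actual posting to each platform is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Optional

PLATFORMS = ("forum", "github", "teams")  # assumed labels for the three platforms


@dataclass
class SummaryDraft:
    topic: str
    text: str
    source_urls: list[str]
    approved_by: Optional[str] = None
    superseded_texts: list[str] = field(default_factory=list)


def approve(draft: SummaryDraft, reviewer: str, corrected_text: Optional[str] = None) -> None:
    """Record the reviewer's sign-off, keeping any text they replaced."""
    if corrected_text is not None and corrected_text != draft.text:
        draft.superseded_texts.append(draft.text)
        draft.text = corrected_text
    draft.approved_by = reviewer


def posts_for_all_platforms(draft: SummaryDraft) -> dict[str, str]:
    """Return one identical post body per platform, but only after review."""
    if draft.approved_by is None:
        raise ValueError("summary has not been reviewed yet")
    body = draft.text + "\n\nSources:\n" + "\n".join(draft.source_urls)
    return {platform: body for platform in PLATFORMS}
```

Keeping the replaced text around is a design choice, not a requirement: it would let others see what the reviewer changed.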
I think any approach like this should also provide a way for people other than the reviewer to correct any misrepresentation of their own contributions to a summarized conversation. Handling that correction process in a way that’s fair to participants in the conversation and not overly burdensome to Conversation Reviewers could be tricky.
I don’t know if OHDSI LLM folks like Martijn are already working on something similar. I don’t want this post to derail or delay any related work that is ongoing.
I will be meeting with Ofer Mendelevitch soon to discuss potential OHDSI applications of his new company, Vectara. Ofer was the technology lead at Syntegra, which used an early transformer approach to generate the Tufts Synthetic Dataset based on our OMOP instance at Tufts Medical Center a couple of years ago. I may discuss this idea with him when we meet.