Integration - Github and Discourse for Transparency and Community Involvement

hspence · August 1, 2023, 3:15pm

Workgroups could benefit from certain integrations between Github and the OHDSI Forums (Discourse). By linking these two platforms, we can ensure that non-code related discussions, such as mappings and decision-making processes, are easily accessible to developers and implementers on Github. This increases transparency and community involvement.

Interaction and Review in Discourse: There is a deprecated, unofficial Github Issues to Discourse plugin that specifically targeted Github Issues. It refers users to the official Discourse Code Review plugin, which seems to use pull requests instead of issues. Any issues posted with a predetermined tag in a repository could be configured to also create a pull request.

Automated Notifications and Cross-linking: The official Discourse Github plugin supports back-links. Again, it is primarily focused on pull requests and commits, but this may be a lighter option compared to the Discourse Code Review plugin.

Automated Issue Creation: Certain topics, for example those with a Themis tag, could create a corresponding Github Issue. Similarly, a Github Issue could create a corresponding Discourse Topic with a Themis tag. This would likely necessitate a dedicated host and use of the Github and Discourse API. It’s not as simple as using the official plugins.

Periodic Consolidation Summaries: Instead of continuous cross-links or automated issue creation, a bot could post to both Github Issues and Discourse Topics at a predetermined interval (e.g., weekly). Each post would consolidate the discussion in a format that can present a large amount of text for those interested to look through and reference on that platform.

Template Enforcement: Themis already implements a template for Github Issues that requests a forum post be linked. However, this could be further enforced by using Github Issue Forms and validation to ensure that a forum link is provided.
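As a rough illustration of the "Automated Issue Creation" and "Template Enforcement" options above, here is a minimal Python sketch. It assumes a Discourse topic is available as a dict with `id`, `slug`, `title`, and `tags`, that the sync tag is named `themis`, and that the resulting payload would be sent to GitHub's "create an issue" REST endpoint by a separate, hypothetical service. The function names are made up for this example, and no network calls are made.

```python
import re
from urllib.parse import urlparse

FORUM_HOST = "forums.ohdsi.org"  # host of the OHDSI Forums
SYNC_TAG = "themis"              # assumed name of the tag that triggers mirroring


def topic_url(topic):
    """Build a Discourse topic URL of the form /t/<slug>/<id>."""
    return f"https://{FORUM_HOST}/t/{topic['slug']}/{topic['id']}"


def issue_payload_from_topic(topic):
    """Return a title/body/labels payload for a new GitHub issue,
    or None when the topic does not carry the sync tag."""
    if SYNC_TAG not in topic.get("tags", []):
        return None
    return {
        "title": topic["title"],
        "body": f"Mirrored from the OHDSI Forums: {topic_url(topic)}",
        "labels": [SYNC_TAG],
    }


def forum_links(issue_body):
    """Return every link in an issue body that points at a forum topic."""
    urls = re.findall(r"https?://[^\s)>\]]+", issue_body)
    return [
        u for u in urls
        if urlparse(u).hostname == FORUM_HOST
        and urlparse(u).path.startswith("/t/")
    ]


# Example: a tagged topic yields a payload whose body passes the link check.
topic = {"id": 123, "slug": "example-topic", "title": "Example", "tags": ["themis"]}
payload = issue_payload_from_topic(topic)
has_forum_link = bool(forum_links(payload["body"]))
```

The same `forum_links` check could back a validation step that rejects or flags issues submitted without a forum link, though how that is wired into GitHub is left open here.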

Andrew · August 9, 2023, 4:39pm

Thanks for posting this Hayden!

To the best of my knowledge, there are currently three main places where OHDSI community discussions happen. I’m excluding social media (Twitter/X, Facebook, etc.) but maybe I shouldn’t. I think for extended conversations, rather than announcements and news, these are the three main places:

The OHDSI Forums
OHDSI’s MS Teams Workgroup-based Channels
OHDSI GitHub repo comment threads

As the community grows, having three places where the same topic might be discussed is increasingly likely to be inefficient, promote misalignment of work, and lead to lost opportunities for collaboration.

It might be ideal if we could automate the scanning of these three sources and make sure that at least one of them connects discussions that should be related to each other. So, I’m particularly intrigued by your proposed idea of a bot that consolidates discussions across sources. I’m also a bit cautious about adding a fourth place where community conversations happen.

I’m interested in your thoughts on these questions:

What is your sense of the advantages (user base, functionality, integration with workflows) of Discourse vs. the Discussions functionality in GitHub?
Is it possible to use the connector for GitHub in Teams to automatically link comment threads in OHDSI GitHub repos to the appropriate Channel in the OHDSI Teams tenant so that things get copied there?
Is there a risk that adding a fourth place (Discourse) for community conversations could exacerbate the efficiency, alignment, and opportunity loss problems, especially if only a segment of the community uses it?

cce · August 10, 2023, 9:40pm

@hspence : Thank you for bringing up the integration topic. I think this is a good subject for one of our open source working group meetings. Perhaps you may want to bring it up there? I think badges in each open source project linking to a relevant category/topic in Discourse may be productive, low-hanging fruit. Your other suggestions are great but may require some concerted effort to set up and maintain.

@Andrew : I think Hayden is suggesting that we more tightly integrate (1) The OHDSI Discourse Forums with (3) OHDSI GitHub repositories. To answer your questions:

I think Discourse is superior to the GitHub Discussions functionality; also, I think having a community forum like this is not really the same as GitHub Discussions.
We should look into connectors that link GitHub and Discourse with MS Teams.
It seems we already have community conversations here on Discourse (aka OHDSI Forums) so the question is what goes where.

I think discussions directly related to particular software products, e.g. bugs or feature requests, are probably best done as GitHub issues. However, broader discussions that are of historical use probably belong here. Temporal discussions could be done in MS Teams. It’s a good question how to link all 3 communication channels.

MPhilofsky · August 11, 2023, 2:52pm

+1

I really like this idea! Most of the “work” we do as a community occurs on GitHub, but the discussions of advantages/disadvantages, use cases and dependencies happen on the forums. I believe linking the two would be very beneficial for the community for the reasons @Andrew states above. And I agree with @cce, discussions on the OHDSI forums are superior to GitHub discussions. The forums provide a much more friendly UI.

@hspence Let me know how I can help. My expertise is defining use cases and testing solutions.

hspence · August 14, 2023, 3:28pm

@Andrew

Discourse is more accessible to non-technical users, whereas GitHub is developer-focused. I see the overall aim as keeping OHDSI Forums (Discourse) as the primary space for community conversation while enhancing the ability to match discussions with implementations that occur in Github.

The idea of MS Teams integration for specific channels seems promising. Each Work Group could act as a stakeholder in a set of repositories, with Subgroups possibly being stakeholders in a more specific set of repositories. This stakeholder list could determine which channel receives updates on which repositories’ activities.
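A minimal Python sketch of how such a stakeholder list could drive routing. The mapping below is purely illustrative (the channel names and repository assignments are assumptions, not an agreed list), and `channels_for` is a hypothetical helper.

```python
# Illustrative stakeholder list: Teams channel (work group) -> repositories it follows.
# These assignments are placeholders, not actual OHDSI decisions.
STAKEHOLDERS = {
    "Themis": ["OHDSI/Themis", "OHDSI/CommonDataModel"],
    "CDM": ["OHDSI/CommonDataModel"],
}


def channels_for(repo, stakeholders=STAKEHOLDERS):
    """Return the channels that should receive updates for a repository, sorted by name."""
    return sorted(ch for ch, repos in stakeholders.items() if repo in repos)


# Example: activity in the CDM repository would be routed to both channels.
targets = channels_for("OHDSI/CommonDataModel")
```

Keeping the list as plain data would let Work Groups review and edit their own entries without touching the routing logic.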

@cce

I’m willing to facilitate one of the Open Source WG meetings around this topic. I’ll get in touch with Paul Nagy to coordinate this.

I believe some type of topic model could be advantageous, not just for linking badges to forum topics but also for understanding the broader community discussions. I’ll connect with the NLP group to see if they would like to be involved in that portion.

As a first step, identifying which platform in our communication ecosystem (Teams, Discourse, GitHub…) should serve as the primary might be essential. The primary choice would depend on the integrations that are most supported with the other platforms. From what I’ve observed, Discourse fits as the primary, but I’m eager to hear others’ opinions.

Expanding on the last point, if we consider Discourse as the main discussion platform, it may not necessarily be the best primary integration. It could be that Teams, GitHub, or another method of linking everything together proves more feasible / sustainable.

@MPhilofsky

I see the main advantage of GitHub discussions as the ability to link pull requests to discussions or issues. This way, there’s a record of why something was done, in addition to what was done. I understand the concern of WGs being that some conversation may occur in a place like Discourse, where the ‘what’ isn’t as connected to the ‘why.’ This is vital for enduring projects such as the CDM and Vocabulary, which have a substantial history and a long-term future ahead.

I agree that Discourse (OHDSI Forums) is the most user-friendly and accessible platform for discussion.

An assessment of the broader community’s needs and preferences, including which WGs/Subgroups are considered stakeholders in which repositories. Please give suggestions on how to gather this feedback (MS Forms, structured interviews w/ WGs, etc.).
I’m willing to facilitate one of the Open Source WG meetings around this topic. I’ll get in touch with Paul Nagy to coordinate this and focus the discussion on technical feasibility, collaboration, and implementation.
I believe some type of topic modeling could be included here, not just for linking badges to forum topics but also for understanding the broader community discussions. I’ll connect with the NLP group to see if they would like to be involved in that portion.
A volunteer pilot WG for testing prior to a larger rollout.

Please let me know if any of these options appeal to you for collaboration, or if you have any additions or critiques.

MPhilofsky · August 16, 2023, 1:53pm

The Steering Committee, @Patrick_Ryan et al., holds a twice-yearly meeting for the workgroup leads. Last year, they surveyed the working groups to see which ones overlap. I think this would be a good starting point.

Team work makes the dream work!

I’ll volunteer Themis to be the test workgroup.

Andrew · October 31, 2024, 3:04pm

I want to push this conversation forward with a focus on the potential for LLMs to help with integration across the major collaboration platforms used by the OHDSI community. The need for this integration and for authoritative summaries of complex conversations grows more urgent with the size of the OHDSI community and the duration of its conversations. Without some solution, I think we are at increasing risk of misunderstanding and of collaboration that is ineffective because of parallel, diverging, and disorganized efforts related to the same topics.

The focus on Copilot/OpenAI in the example below is based on convenience and is not an endorsement of this specific LLM summarization approach over other options.

When using Edge as a browser, you can give prompts to summarize OHDSI Forum discussions using Copilot.
For example,

this prompt:
Summarize the discussion on this webpage: Blood culture result in measurement or observation table and vocabulary to be used

generated this summary:
The discussion on the OHDSI forum revolves around the best way to handle blood culture results within the OMOP Common Data Model (CDM). Here are the key points:

1. Main Issue: Users are debating whether blood culture results should be stored in the Measurement or Observation table. The challenge is that the vocabulary used for the Measurement table (LOINC) is incomplete, while the Observation table (SNOMED) offers a more comprehensive vocabulary 1.
2. Proposed Solutions:

If you have any specific questions or need further details, feel free to ask!

Learn more

1 forums.ohdsi.org · 2 forums.ohdsi.org · 3 forums.ohdsi.org · 4 ohdsi.org · +2 more

AI-generated content may be incorrect

Usefully, the summary provides links to the sources it summarizes.
I won’t comment on the completeness and accuracy of the summary other than to say it is easy to understand and seems to get at least some of the main points from a long complex conversation right.

Using Copilot on GitHub issues and on conversations in Teams channels and meetings is already supported and, based on my experience with Teams meeting summaries, very useful.

An approach to integrating content across the OHDSI Forums, OHDSI’s GitHub repos, and OHDSI Teams might be built by automating a prompted summary by Copilot or another LLM based on conversations/issues in all three collaboration platforms. Summaries, in other words, would include the points discussed in all three platforms, not merely summarize the separate conversations in each.
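A minimal Python sketch of the prompt-assembly step only, assuming the messages have already been collected from each platform by some other means. The message keys (`platform`, `url`, `author`, `text`), the platform labels, and the prompt wording are all assumptions for illustration; no LLM is called here.

```python
from collections import defaultdict

# Assumed platform labels; messages from any other platform are ignored.
PLATFORMS = ("OHDSI Forums", "GitHub", "Teams")


def build_summary_prompt(topic, messages):
    """Assemble one prompt from messages gathered across platforms.

    Each message is a dict with the hypothetical keys
    platform, url, author, and text.
    """
    by_platform = defaultdict(list)
    for m in messages:
        by_platform[m["platform"]].append(m)

    lines = [
        f"Summarize the discussion of '{topic}' across all sources below.",
        "Combine points made on different platforms into one summary,",
        "and cite sources by their bracketed number.",
        "",
    ]
    n = 0  # running source number, shared across platforms
    for platform in PLATFORMS:
        if not by_platform[platform]:
            continue
        lines.append(f"## {platform}")
        for m in by_platform[platform]:
            n += 1
            lines.append(f"[{n}] {m['author']} ({m['url']}): {m['text']}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

Numbering sources across platforms, rather than per platform, is one way to let a single summary cite the forum, GitHub, and Teams side by side, which would also give a reviewer direct links to check each claim against.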

Identical cross-platform summaries could then be put in appropriately designated sections in all three collaboration platforms organized by topic. E.g.: In a new Summaries category in the OHDSI Forums; In a new Summaries channel in the OHDSI Teams tenant in the main OHDSI team; and in the Discussions or Wiki for the OHDSI GitHub organization account.

Though useful, summarization by LLMs isn’t perfect. So, human-in-the-loop review for accuracy before posting to each source seems like a requirement to me. If so, that might mean setting up a new Conversation Reviewer role in the community with defined responsibilities like: periodically generating, reviewing, correcting, and posting identical copies of LLM-assisted cross-platform conversation summaries to each OHDSI collaboration platform. As with the example above, the summaries would include links to the sources in each platform that would simplify the review task.

I think any approach like this should also provide a way for people other than the reviewer to correct any misrepresentation of their own contributions to a summarized conversation. Handling that correction process in a way that’s fair to participants in the conversation and not overly burdensome to Conversation Reviewers could be tricky.

I don’t know if OHDSI LLM folks like Martijn are working on something similar to this already. I don’t want this post to derail or delay any related work that is ongoing.

I will be meeting with Ofer Mendelevitch soon to discuss potential OHDSI applications of his new company Vectara. Ofer was the technology lead at Syntegra, which used an early transformer approach to generate the Tufts Synthetic Dataset based on our OMOP instance at Tufts Medical Center a couple of years ago. I may discuss this idea with him when we meet.