A proposal for maturing our open community development activities

lee_evans · May 10, 2019, 5:05pm

I think we should distinguish between OHDSI ‘core software assets’ that are key to the overall OHDSI goal (‘generation and dissemination of reproducible, scientifically rigorous, comprehensive evidence’) and other voluntary software contributions from the OHDSI community.

The core software assets need to be more rigorously managed and a good first step is to assign formal OHDSI product owners.

We should continue to encourage innovation through the broader set of community software contributions by being more relaxed about the requirements for, and management of, those contributions.

I can see a lifecycle where some community software contributions eventually become part of the OHDSI core software and get assigned an OHDSI product owner. We can develop some helpful selection/acceptance criteria for transitioning community software into the OHDSI core software set.

Here are some of the software development challenges:

Complex OHDSI software dependencies - Python modules, R packages, Java and JavaScript modules
Multiple OHDSI-supported DBMSs and operating systems
Coordination of software release schedules for dependent software components
Support for new CDM database schema releases
Versioning of study design artifacts is needed:
- For continued support of older software releases in the community, especially for network studies
- Migrate existing study design artifacts for compatibility with new software releases
- Study reproducibility
More unit tests/performance tests/integration tests required - need automated testing processes
Ownership/communication for OHDSI software products/product roadmaps is informal
The innovative, open OHDSI community software ecosystem has a wide-ranging set of software dependencies
Some additional complexity is necessary to reuse components across ATLAS, ARACHNE and the R methods packages

There are also some challenges to consider around software deployment:

Sparse documentation can be a barrier to deployment
Can be challenging to identify/install/maintain specific software dependencies
Need to handle potentially conflicting software dependencies (encapsulate dependencies)
Numerous steps needed to install tools & their software dependencies manually
Need more automated approaches to software distribution & installation

If OHDSI is going to assign product owners to core software, then some questions to think about are:

How are OHDSI product owners nominated/transitioned?
What responsibilities/deliverables do they have?
How do they engage with the OHDSI steering work group & the wider OHDSI community?
What support/guidance should OHDSI provide to product owners?

t_abdul_basser · May 10, 2019, 6:36pm

I agree with the direction that you outline in this excellent post @Patrick_Ryan and stand ready to assist @hripcsa, yourself and others with implementing it, particularly regarding OHDSI Coordinating Center-level activities.

nsikak · May 10, 2019, 7:22pm

Great stuff @lee_evan, @hripcsa. I want to help. I think a Coordinating Committee can identify the different assets to develop. The committee can publish an RFP to members for product owner nominations. The RFP responses would include a detailed management agenda and the plan the responder would pursue to mature the asset in support of the OHDSI mission. The RFP process should attract capable technology/business development manager candidates, whom the Coordinating Committee can interview before making a selection. I would be interested in responding to an RFP of this type, and also in other activities that would move this forward.

gregk · May 11, 2019, 12:48pm

Apologies, I was traveling last week with only occasional, poor Wi-Fi, so my comments may be a little late.

@schuemie - yes Martijn, you are raising a very important question: for an owner (which one? tech. lead? product owner?), how do you keep a balance between control and quality, maintainability, consistency, supporting various contributions, etc.? The example that you raise is a good one:

sounds like you are both a product owner and tech. lead
as a tech. lead, you already have a certain structure established, best practices (some documented and some not)
you want people to contribute, but not all adhere to the principles established, unintentionally breaking existing code etc…

As I mentioned above, there should be basic OHDSI software development principles, just like our current principle of holding important discussions on the forums. One of the principles could be formulated as follows:

discuss upcoming changes with product owners and tech. leads BEFORE you code them. For many changes, there is no need to go into details; just make them aware. Some might warrant a deeper discussion. Otherwise, unless it is a bug fix, sorry, they have the right to reject it if you drop it on them unexpectedly.

At the same time, let’s make sure the tech. leads for each project clearly articulate and document what is important to them on their individual projects e.g. coding conventions and principles that they want others to follow.

Yes, and I think there is a bit more than Product Owner and Contributor (if I got those two roles right). Some projects are more complex than others. In some, the product owner and tech. lead will not be the same person. Take ATLAS, where Patrick is the product owner. While he is one of the best epidemiologists, he is not a Java developer and probably does not have extensive knowledge of the software development process, configuration management, versioning, QA/QC, release management, etc… In addition, ATLAS integrates analytical methods developed by Martijn and Peter, while others (IR, Characterization, etc.) are embedded in the tool itself. ATLAS must integrate with other tools and platforms and must support a complex range of security use cases. Looking at it from a RACI perspective, as the Product Owner Patrick is accountable (rAci) for making sure those things happen, but he will have to delegate responsibility (Raci) for implementation to someone else. However, assigning people to tool features is very, very important but only half the battle: someone else needs to take care of overall versioning, the SD process, QA/QC and releases, and this should be done across the various features and components involved.

What I am trying to say is that, since we expect quality from our core components, developing complex platforms like ATLAS, ARACHNE, ATHENA, Share Repo etc… requires a team and a solid SD and QA/QC process, development/test environments, versioning, maintenance, a roadmap, prioritization etc… But we should use common sense: for example, if things work well with PLE and PLP, there is no need to drop an army of “helpers” and a heavy process on Martijn and Peter.

On the contrary, @krfeeney, we agree here :) I also think that, by trying to keep things simple, we might be oversimplifying what it takes to manage ATLAS as a tool, and we are missing some key roles that would take care of configuration management, the SD process, releases, and an overall architecture, vision and roadmap (see above). But I am also saying that we should not treat any tool in OHDSI in isolation, and that some of those things should be done at the OHDSI level so that we can better coordinate and synchronize what we do and how we do it:

Overall architecture. And I do not mean technical architecture; I mean starting with business architecture, e.g. capabilities, users and use cases.
Vision and roadmaps of individual tools and how they all tie together. For example, ARACHNE is dependent on ATLAS, ATLAS is dependent on PLE and PLP, and both are dependent on OMOP CDM.
Some basic SD principles and best practices
Single versioning approach
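To make the dependency point concrete, here is a minimal Python sketch (an illustration only, not an existing OHDSI tool) that encodes just the relationships named in the list above and derives a release order plus the set of tools affected by a CDM change. Reading “both” as PLE and PLP is an assumption, and the names `DEPENDS_ON`, `release_order` and `affected_by` are hypothetical. It uses the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each tool maps to the components it depends on.
# Only the relationships named in the post are included; "both" is assumed
# to mean PLE and PLP.
DEPENDS_ON = {
    "ARACHNE": {"ATLAS"},
    "ATLAS": {"PLE", "PLP"},
    "PLE": {"OMOP CDM"},
    "PLP": {"OMOP CDM"},
    "OMOP CDM": set(),
}


def release_order(depends_on):
    """Return components ordered so every dependency precedes its dependents."""
    return list(TopologicalSorter(depends_on).static_order())


def affected_by(component, depends_on):
    """Return every component that directly or transitively depends on `component`."""
    affected = set()
    frontier = [component]
    while frontier:
        current = frontier.pop()
        for tool, deps in depends_on.items():
            if current in deps and tool not in affected:
                affected.add(tool)
                frontier.append(tool)
    return affected
```

With this map, `affected_by("OMOP CDM", DEPENDS_ON)` returns `{"PLE", "PLP", "ATLAS", "ARACHNE"}`, i.e. every tool whose release would need to be coordinated with a new CDM version such as v6. A shared, OHDSI-level map like this is one lightweight way a single versioning approach and a high-level roadmap could be kept in sync.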

I am not advocating for heavy, all-prescriptive process documents and architecture (oh, please no), but for a basic high-level business architecture showing how everything fits together, a high-level roadmap and dependencies, and some basic core SD principles that would apply across OHDSI as a whole. We owe it not only to ourselves but also to the rest of the world who wants to adopt ATLAS, OMOP CDM and other tools. How can we expect adoption of OHDSI tools if we cannot easily show how everything ties together at a high level and when to expect what (or at least the basics)? Everyone is asking when ATLAS will support OMOP CDM v6; let’s tell them. We even had some basic roadmaps and tried to share them, but I do not believe we had a good mechanism to do it.

SCYou · May 14, 2019, 2:37am

Thank you @Patrick_Ryan for posting this invaluable theme.

I suggest that we should keep the CDM architecture, composed of data architecture and ontology, aligned with the core applications of OHDSI. We need to discuss the regulations governing the development of the CDM architecture as well as of applications, because the CDM architecture determines which areas of the OHDSI ecosystem can evolve and which cannot.

In my personal view, the core applications of current OHDSI are DatabaseConnector, SqlRender and FeatureExtraction (All of them are led by @schuemie ).

Unfortunately, I’m not sure that we cared enough about whether the CDM architecture would or could be aligned with the core applications when we published CDM v6.0.

I usually develop packages (Argos, RadiologyFeatureExtraction, noteCovariateExtraction) based on core applications (DatabaseConnector, SqlRender and FeatureExtraction).

I hope AEGIS will be aligned with the core applications after the revision of the location table. But this will be possible only when the core applications (FeatureExtraction, in this case) can be harmonized with the revised CDM architecture (table architecture + vocabulary).

In summary, OHDSI should keep in mind how to keep the CDM architecture, composed of data architecture and vocabulary, harmonized with the OHDSI core applications. And when we discuss the regulations on developing applications for an evolvable and vibrant ecosystem, we should also discuss how to develop the CDM architecture for an evolvable and vibrant ecosystem.

keesvanbochove · July 5, 2019, 3:33pm

Dear Patrick,

Sorry for replying late to this topic; I don’t think this has been implemented yet?

In any case, via The Hyve I’m involved or have been involved in multiple biomedical open source projects, such as tranSMART, cBioPortal, Galaxy, RADAR, GA4GH standards etc. and I definitely agree there is no one right way to do it.

My main advice would be: whatever you do, take baby steps. I’ve experienced a really promising, well-funded, well-staffed and well-meaning open source community being crushed by a too enthusiastically implemented governance overhaul. That was a painful experience and I would not want to go through something like that again. To be clear, I don’t think that is really a risk for OHDSI right now or for what you propose; I mention it as a disclaimer, because that experience may be why this is my main advice.

Implementing your proposal with baby steps could mean, for example, choosing one of the (smaller) projects to pilot these ideas, maybe tweaking them, and then turning that into an incubating program through which you bring all the other main projects under the same governance mechanisms. An inspiration could be the Apache community, where they have PMCs (http://www.apache.org/foundation/governance/pmcs.html) per project who decide the course of the project in a transparent way. Some projects in OHDSI effectively already seem to have adopted this model (ATLAS and WebAPI).

Also, I think lightweight artefacts that could be shared by all projects, such as code styles, are always a good idea. A simple yet very effective mechanism for dealing with contributions that are being ‘dumped’ is an RFC process preceding new feature development. As inspiration, have a look at the very lightweight yet quite effective way the cBioPortal community handles this: https://docs.cbioportal.org/1.-general/rfc-list

Greetings,

Kees

Andrew · August 9, 2019, 4:28pm

@keesvanbochove Has the RFC process in cBioPortal typically elicited enough input so that it really reflects most users’ preferences and decreases the amount of push back once the new features are rolled out? Are there any tactics you think are useful in making sure enough input is elicited?

Vojtech_Huser · January 2, 2020, 1:52pm

I propose we make the list of leads a more formal wiki page somewhere.

With the latest release of WhiteRabbit, I would even consider Maxim at least a co-lead for WhiteRabbit.