A proposal for maturing our open community development activities

Patrick_Ryan · April 28, 2019, 5:28pm

Team:

Five years ago, a few of us thought it would be fun to come together and create an open science community and wondered aloud if anyone would join us. Each of us had been working in our own separate silos, some of us had collaborated on OMOP, but all of us shared a basic belief that real-world evidence could dramatically improve healthcare if done right and none of us believed we could solve this problem by ourselves. Our idea was pretty simple: create an open place where all people from all backgrounds and perspectives felt encouraged to collaborate on the research, development, and application of analytic solutions for observational data to generate and disseminate evidence that could inform medical decision-making. So, we started OHDSI, curious where the journey may lead us.

Fast forward 5 years, and I could never have anticipated how quickly our community would grow or how large it would become. On this forum alone, we now have 2,455 users engaging in discussions about how to use their observational data to generate reliable real-world evidence. For last year’s OHDSI US Symposium, we asked the community to self-identify which observational data sources they had converted to the OMOP common data model, and we found 97 databases in 19 countries, collectively representing more than 2 billion patient records. At last month’s OHDSI Europe symposium, we had 260 people from 27 different countries come together to collaborate. Just last week, we announced the opening of registration for the OHDSI US Symposium, which will take place in DC in September, and already 135 of the 500 available seats have been taken by researchers from government, academia, industry, and health systems. OHDSI events have been hosted in New York and Atlanta and San Francisco and Rotterdam and Oxford and Suwon and Guangzhou and Shanghai. The collective expertise and energy to work together toward a common mission, ‘to improve health by empowering a community to collaboratively generate the evidence that promotes better health decisions and better care’, is truly inspiring.

Today, we have a vibrant community that is establishing open community standards for harmonizing the structure, content, and semantics across disparate observational databases around the world. Our community is leading cutting-edge methodological research that is identifying and evaluating scientific best practices for clinical characterization, population-level effect estimation, and patient-level prediction. We’ve applied these best practices to important clinical questions that matter to patients, whether it be specific targeted questions like the safety of levetiracetam, or the comparative effects on HbA1c amongst diabetes treatments, or large-scale analyses like the LEGEND studies for depression and hypertension. And we’re supporting national and international efforts to scale the public health impact that observational data can make, through programs such as FDA/BEST, NIH/AllOfUs, and IMI/EHDEN.

I think a lot of the tremendous progress we’ve made together can be attributed to our commitment to working as an open science community, and to sharing in the development of open-source tools that make it possible for all of us to adopt open community data standards and apply open standardized analytics to our data. We’ve seen this commitment manifest in a robust ecosystem of open-source solutions that support the entire analytics lifecycle, from data management and quality assessment, to back-end packages for large-scale statistical modeling, to front-end web applications for study design and execution. This ecosystem makes it possible for a researcher to generate reliable real-world evidence from their observational data with higher quality and greater transparency, and to do in a matter of minutes what previously took months. The OHDSI ecosystem has grown to now include 148 GitHub repositories, with contributions by more than 100 developers.

But any growth can bring some growing pains, and as we get older, we’re supposed to become a little wiser and more mature as well. What we could get away with as a precocious upstart might be different from what we expect of ourselves as our community evolves and the user adoption of the OHDSI ecosystem expands.

Now I am not a software developer. I enjoy hacking a little code, and really like solving hard problems by rapidly prototyping ‘good enough’ solutions. I know many in the community fall into my same phenotype in that regard. But I recognize that, while the uncommented spaghetti code I write for myself may be good enough for my bespoke purpose in the heat of the moment, it is not nearly good enough for a sustainable solution that others will rely on to meet their needs. In fact, in my zest to advance science by introducing my half-baked approaches into the fold, I may be doing more harm than good. Contributing as a community developer can be challenging, because it requires being sensitive to the community ecosystem and its users. In order to build trust, a proper open science community solution is an open-source tool that is not only fully transparent and publicly available, but also one which is developed by a competent team of individuals, is fully compliant with the community standards, produces consistent results, is appropriately documented and properly tested, and is responsive to the needs of the community.

To date, we’ve been fairly inconsistent in how the community has managed its open-source development across the OHDSI ecosystem. We’ve gotten away with it in some areas, in part because we had a smaller team of developers who aligned on a shared understanding of best practices and faithfully followed these implicit expectations. For example, @schuemie and @msuchard have a shared vision about best practices for R package development that they have applied throughout the OHDSI methods library, and @jennareps and @Rijnbeek largely adopted those principles in the development of PatientLevelPrediction. Some aspects of these developer best practices have been documented, such as recommendations for unit tests in R or code style guides for SQL, but other aspects are either insufficiently documented or inadequately enforced. When ATLAS was in its infancy, @Frank, @Chris_Knoll and @anthonysena could grok the whole codebase on their own with little help from others. But now that ATLAS has evolved to become a unifying platform to support the design and execution of observational studies, with a myriad of analytic features and community contributions from dozens of developers, it’s difficult to imagine that anyone could get their hands around all the details of every component. And concurrent with the growth in complexity of the tools has come the growth of the user community; because the OHDSI ecosystem is now actively deployed at so many institutions, it is increasingly important for us to have software releases of each of its components that the community can trust.

To meet these emerging community needs, I propose that we formalize our open community development processes to align on shared expectations and to hold ourselves accountable to the standards we expect for our community. Before we launched OHDSI, I read Producing Open Source Software by Karl Fogel, and in thinking through the issues our community currently faces, I found myself going back to this resource and finding it still quite useful to think through what makes most sense for our community. One insight that is particularly apparent is that there is no one ‘right way’ to build an open-source solution, but it is very important that a community aligns on the ‘right way for them’, whatever that may be. So with that, I offer the following strawman to stimulate the community discussion:

I recommend that, for each open-source product within the OHDSI ecosystem, we have a named project owner. The project owner assumes the following responsibilities:

Serve as a benevolent dictator, with final decision-making authority on the project
Ensure project codebase remains open source and publicly available through OHDSI GitHub repository (Apache 2.0 license for open community use)
Produce documentation that details the project’s objective, available features, and installation instructions (including system requirements)
Ensure an adequate test framework is properly implemented prior to any release. Testing should be fully described as part of the release.
Establish and communicate a project release process and roadmap for future development, which is responsive to user community and contributors
Coordinate with other project owners who have logical dependencies on the project
Support transition of the project to a new project owner when necessary

I recommend that we also align on shared expectations for our contributor community:

Use the tools
Support documentation and facilitate adoption by community
Identify bugs and post them on the GitHub issue tracker
Propose solutions to any identified bugs to the project owner for their consideration
Suggest potential feature enhancements on the GitHub issue tracker or OHDSI forums
Collaborate with project owner to design and implement new capabilities

I recommend that the OHDSI Coordinating Center, led by @hripcsa, continue to oversee the OHDSI community assets, including the OHDSI GitHub repository, and work with each project owner to ensure that each project satisfies the recommended community standards. If a project does not conform to agreed standards, the OHDSI Coordinating Center reserves the right to tag the project as non-compliant. If there is disagreement in the community about ongoing project development activities, the OHDSI Coordinating Center will serve as the final arbiter.

If someone wants to create a new project to add to the OHDSI ecosystem, they can contact me and I will be happy to discuss and coordinate with the other project owners to initiate the effort (including creating the OHDSI GitHub repository).

Below is my recommendation for named project owners for components within the current OHDSI ecosystem:

OHDSI Ecosystem Project owners
Common Data Model
CommonDataModel	Clair Blacketer
DDLgenerator	Clair Blacketer

Standardized Vocabularies
Vocabulary-v5.0	Christian Reich
ATHENA	Greg Klebanov
Tantalus	Peter Rijnbeek

ATLAS platform
User experience	Patrick Ryan
Technical integration	Greg Klebanov
Security	Pavel Grafkin
Data sources	Ajit Londhe
Vocabulary search	Frank DeFalco
Cohort definition	Chris Knoll
Characterization	Pavel Grafkin
Pathways	Chris Knoll
Incidence	Chris Knoll
Estimation	Anthony Sena
Prediction	Anthony Sena

ARACHNE	Greg Klebanov

ACHILLES	Ajit Londhe

OHDSI Methods library
Population-level estimation
CohortMethod	Martijn Schuemie
SelfControlledCaseSeries	Martijn Schuemie
SelfControlledCohort	Martijn Schuemie
CaseControl	Martijn Schuemie
CaseCrossover	Martijn Schuemie
PatientLevelPrediction	Jenna Reps
BigKnn	Martijn Schuemie
Methods characterization
EmpiricalCalibration	Martijn Schuemie
MethodEvaluation	Martijn Schuemie
EvidenceSynthesis	Martijn Schuemie
Supporting Methods Packages
Cyclops	Marc Suchard
DatabaseConnector	Martijn Schuemie
SqlRender	Martijn Schuemie
ParallelLogger	Martijn Schuemie
FeatureExtraction	Martijn Schuemie
OhdsiRTools	Martijn Schuemie
Hydra	Martijn Schuemie
drat	Marc Suchard
LEGEND	Martijn Schuemie

ETL support tools
WhiteRabbit	Martijn Schuemie
Usagi	Martijn Schuemie

Other community solutions:
PhenotypeLibrary	Aaron Potvien
PheValuator	Joel Swerdel
Aphrodite	Juan Banda
Criteria2Query	Chi Yuan
QueryLibrary	Peter Rijnbeek
OHDSIonAWS	James Wiggins
Broadsea	Lee Evans
BookOfOhdsi	David Madigan
Aegis	Chan You
ShinyDeploy	Lee Evans
Circe	Chris Knoll
CommonEvidenceModel	Erica Voss
THEMIS	Mui Van Zandt

Below is a rough schematic of the interplay between these various projects within the OHDSI ecosystem, as best as I can see it. I expect this will be an evolving landscape, but it should already illustrate why maturing our processes is likely a good idea, given the complexity and interdependencies between our various components.

I welcome a healthy discussion from our community about how we should evolve our open community development activities. I know @pbiondich probably has the most experience of anyone in our community through his work with OpenMRS and hope that he can shed some light on this proposed direction, good, bad or indifferent. My opinions are largely informed by ongoing discussions with @Christian_Reich, @schuemie, @msuchard, and @jon_duke, but I don’t presume to speak for them. I suspect @JamesSWiggins can share some valuable insight based on the work he’s leading at AWS. I expect @gregk, @pavgra, and @lee_evans will have plenty to say as well. I look forward to hearing from you all, learning from your experience and perspectives, and working together toward developing innovative solutions that can improve health through the evidence we can collaboratively generate. We’re all on this journey together, and I’m excited for the path ahead.

Cheers,

Patrick

hripcsa · April 28, 2019, 8:35pm

Thank you, @Patrick_Ryan. I very much support the proposal, and I also very much look forward to comments from the community. It is critically important for us to balance the need to ensure openness and opportunity for innovation while also making sure everything works together.

Rijnbeek · April 30, 2019, 4:56pm

Thanks @Patrick_Ryan, I think it is awesome that we are at this stage now and need to have this discussion. It is a demonstration of the success of the whole community. I also very much look forward to helping bring this to the next level of maturity with all of you involved.

With my EHDEN hat on, I think this is a crucial step to support the further extension of the community and data network. This will kick the ball into the net and will make OHDSI the sustainable solution. I personally would like to invest a lot of time in improving quality control and creating an e-learning environment to train the community. This will require the steps you are proposing.

I am also very happy that this is very much aligned with the EHDEN Description of Action and that we have resources to help OHDSI in this next step on its journey. There are definitely opportunities to take over some of the ETL tools from @schuemie, who has made a major effort and, in my view, is named too often on your list. Happy to discuss this further and see how EHDEN can take some of the load off his shoulders. I also think that EHDEN should contribute heavily to the vocabulary work, since we will have to further improve the quality control pipelines and the processes for extending and releasing vocabulary versions. It is great that Odysseus is an EHDEN partner, but the involvement of the broader Vocabulary Team led by @Christian_Reich will also be very helpful for this effort.

I think it’s a good first step to make clear who is the responsible person for a specific tool, what his/her tasks are, what decision power that person has, etc. However, I wonder if we would also need some overarching body to keep oversight of the full ecosystem? You say this will be Columbia, but how would this work exactly: who is that person, will this be a group, etc.? Furthermore, since OHDSI is becoming global, it is important that we get a better understanding of the roadmaps and needs of all the other active groups, for example in South Korea, China, and Europe. We need to avoid parallel developments or, even worse, forks that diverge. How to control this is definitely worth brainstorming about as well.

I am also curious how we can learn from other large successful open source projects like R, Moodle etc in how they handle these challenges. Is there a way to get in contact with these projects and have them share experiences?

Looking forward to the input from others on this topic.

Peter

pbiondich · April 30, 2019, 5:20pm

Hi Patrick and colleagues within the OHDSI community:

I’d start by saying congratulations. It’s an honor to be a part of the OHDSI community, and to participate in some small way in its meteoric rise.

From my vantage point, the community has grown very rapidly in a relatively short period of time. This is a reflection of the importance of the underlying mission and the community’s interest in it, and perhaps even a recognition by many of you that doing this kind of work successfully is a team sport.

We had a lot of the same dynamics you’ve described above play out within the OpenMRS (http://openmrs.org) community.

In my experience, there are always important tradeoffs to consider when adding process within a community whose members deliberately choose or volunteer to participate. It often doesn’t take as much as we think to excite and interest people to engage more (so it’s important to understand those basic needs of people and ensure they are fed through the community process). Conversely, it also doesn’t take much of a negative experience to repel people (given that they can often either go back to their old way of working, or find another more conducive place to get the kind of support they need).

That said, the underlying basis of what you’re saying is:

Lots of people are wanting to contribute to the community’s “public good” artifacts (yay)
Those contributions if not coordinated and managed can actually net the community poorer quality, less useful artifacts (boo)

This is also a lived experience we dealt with within the OpenMRS community. It also happens to just about every successful open community as it grows. The ways that other communities have dealt with this are via both process tweaks and governance tweaks.

Process tweaks include things like: unit testing, documentation standards, code style conventions, formalisms that distinguish between works-in-progress and final products, pre-release testing processes, etc. These kinds of tweaks, especially if implemented incrementally (vs. big bang), tend to be well received.

Governance tweaks on the other hand really try to establish distributed authority (vs. chaos). We continue to learn in the OpenMRS community how not to establish governance. For example, one time, I tried to write an overarching governance model for the community, and that totally bombed (to what problem is this the solution?, WTH is Biondich et al trying to get away with here?, painful to read and understand and I just want to participate!, etc etc).

One of the things I’ve learned about community is that people expect that their level of participation should correlate with level of influence within the community. So, we should be deliberate about ensuring that project ownership is a merit-based role, and that those who contribute the most to a given project receive influence within that project. So a friendly edit to your concept would be to say that all projects should have at least one merit-based leader, and it’s the job of that leader to build consensus around a way forward with all meaningful contributors to the project.

All of that could be summarized to say: build just enough community governance, and no more. Augment that governance with a growing collection of community process enhancements like I mentioned above.

Let me know if you want me to go into more detail about those process approaches, how they can be ordered in, etc.

Christian_Reich · May 1, 2019, 1:55pm

@pbiondich:

We are very keen to hear your experience, because we do believe we are at an inflection point with respect to scale and maturity, and a community like OHDSI doesn’t get run like an academic or private sector institution most of us are used to.

Couple questions about the governance tweaks:

Is this what Patrick is trying to achieve above? Or do you think it contradicts his proposal? I am not quite following.

What do you do if the most enthusiastic contributor doesn’t turn out as a good community leader? Has a hard time letting go, letting others in, all that stuff. Any suggestions?

What is the right size of the project? One thing Patrick is trying to do is to join low-level features (tools, methods, foundationals) into higher-level artifacts. How, for example, should we keep Atlas as one consistent piece of software, when the individual projects with their merit-based leaders are at a much more granular level?

Do you have a good idea of the right amount?

tscarne · May 1, 2019, 5:04pm

@Christian_Reich

If I understand the issue under discussion, I suspect that part of the challenge the community is facing is simply managing the consensus process around decision making. OHDSI is complex as it is driving standards development, tools development, advancing research, and supporting adoption. There are proven models for governing the standards processes. The same can be said about the tools activities. And the research community. I would suggest approaching each of these activities independently as you consider what is enough community governance.

pbiondich · May 3, 2019, 4:03pm

I hope so! I have no way of knowing each of the named members above, and their contributions to date.

What I will describe are the two issues OpenMRS has run into in the past:

Having ad hoc, non-predictable ways of establishing leadership that have made significant contributors feel inadvertently disenfranchised
Not having ways to gracefully transition leaders when their role in the community inevitably changes or declines away from the role they’re currently within

So, maybe what I’d say is: make sure your process to establish project owners is clear to understand, and make sure you also come up with a predictable way to transition this role over time. Also consider models of multiple owners to provide redundancy. In some cases, this has been essential for us.

This relates to the comment above about transition strategies. You all should provide a way for those that are involved with a given project to communicate concerns about a project owner’s performance to the coordinating center. What is implied in Patrick’s email is the overarching role of the coordinating center and @hripcsa to manage this governance process long term.

I don’t know. Every time I’ve tried to formalize this level of detail in the community, I’ve realized that it’s fruitless to define. Flexibility and adaptability trump formalisms. In this case, you should start with whatever granularity makes intuitive sense, and start to add rules when problems arise. I think these kinds of granularity issues work themselves out naturally over time if we learn to trust those that are more integrally involved.

Grin! I’m getting better at knowing the answer to that question within my own community (granted, I’ve been at it for over 15 years now lol), but my point is that you evolve into more sophisticated governance as the community needs them. It’s a “feel” thing, not a prescribed process per se, at least in my experience.

The mere fact that @Patrick_Ryan and so many others in the community are introspective enough to write an email like this says volumes. You all recognize your collective need for more process, and you’re allowing the community to weigh in. That will allow you on a continuing basis to find the right balance. Of course, you’ll often under or over estimate, but if you’re agile, you can course correct when that happens.

Sorry I can’t give you something more concrete here.

Vojtech_Huser · May 3, 2019, 4:55pm

I agree that it is nice to make such an introspective post, and I am a big fan of the shift in the community coordination.

Vojtech_Huser · May 3, 2019, 4:57pm

I am afraid that the split in development (mentioned by Peter) has already occurred!

In addition to the networks Patrick mentioned, there is also PEDSnet. See the picture below and the link https://pedsnet.org/documents/206/ETL_Conventions_for_use_with_PEDSnet_CDM_v3.1_OMOP_V5.2.pdf (page 32)

(see local codes in the 2 billion range)

I also agree that the workload of some names on the list must be enormous.

I would propose a larger list that names all internal contributors (per the section linked here) for a given GitHub project.

http://www.ohdsi.org/web/wiki/doku.php?id=development:ohdsi_github_projects_v2#contributing_code

In addition to documenting a whole function, I would argue for a tiny bit more commenting in the R code (and possibly in the SQL code). (Improve this guideline here: http://www.ohdsi.org/web/wiki/doku.php?id=development:ohdsi_documentation_guidelines )

(e.g., for every 50 lines of code, aim for one comment line, not just the roxygen description of the function)
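To make the proposed ratio concrete, here is a minimal sketch of how a reviewer might check it automatically. The function names comment_stats and meets_guideline are hypothetical, not part of any OHDSI tool. The sketch assumes R sources where whole-line comments start with # and roxygen lines start with #' (roxygen lines are excluded, per the guideline). Trailing comments after code on the same line are not counted, which is a deliberate simplification.

```python
def comment_stats(source: str) -> tuple[int, int]:
    """Return (code_lines, comment_lines) for R source text (hypothetical helper).

    Blank lines and roxygen lines (starting with #') are skipped, since the
    guideline asks for comments beyond the function's roxygen description.
    Only whole-line comments are counted; trailing comments are ignored.
    """
    code = comments = 0
    for raw in source.splitlines():
        line = raw.strip()
        if not line or line.startswith("#'"):
            continue
        if line.startswith("#"):
            comments += 1
        else:
            code += 1
    return code, comments


def meets_guideline(source: str, lines_per_comment: int = 50) -> bool:
    """True if there is at least one comment line per full block of
    `lines_per_comment` code lines (assumption: floor, so 49 lines need none)."""
    code, comments = comment_stats(source)
    return comments >= code // lines_per_comment


# Usage: a small R function with roxygen docs plus one inline comment.
example_r = """
#' Compute incidence rate
#' @param events number of events
incidenceRate <- function(events, personYears) {
  # Guard against division by zero for empty cohorts
  if (personYears == 0) return(NA)
  events / personYears
}
"""
print(comment_stats(example_r))  # (4, 1)
print(meets_guideline(example_r))  # True
```

Rounding down is one reasonable reading of "one comment per 50 lines"; a project could just as well choose to round up so that any non-trivial file needs at least one inline comment.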

Christian_Reich · May 5, 2019, 7:14pm

@Vojtech_Huser: Did you paste your post in the wrong discussion? Looks like it.

gregk · May 5, 2019, 10:30pm

First of all, thank you Patrick for bringing all your energy, work and leadership into OHDSI and getting us to this point.

Apologies for taking some time to respond, here are my 2 cents:

When I look at OHDSI, I see a slightly different picture. I do not see the GitHub repos or even individual components - instead, I see a set of core products, tools and capabilities – something like this

What is important from that picture above:

a. These are the core capabilities that users use and expect to work
b. There should be a one-way down dependency between Studies / Platforms / Methods / OMOP CDM (where architecture is designed properly)
c. All GitHub repos and components would collapse into one of the capabilities. If some do not, we need to check why
d. These are a mix of technical capabilities and processes and best practices. We could break it up into two diagrams not to confuse things and clearly separate the OHDSI Data Science Toolkit
e. We can shuffle things around, but I think the classification on the left is important

The boxes highlighted in green are where I believe we have a current gap and need to figure out how to put those in place. Some we are already discussing (shared repo) and some we really should. Also, if we keep focusing on GitHub repos and components as the unit in which we operate, I am afraid we will not scale.

Before we start labeling people as owners/leads on various projects/capabilities, I think it is important to clearly spell out what we expect from them.

I see something like this:

a. Product owner – a person who is deeply vested (power user?), has a deep understanding of major business use cases and is in charge of setting up a vision and continuously evolving a capability (product) from business utilization perspective. The product owner must ensure that the end product is of a high quality and has a required documentation, including training
b. Functional SME (s) – a person who has a deep understanding of certain narrow use cases and can contribute to the capability feature development. The SMEs must create business requirements and documentation for any new feature of the product.
c. Technical Lead (s) – the person who can create a technical specification of the feature based on business requirements and develop or lead other developers to engineer a product. The tech. lead would also ensure that there is a Software Development process in place, including automation, builds / CI, testing, coding best practices, versioning etc…

Those are pretty distinct types of people, and each requires certain experience and skills. They must work very closely together, but in the end the Product Owner calls the shots. Should we have one or more product owners and tech. leads? All good questions that we could discuss. But in the end, if a product is a core capability, we should start demanding that certain process and quality be in place (see #3) and expect a certain time commitment and required skills from people who wear a certain title.

Not all projects in OHDSI are equal. Some have matured and became core capabilities, some are popular but still evolving, some are in the POC staging phase and might never even make it. This is significant because we should not expect the same quality rigor and process for every component. We could apply the following categorization to all OHDSI components and set the expectations of quality, architecture and process based on those:
a. Mature – must follow a documented and consistent OHDSI SD process, documentation, consistent OHDSI versioning and go through rigorous testing for each release.
b. Evolving – is on track to becoming Mature and is expected to implement a documented SD process, documentation, versioning and go through rigorous testing for each release. It is likely that not everything is consistent or in place yet on such a project.
c. Staging (POC?)
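One way to read the tiering above is that Mature and Evolving projects share the same set of practices, and only the consequence of a gap differs. The sketch below encodes that reading. Everything in it is hypothetical: the Tier enum, the practice names, and the review function are illustrations, not an agreed OHDSI policy. It also assumes that a gap blocks a release for a Mature project but is only tracked as advisory for an Evolving one; since no expectations are stated for Staging, nothing is reported for it.

```python
from enum import Enum


class Tier(Enum):
    """Hypothetical maturity tiers, following the categorization above."""
    MATURE = "mature"
    EVOLVING = "evolving"
    STAGING = "staging"


# Hypothetical labels for the practices listed for Mature/Evolving projects.
PRACTICES = frozenset({"sd_process", "documentation", "versioning", "release_testing"})


def review(tier: Tier, in_place: set[str]) -> dict[str, list[str]]:
    """Classify missing practices as blocking or advisory for a tier.

    Assumptions: Mature gaps block a release; Evolving gaps are tracked
    as advisory work toward Mature; Staging has no stated expectations.
    """
    missing = sorted(PRACTICES - in_place)
    if tier is Tier.MATURE:
        return {"blocking": missing, "advisory": []}
    if tier is Tier.EVOLVING:
        return {"blocking": [], "advisory": missing}
    return {"blocking": [], "advisory": []}


# Usage: a Mature project that has documentation and versioning only.
print(review(Tier.MATURE, {"documentation", "versioning"}))
# {'blocking': ['release_testing', 'sd_process'], 'advisory': []}
```

Keeping a single practice list for both tiers reflects the idea that Evolving projects are "on track to becoming Mature": the target is the same, and only the enforcement changes.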

I believe that in OHDSI we are going through the maturity pains and beginning to expect a certain level of product quality, process and architecture consistency from our Core capabilities. Most (if not all) capabilities are Evolving and none of them are Mature yet (though some are very close: OMOP CDM, THEMIS, ATLAS, ARACHNE and more), but Mature is where we need to be. I believe that one of our biggest challenges in the OHDSI community is weak process and architecture alignment and consistency across different Capabilities/Products. Worse yet, in many OHDSI projects we are trying to figure out the same problems at the same time (integration, architecture, process, versioning, release, testing) and end up solving them in different ways. I believe that the time has come to consider having a team/project focused on outlining the following for OHDSI as a whole:

Business Architecture vision, including new capabilities
Technical Architecture and Integration Approach and Standards, including security
Processes, including SD, QA/QC, versioning, release etc…
Shared components?

Who should be in this group? Should it be yet another project? Can we develop those on one project and then propose to adopt them across OHDSI? All good questions that we should discuss, but we should stop reinventing the wheel on each project/product and work on it collaboratively.

But many of our current challenges are actually not a result of an inconsistent process or even a lack thereof, but rather of some basic rules of engagement. Looking back in time, there was a great piece of work published back in 2001 called the “Agile Manifesto”.

Core Values

12 Principles
https://agilemanifesto.org/principles.html

It is actually quite incredible how many useful principles (if not all) we can adopt from that 18-year-old work, but one of them is definitely what we need to emphasize:

“Individuals and interactions over processes and tools”

OHDSI is a “collaborative”, meaning that whether we are working on multiple projects in parallel or on the same one, we need to make sure to have constant and frequent interactions between individuals. And those interactions must start before code is touched or written: let’s get better at being inclusive and discussing the vision, ideas and changes before we even embark on them. And I mean myself here as well; I am also guilty of rushing ahead with the implementation instead of first sharing and brainstorming with others.

Ugggh, and now I am going back in the thread to read what others posted here while I was taking my time

schuemie · May 6, 2019, 10:48am

I love all the discussions on grand architecture and all, but for me the issue at hand is a simple one:

I’m ‘owner’ on quite a few projects / assets, and that means I feel responsible for these tools. I respond to issue reports, make hot fixes where needed, and provide overall support. To keep that manageable, I’ve organized the code in certain ways, and follow certain principles. (some of those principles could be documented better, yes, but not all).

Then there are ‘contributors’, folks that often have a specific issue they need solved for their use case. Their contributions are extremely valuable, and I try to treat them as such, but they don’t always fit exactly in the long-term strategy that I have, or follow the organization or principles of the code.

It is these differences in perspective between the two roles (‘long term’ and ‘short term’, I could call them) that are now causing friction. I like the proposal that Patrick has put on the table because it is simple: just by recognizing these roles and making clear what the responsibilities of each are, I think we can solve most of our problems.

And yes, @Rijnbeek, I would appreciate being able to hand over some of these owner responsibilities to others, especially for the Rabbits and Usagi. But one thing to keep in mind is that the role of ‘owner’ is a long-term commitment, so we need to make sure we have the (financial) incentives in place to allow folks to make such commitments.

pbiondich · May 6, 2019, 3:40pm

This is a great example of where conventions can really be a game changer:

Take a look at what our community has done:

https://wiki.openmrs.org/display/docs/Java+Conventions

Repeating what I said above: the best case scenario is when you can standardize conventions as much as possible before you move towards governance (person-mediated) models. Of course, those take time and effort to put into place.

Chris_Knoll · May 6, 2019, 4:51pm

@pbiondich, I think Martijn’s point (correct me if I’m wrong, @schuemie!) wasn’t a matter of coding conventions (which, I agree, are important). Rather, when he says ‘principles of the code’, he’s referring to the type of problem the code base is trying to solve and the approach being applied to solve it.

Your document presents the coding conventions very well, but at a more generic level (Java solutions). I think what Martijn was speaking of is how people contribute to the code base in a way that follows the ‘vision’ of how that particular code base is meant to evolve. To give a concrete example: FeatureExtraction is a piece of code that is used to extract features from data about a population for use in predictive models. Friction arises when contributions are proposed that ‘break the mold’ of what the software is designed to do. Much of this is hard to document and is not as simple as declaring naming conventions or ‘tabs vs. spaces’ rules.

By declaring that there is an ‘ownership’ role with a long-term vested interest in the overall direction of the project, the person in that role has the authority, responsibility and accountability for how changes should be introduced.

I’d be interested in hearing about your experience dealing with these types of leadership issues in OpenMRS, @pbiondich, and how we could apply those lessons to maintaining civil discourse in our own community engagements.

-Chris

schuemie · May 6, 2019, 5:14pm

Thanks @Chris_Knoll, that is exactly what I meant.

krfeeney · May 6, 2019, 7:01pm

100% agree with this point, and I think it’s important to understand how we evolve to both appreciate and accept that there will be times when this happens. It may mean that entirely new software branches need to be developed for that use case, rather than stretching a piece of software to an unintended purpose.

I also think @schuemie raises a great point. In addition to the assignment of Product Owners, we need a process for allowing transition of ownership of specific work products/software that are worth continuing but also recognize that humans are finite and have capacity limits. It’s also important because many people ask, “how can I contribute?” and we currently have a very patchy system of getting people plugged into the community’s greatest needs. This should be simpler if we’re adopting a holistic development vision.

I tend to disagree with @gregk’s vision slide because it oversimplifies the complexity of ATLAS as a tool. In doing so, we’re also oversimplifying the challenges we face in advancing different use cases within ATLAS, because ownership is too generic and not specific enough to the teams who actually build and deploy. In @Patrick_Ryan’s view, the delineation is important because inter-dependency is key. There needs to be some appreciation for how a change in one part could feed back into, and break, the whole. The governance of release management and overall change management is as important as delineating the pieces.

I’m not sure about the Functional SME as a dedicated role. We’re assuming that we have a lot of hands on deck, which sometimes we do… but we’re rather scrappy once we divide into so many parts. We should keep a very lean vision of a team so as to avoid having holes in our ability to get things done.

This discussion feels quite cumbersome for a Forum post. It feels like a Face-to-Face topic that should be addressed with a clear understanding of approved conventions and a specific roadmap.

The #1 thing we should get out of maturing our open community development activities is transparency into exactly what is and is not in the development pipeline. A roadmap that everyone can see is the first step. Then, designated times when we actively triage requirements; this will let community members know when to be the loudest, a “speak now or forever hold your peace [until the next time]” kind of thing :-D. From there, we could actually march in a direction and communicate progress. It also gives Product Owners a timeline for when they would need to activate transition processes, if they’re running low on the ability to continue to champion an effort.

This is to say… I think the Symposia are a great forcing function to push dedicated momentum but there could be other natural times too. And with that, I’ll pause and let others interject.

saradempster · May 6, 2019, 10:41pm

I’m following this discussion with great interest. Like others, I will reiterate that OHDSI has done amazing work to create a unique framework for observational research and I’m excited to see coordinated enthusiasm to bring it to the next phase.

The impression that I am getting from this thread is that there is more of a back story than I know and that part of the backstory is frustration with getting OHDSI tools installed and functioning as advertised, especially for folks who are relatively new to the tool stack.

As a starting point, it would be great if the community could work together to bring FRESH EYES to critically evaluate and test all the tools in their current state and grade them on several axes that reflect their maturity, such as reliability, usability, stability, relevance, and interoperability. The work could turn into a report that explains the rationale for the grades rather than just giving a number, and that also makes recommendations. With this review in hand, the community could invest a substantial amount of time to review the current status and then prioritize solutions to the biggest pain points in a holistic way. These discussions could also be a springboard to defining new processes based on a fully informed analysis of the present state.

I don’t think the above is going to be easy, and it will take resources away from other activities until it is complete, but I think it will stimulate the needed dialogue and subsequent maturation.

There are myriad challenges that lead those deeply engaged in development work to get sidetracked from investing the time to make tools completely documented, accessible and usable, and to ensure that things don’t break. Development work is incredibly nuanced and time-consuming, especially when it is done to be robust and interoperable with an ecosystem of other simultaneously changing data models, tools, etc. There is only so much time, and there is continual pressure to add new functionality rather than to refine the same package to make it completely robust, tested and documented. One general principle is a holistic architecture that recognizes all the interdependencies. @krfeeney, I absolutely agree on a shared, detailed, overarching roadmap so everyone has a way to navigate. @gregk, I think my thoughts are in the same spirit as some of your suggestions, but with much less architectural formalism. Taking things a step further, would it be outrageous to suggest that all new development activities (at least things on a main branch) might need to be put on pause while the review and plans across all the tools are worked out?

Regarding the point brought up by @Patrick_Ryan about transparency not being sufficient to engender trust, I wonder if there are different sorts of transparency. I agree that simply having all code and parameters available is not sufficient. Strictly following certain development processes will certainly increase trust among the more technically inclined stakeholders. A cultural transparency that broadcasts a strong awareness of limitations and areas for improvement might start to increase trust more broadly as well.

Final thought: even though I have worked with a team including OHDSI leaders to run a complete PLE study within the OHDSI framework using all the tools etc, it was a difficult process and I found that I was doing a lot of reverse engineering to figure out what was happening under the hood. I don’t think this forum posting is the right place for more details on this, but I would be happy to describe some of the challenges I faced in another setting.

pavgra · May 7, 2019, 2:44am

@Patrick_Ryan, as we discussed, the core of my vision is that OHDSI should develop as an organization, as a mechanism. It should not be highly dependent on certain people (of course, that is not possible in absolute terms, but that should be the direction). People come and people leave, but OHDSI should stay and shouldn’t fall apart if someone leaves. So the most important part for me is rules and standards. Those can lead OHDSI to being self-sufficient, and those can resolve conflict situations. The same as contracts do: you don’t sign a contract for the case when everything is good; you sign it to define how tricky situations should be resolved and to set up expectations.

Therefore, I would propose starting with rules for contributors, owners and the contributing process. I see it in the form described below.

Rules for contributing. A PR should include (although here we need to distinguish between bug fixes and features):

Description of fix / feature (min. 1 paragraph)
Diagrams (UML component diagram, class diagram, etc)
Test-cases being described in human-readable way
Auto-tests coded
The required time gap between a proposal and the code implementation coming in should be respected

Rules for owners:

There should be a defined timeline for responding to PRs / proposals
PR review should be done in a formal way: comments, logged discussions, etc.

Judging institution:

Who decides, how, and within what timeline, whether to approve a PR when the contributor and the owner disagree

Going forward, the technical architecture should allow extensibility and interchangeability of components. If we have well-defined interfaces and standards, anyone who doesn’t like an existing implementation can come, write their own module and use it instead of the existing one, without maintaining a fork and duplicating code. StandardizedAnalysisAPI and the pluggable architecture for Atlas 3.0 are good steps in that direction.

So as an action item, I would propose formalizing the rules, putting them into a separate repo (which should also include coding conventions, code styles, etc.) and adding a link to it in each repo so that people know how to contribute and what to expect.

pbiondich · May 7, 2019, 2:42pm

Thanks, I missed the point.

I’m not sure that there’s “free lunch” when it comes to the costs of coordination.

Issues like the one you’ve described in your example most often come down to simple misalignments of understanding or expectation between people, which only get sorted out with better communication and coordination. As the community grows, more and more work will be needed here. This coordination cost, which grows with interest in what the community is producing, creates the “tragedy of the commons” resource sustainability challenge that you’ll have to grapple with, just like every other successful open community. We’ll be there in solidarity together.

As everyone is pointing out, every project needs some way to establish and maintain its own compass/north star. The most straightforward way is to have one (or sometimes more) individuals responsible for maintaining it. Over time, as each project’s scope and purpose become clearer and universally accepted, you can embed these standards as rules or processes that don’t require a benevolent dictatorship to maintain.

What we all need to anticipate before OHDSI goes down this pathway are the places where the “project owner” model is problematic or can create new problems if we’re not careful:

Being clear on what behaviors characterize a successful “project owner” (i.e., consensus builder vs. dictator, or top contributor vs. top visionary thinker as it relates to the project) → not being clear on this can create all kinds of misaligned expectations
Discriminating the role of an owner vs. contributors… what influence do each have?
Ambiguous succession model for this role

Others in this thread have touched on these above. I particularly appreciated @krfeeney’s comments and sentiments about doing some of this planning face to face. I could certainly lay out some revisions of the model for consideration (and I’ve alluded to some ideas in my comments in this thread), but at the end of the day, it’s not about what I want. It’s about what we want together. That’s the fundamental evolution I see happening here… and that’s exciting. People like @Patrick_Ryan and @hripcsa will need to coalesce these thoughts, iterate on the model, and then try it with the community’s approval.

jliddil1 · May 7, 2019, 4:21pm

So I agree with the fresh eyes. As someone relatively new, and largely on my own in implementing OHDSI at a small company, I find it a challenge to absorb all the info. One really needs to resist the urge to just go headlong into implementation. It really helped me to sit and read, watch all the videos, and repeat. Then read the forums. Repeat.

I come from working on CDISC, and they have the same growing/organizational pains. Some pharma companies participate more than others and get their “agenda” pushed forward. I was fortunate to have a manager who fully supported my participation. I came into my current job, had OHDSI kind of forced on me, and had to become the SME. A “development” opportunity. It was a bit tough to come to grips with the shortcomings around oncology and genomics. Again, CDISC has these same issues. Of course, CDISC has dedicated resources as well.