Toward Specifications

Frank · October 31, 2017, 3:18pm

There has been a lot of discussion in the past few weeks regarding the lack of published specifications. It is true that there is limited documentation and no formal specifications for a number of elements of the architecture including Concept Sets and Cohort Definitions. In attempting to fill this gap I have done some work to create a specification for Concept Sets using the Open API 3.0 specification.

I would like to share this draft specification with the community for feedback on both the process of using Open API 3.0 as a formal method for documenting and publishing specifications as well as the content of the specification itself. This is a draft specification. It is not compliant with the current concept set API or intended to be documentation for the current API in this form.

https://app.swaggerhub.com/apis/fdefalco/org.ohdsi.ConceptSets/1.0.0

Chris_Knoll · October 31, 2017, 7:05pm

Since we use Spring Boot, there are hooks to swagger:

I’m curious about the impression that we don’t have a published specification. Why aren’t the published releases in git considered a ‘published specification’?

Frank · October 31, 2017, 7:38pm

That guide does describe how to use Swagger 2 to generate REST API specifications for a Spring Boot project. I do not believe we currently use that anywhere in the WebAPI. If we did, I would agree that we would have a means to generate the Swagger 2 specification from the WebAPI implementation and that document would be a specification that we could publish.

I do not view the published releases of source code on github to be a specification, but I agree that there are ways to generate specification documents from that code if we follow something like the guide you linked above.

I would like to be able to draft, iterate, and design a specification that could be shared before an implementation exists. We could do that in a number of ways, including as part of source code development. My current request is to evaluate using tools like Swagger Editor / UI to develop specifications before a language specific implementation is started. From the Open API specification repo:

The OpenAPI Specification (OAS) defines a standard, programming language-agnostic interface description for REST APIs, which allows both humans and computers to discover and understand the capabilities of a service without requiring access to source code, additional documentation, or inspection of network traffic. When properly defined via OpenAPI, a consumer can understand and interact with the remote service with a minimal amount of implementation logic. Similar to what interface descriptions have done for lower-level programming, the OpenAPI Specification removes guesswork in calling a service.

Chris_Knoll · October 31, 2017, 10:01pm

I would view published releases of source code as a specification, though admittedly not a friendly way to navigate the specs. We used to have javadoc generation, not sure where that went, but I think that would suffice as well. One concern with the draft, iterate, design cycle you describe is round-tripping from the implementation. If that’s not supported, you are left keeping the specification in sync with our implementation by hand, which could lead to errors and is cumbersome to maintain. So, whatever tool you identify that you want to adopt, I’d make sure that you can update an existing implementation with it so that we aren’t left to manually reconcile the implementation with the specification.

I think in the case of WebAPI, we’re dealing with an existing implementation, so something that could work off existing source code would be preferred.

schuemie · November 1, 2017, 9:28am

Here are my two cents, mostly to demonstrate my ignorance:

I found this article to be somewhat helpful. It explains:

Documentation: API Documentation describes, with examples, how an API functions, and how to call those functions
Specification: API Specification details the functional and expected behavior of an API, as well as the fundamental design philosophy and supported data types.
Definition: API definitions define the backbone, organization, and function of an API at a base-machine readable level. API Definitions are also unique in that they provide a base starting point for derivations into other platforms.

Even after reading this, the distinction between documentation and specification is not very clear to me. In R, I’ve never seen package specifications but there tends to be a lot of package documentation (manuals + vignettes). To me, the example specifications that Frank posted look a lot like the API documentation.

I fully agree we need proper documentation, so users know how to use our tools.

Am I right in thinking that specifications (especially for existing software) are intended to guide (future) developers of the tools?

Chris_Knoll · November 1, 2017, 2:00pm

Hey, @schuemie,
I like the way you and the article break out the different realms of documentation, specification and definition. While reading through the article, I saw that they referenced the swagger specification document here, and their specification document doesn’t seem to have been built out of the swagger framework; rather, it was just a document that followed the principles that you described in the article.

So, on the topic of specification, I would say that specification can be materialized not only in a document describing expected behavior and design philosophy, but also a collection of test cases that is used to verify the specification is being followed. For example this repo has a ‘spec’ folder which contains all the test cases that ensure the different elements of the library adhere to specific behaviors. Very similar to the test cases provided in SqlRender.

One difference between documentation and specification I think is the level of implementation. If the documentation contains examples, then there’s some assumption about the implementation: documentation about an R package shows examples of calling code using R. Documentation for a Java library would show Java syntax. Likewise for Javascript libraries. However, if we want to abstract above that, at the specification level, the implementation fades into the background and instead you describe behavior, inputs and results. The results could be materialized as a dataframe in an R context, ResultSet in Java, JSON in javascript, but that’s for the person taking the specification to create the implementation.

So, one question I have is, if we’re talking about putting together an official specification, is the objective to provide the blueprints to make an implementation in an arbitrary technology stack?

CRoeder · November 1, 2017, 2:01pm

The first two seem like the older style of tutorial and reference books like Learning Perl and Programming Perl, and the C++ Primer and The C++ Programming Language. You use the first to get started and learn the general concepts. You dig into the second for deep detail while working on an application. I haven’t written C++ for 20 years, but The C++ Programming Language is the most worn book on my shelf.

To my eye, those links look like Specification.

Frank · November 1, 2017, 2:06pm

Yes, specifications can be used to guide developers that will make use of the API themselves. Additionally, the specifications can be used to develop an implementation of the specification. This is where I’m suggesting a subtle difference.

You can write the API in Java, as we have done for the current WebAPI, and then generate documentation from it using automated tools, and this works well. However, it’s based on an existing implementation, meaning that you’ve written the code based on a general understanding of what needs to be developed and then documented what it is that you have developed.

Alternatively, before you begin developing an implementation, you can design a specification that can be shared with the community and that design can be collaborated on until there is a consensus of what should be developed. In the case of using OpenAPI 3.0 (essentially, the next version of what was formerly known as Swagger) you can take that specification document (serialized as either YAML or JSON) and generate code in any of a dozen or so languages that you could then use for your implementation.
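
As a rough illustration of what such a design-first document can look like, here is a minimal sketch of an OpenAPI 3.0 description built as a Python dictionary and serialized to JSON. The path `/conceptsets/{id}`, the `ConceptSet` schema, and its fields are assumptions made up for this example; they are not taken from the draft specification or from the current WebAPI.

```python
import json

# Hypothetical, minimal OpenAPI 3.0 document describing a single endpoint.
# The path, schema name, and fields are illustrative assumptions only.
spec = {
    "openapi": "3.0.0",
    "info": {"title": "Concept Sets (illustrative)", "version": "0.0.1"},
    "paths": {
        "/conceptsets/{id}": {
            "get": {
                "summary": "Fetch one concept set by identifier",
                "parameters": [
                    {
                        "name": "id",
                        "in": "path",
                        "required": True,
                        "schema": {"type": "integer"},
                    }
                ],
                "responses": {
                    "200": {
                        "description": "The requested concept set",
                        "content": {
                            "application/json": {
                                "schema": {"$ref": "#/components/schemas/ConceptSet"}
                            }
                        },
                    },
                    "404": {"description": "No concept set has this identifier"},
                },
            }
        }
    },
    "components": {
        "schemas": {
            "ConceptSet": {
                "type": "object",
                "required": ["id", "name"],
                "properties": {
                    "id": {"type": "integer"},
                    "name": {"type": "string"},
                },
            }
        }
    },
}


def list_operations(document):
    """Return sorted (HTTP method, path) pairs declared in an OpenAPI document."""
    http_methods = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}
    return sorted(
        (method.upper(), path)
        for path, item in document.get("paths", {}).items()
        for method in item
        if method in http_methods
    )


# The same document serialized as JSON, one of the two formats mentioned above.
spec_json = json.dumps(spec, indent=2)
```

Because the document is plain data, it can be reviewed, diffed, and iterated on before any server code exists, and a tool only needs to read it (as `list_operations` does here) to learn which operations an implementation must provide.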

When there were very few developers working on OHDSI software it didn’t matter as much to have a specification available as we were creating the reference implementation and releasing it without much contribution from others. Today we have a much more vibrant community of developers, many of whom are interested in writing their own implementation of tools to develop things like Concept Sets and Cohort Definitions. To do so, some have been forced to reverse engineer designs out of calls to the WebAPI to see what it is that it returns and then create implementations in their preferred development paradigm to mimic that behavior.

Developing a stubbed (incomplete) implementation and documenting it could in fact work as a means for sharing a specification. It definitely is a subtle difference and I am not trying to imply otherwise; however, I do prefer the approach of publishing a pure, implementation-agnostic specification.

To make a rough analogy, the W3C doesn’t develop a web browser that renders web pages the way they think it should and then document that approach and share it with the world. Instead they develop standards and recommendations which others can then adopt as part of their implementations.

In response to @Chris_Knoll’s question:

So, one question I have is, if we’re talking about putting together an official specification, is the objective to provide the blueprints to make an implementation in an arbitrary technology stack?

Yes, this would be one of the objectives. As a use case, there is interest in document based repositories for Concept Sets and Cohort Definitions, instead of requiring they use relational databases for storage. A specification for the Concept Set API would allow someone to implement it however they see fit for their technology stack.
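
To make the storage-independence point concrete, here is a small sketch, under stated assumptions, of one contract with a document-based implementation behind it. The names `ConceptSetStore`, `JsonDocumentStore`, and `round_trip`, and the `id` field, are hypothetical and exist only for this example.

```python
import json
from typing import Protocol


class ConceptSetStore(Protocol):
    """Hypothetical storage contract; any backend that provides these two
    methods satisfies it, whether relational or document based."""

    def save(self, concept_set: dict) -> None: ...

    def load(self, concept_set_id: int) -> dict: ...


class JsonDocumentStore:
    """Illustrative document-style backend: keeps each concept set as a
    serialized JSON document in memory, keyed by its 'id' field."""

    def __init__(self) -> None:
        self._documents: dict[int, str] = {}

    def save(self, concept_set: dict) -> None:
        self._documents[concept_set["id"]] = json.dumps(concept_set)

    def load(self, concept_set_id: int) -> dict:
        return json.loads(self._documents[concept_set_id])


def round_trip(store: ConceptSetStore, concept_set: dict) -> dict:
    """Behavior a specification could require of every backend:
    whatever is saved can be loaded back unchanged."""
    store.save(concept_set)
    return store.load(concept_set["id"])
```

A relational backend would implement the same two methods with SQL, and the `round_trip` check would apply to it without modification.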

anthonysena · November 1, 2017, 8:36pm

I personally like this approach that @Frank has laid out with OpenAPI 3.0 so that we can iterate over how something works before we build it and we then have a record of the design. I also agree with @Chris_Knoll that this should fit into our development activities so that we can keep both the specification and documentation synchronized.

One other consideration for this discussion: who is the intended audience for these design artifacts? It has been my experience that the design specification is born out of answering the question: “What problem are we trying to solve?”. It is a collaboration among stakeholders to describe the problem, terminology around the problem and ultimately an approach to solving the problem. The technical documentation expands on this document by answering the next question: “Here is how we implemented a technical solution that solved the problem.”

To that end, @Frank, looking at what you have in the Concept Set specification: https://app.swaggerhub.com/apis/fdefalco/org.ohdsi.ConceptSets/1.0.0, I like the definition that you’ve provided for a concept set:

Concept Sets are collections of concept based logic rules that help define sets of terminology to be used in research activities.

I think it would be useful to enumerate some of the use cases that are addressed by Concept Sets so we can clearly state what problems are solved by this construct. Here is a first pass:

Provides the ability to group concepts to describe clinical information in patient populations. A concept belonging to a concept set can utilize any of the following attributes:
- A concept may include itself by virtue of it being part of the concept set group.
- A concept may exclude itself by virtue of it being part of the concept set group AND having a flag indicating to exclude it.
- A concept may include/exclude itself and descendant concepts based on the standardized vocabularies using the ‘includeDescendants’ flag.
- A concept may include/exclude itself and mapped concepts based on the standardized vocabularies using the ‘includeMapped’ flag.
Concept sets can leverage the standardized vocabularies to fully self-describe their contents and all concepts from source vocabularies.
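
The include/exclude rules listed above can be sketched in a few lines. This is only an illustration under assumptions: the names `ConceptSetItem`, `expand`, and `resolve` are hypothetical, the `descendants` and `mapped` lookups stand in for the standardized vocabularies, `descendants` is assumed to already hold every descendant of a concept (not just direct children), and the exact meaning of "mapped" is left to whatever the lookup table contains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptSetItem:
    """One concept in a concept set, with the flags described above.
    Field names are illustrative, not taken from the WebAPI."""
    concept_id: int
    is_excluded: bool = False
    include_descendants: bool = False
    include_mapped: bool = False


def expand(item, descendants, mapped):
    """Return the concept ids one item contributes before exclusion is applied."""
    ids = {item.concept_id}
    if item.include_descendants:
        ids |= descendants.get(item.concept_id, set())
    if item.include_mapped:
        # Add the mapped concepts of every concept gathered so far.
        for concept_id in list(ids):
            ids |= mapped.get(concept_id, set())
    return ids


def resolve(items, descendants, mapped):
    """Resolve a concept set to concept ids: everything included minus everything excluded."""
    included, excluded = set(), set()
    for item in items:
        target = excluded if item.is_excluded else included
        target.update(expand(item, descendants, mapped))
    return included - excluded


# Toy vocabulary: concept 1 has descendants 2 and 3; concept 3 maps to 30.
descendants = {1: {2, 3}, 2: {3}}
mapped = {3: {30}}

items = [
    ConceptSetItem(1, include_descendants=True),
    ConceptSetItem(2, is_excluded=True, include_descendants=True),
]
resolved = resolve(items, descendants, mapped)  # {1}: 2 and 3 are excluded
```

In this sketch exclusion always wins over inclusion, which is one reasonable reading of the rules above; a written specification would need to state that precedence explicitly.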

I’m sure there are other use cases, but I’m putting this out there to start. My hope is that these use cases can provide a person with background on ‘what is a concept set and what problems does it address’. This is then further detailed in the WebAPI documentation to describe usage and implementation.