Software validity and meeting regulatory requirements

Christian_Reich · October 23, 2017, 1:10pm

Agree with you. Validation according to FDA and other sources is “documented proof that the system works according to pre-defined specifications”. Documentation of what it does is not enough, but of course that is all most folks in the area are capable of doing, and hence it goes that way into the “guidelines”.

How about this: We explain how we see validation, how GCP does not apply (which is important, because folks keep bringing it up, mostly due to ignorance) and what we do. Then we do it. Happy to write the intro if you like.

schuemie · October 23, 2017, 1:46pm

@Christian_Reich: Yes, I agree with your points, and yes, please draft an introduction.

keesvanbochove · October 24, 2017, 10:47am

Hi all,

I agree with the other comments about applicability of Part 11.

Just wanted to clarify my comment on ‘legal status of the code’. In principle, the code base seems to be licensed under Apache 2.0. However, several files have copyright statements of the form ‘Copyright Observational Health Data Sciences and Informatics’. Since OHDSI isn’t actually a legal entity or copyright collective as far as I know, the meaning of this sentence is unclear. In many cases, this is not a problem as the actual authors are also listed further down in the header using an ‘author’ tag. Clarity on the authors / rights holders is important so you can verify (and even prove, for github commits) that they actually had the rights to grant the license on that code. There’s also the issue of (especially frontend) libraries which might have various licenses which might need to be checked for compatibility with Apache 2.0. And finally there’s the provenance of the OMOP model itself. It seems that prior to version 4.0 it was licensed by FNIH under Apache 2.0, with subsequent additions licensed under CC0.

It all looks good already but it might be an idea to do a comprehensive license analysis on the code and publish that, to alleviate any concerns as for example voiced by @Gowtham_Rao in the stakeholder forum last week. Recently we undertook a very similar effort at The Hyve with tranSMART codebases as part of the tranSMART 17.1 project, but it takes some time to sort this out, so ideally I’d find some funding for that or some other way to prioritize it as part of our ongoing projects.
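As a first pass for such an analysis, something like the sketch below could flag files whose header names a copyright holder but no individual author. This is only an illustration: the regular expressions, the function name `summarize_header`, and the example header (including the author name) are assumptions, not taken from the actual OHDSI code base, and real headers may be formatted differently.

```python
import re

# Hypothetical header patterns; adjust to the conventions actually used in the repository.
COPYRIGHT_RE = re.compile(r"Copyright\s+(?:\(c\)\s*)?(?:\d{4}\s+)?(.+)", re.IGNORECASE)
AUTHOR_RE = re.compile(r"@?author:?\s+(.+)", re.IGNORECASE)


def summarize_header(source_text, max_lines=30):
    """Collect copyright holders and authors named in the first lines of a source file."""
    holders, authors = [], []
    for line in source_text.splitlines()[:max_lines]:
        # Drop leading/trailing comment characters such as '#', '*', '/' and '-'.
        stripped = line.strip(" \t#*/-")
        match = COPYRIGHT_RE.search(stripped)
        if match:
            holders.append(match.group(1).strip())
            continue
        match = AUTHOR_RE.search(stripped)
        if match:
            authors.append(match.group(1).strip())
    # A copyright statement without any named author is what needs a human look.
    needs_review = bool(holders) and not authors
    return {"holders": holders, "authors": authors, "needs_review": needs_review}


# Made-up example header, for illustration only.
example = (
    "# Copyright 2017 Observational Health Data Sciences and Informatics\n"
    "#\n"
    "# @author Jane Doe\n"
)
print(summarize_header(example))
```

A script like this only finds candidates for review; whether the listed authors actually held the rights to license the code still has to be checked by hand.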

Greetings,

Kees

Rijnbeek · October 26, 2017, 11:35am

Hi All,

Nice discussion and Martijn thanks for drafting this.

The current document focusses on the methods library (I think this should include PLP). Do we foresee more documents like this describing other parts of the full process? For example, reproducibility of previous studies also depends heavily on vocabulary version archiving, etc. Personally, I am more ‘worried’ about all the decisions made before and after our methods, but I may be completely wrong about this.

I understand the focus on the current FDA regulation, but I would also be interested to get some European input about software validity requirements. We do have good ideas on this side of the planet. For example, I know that under an upcoming change in regulations, prediction models will be seen as a “device” and therefore would fall under other regulatory pressure. I also know that EMA has a ‘Qualification of Novel Methodologies’ procedure to get new tools certified, and we will explore this together with NICE (https://www.nice.org.uk), who have a lot of experience in the area. I will ask them if there are specific requirements for software as we use it, or if there are ongoing initiatives to make these. Ideally, we would like to see OMOP-CDM and analytical tools certified for post-authorisation safety and efficacy studies by going through such a procedure at some point.

I will join the meeting today.

Peter

Christian_Reich · November 19, 2017, 9:40am

Friends:

Take a look. I rewrote the Intro to cover how computer systems for RCT are regulated, and how observational studies are not and why not.

What we now need is a much more robust discussion of how we want to ensure quality, in particular in statistical methods. It’s not easy there, since it is hard to create test cases, and the non-deterministic behavior of some of the employed methods is at odds with how QA usually is done. I am sure there is literature about it, and we need to refer to that as well. I wouldn’t be the best author of that.
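To make the QA problem with non-deterministic methods concrete, here is a minimal sketch of two common workarounds: fixing the random seed so a test is repeatable, and comparing against a known answer within a tolerance rather than exactly. The bootstrap estimator, the function names, and the 15% tolerance are all assumptions chosen for illustration; none of this is taken from the methods library.

```python
import random
import statistics


def bootstrap_mean_se(values, n_resamples=2000, seed=None):
    """Estimate the standard error of the mean by bootstrap resampling."""
    rng = random.Random(seed)  # local generator, so the seed fully determines the result
    n = len(values)
    means = [statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)]
    return statistics.stdev(means)


def check_reproducible_and_plausible(values, rel_tol=0.15):
    """Return (same, close): seed reproducibility and agreement with the analytic answer."""
    # 1. Same seed, same answer: this is what makes a regression test possible.
    same = bootstrap_mean_se(values, seed=42) == bootstrap_mean_se(values, seed=42)
    # 2. Different seeds, similar answers: compare with the analytic standard error
    #    within a tolerance instead of demanding exact equality.
    analytic = statistics.stdev(values) / len(values) ** 0.5
    close = all(
        abs(bootstrap_mean_se(values, seed=s) - analytic) <= rel_tol * analytic
        for s in (1, 2, 3)
    )
    return same, close


data = [float(x) for x in range(1, 51)]  # made-up data
print(check_reproducible_and_plausible(data))
```

The tolerance has to be chosen per method, which is exactly the kind of decision the literature on testing stochastic software should help with.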

Let me know what you think.

Not sure how the Europeans do it, but the FDA will only make you fall under the Quality System Regulation of medical devices if your software detects or manages a disease in a patient. The estimation methods should be fine; patient-level prediction could be an entirely different matter.

schuemie · March 20, 2018, 8:35am

This had been lingering on my to do list for a while now. I finally got around to drafting a new version, based heavily on @Christian_Reich’s input.

Could I ask everyone to review this new version? I would especially like to invite @msuchard, @Rijnbeek, and @jennareps, who are mentioned by name in the document.

hripcsa · March 28, 2018, 7:38pm

Two questions on the wonderful and well-written software validity document.

Should we really consider calibration to be a software validity issue? (Requirements for the methods library section.) That’s analogous to saying don’t use case-control any more. Perhaps right, but not in this document. Goes more in a document about proper methods. Especially given that the assumptions about positive controls are not a slam dunk.
Section 6 on testing seems critically important and perhaps worth more details.

Also you say observational data contain a “small amount of inaccuracies.” We wish so. Maybe not so small.

schuemie · March 29, 2018, 5:02am

I think by ‘calibration’ you mean the empirical method evaluation? I’m a bit on the fence about whether it should be included.

On the one hand, if you’re going to fly in a plane you would appreciate it if the plane was not just built by an ISO-9000-compliant company, but someone also made a test flight in it before you get on board. So running the Method Library through the Methods Benchmark is informative on the validity of the software. If a method just produced the same estimate for all controls in the Benchmark, that would be considered to invalidate the software. Getting the right answer somehow feels like it should be part of our definition of validity.

On the other hand, there are the inherent strengths and weaknesses of the methods, which are independent of whether they have been implemented correctly. Even the best implementation of case-control will get the answer wrong most of the time, so that should not count towards bad software validity, only bad method validity.
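The two sides can be told apart, at least roughly, in how benchmark results are summarized. The sketch below is hypothetical: the field names and all numbers are made up, and it is not the actual Methods Benchmark code. A constant estimate across all controls points at a broken implementation, while low confidence interval coverage with varying estimates could equally reflect the method itself.

```python
def benchmark_summary(results):
    """Summarize control results.

    results: list of dicts with keys 'estimate', 'ci_low', 'ci_high', 'truth'.
    """
    estimates = [r["estimate"] for r in results]
    covered = [r["ci_low"] <= r["truth"] <= r["ci_high"] for r in results]
    return {
        # Identical estimates everywhere suggest a software defect.
        "degenerate": len(set(estimates)) == 1,
        # Fraction of controls whose true effect lies inside the interval.
        "coverage": sum(covered) / len(results),
    }


# Made-up results for four controls with known true effect sizes.
constant = [
    {"estimate": 1.5, "ci_low": 1.2, "ci_high": 1.9, "truth": t}
    for t in (1.0, 1.0, 2.0, 4.0)
]
varied = [
    {"estimate": 1.1, "ci_low": 0.8, "ci_high": 1.5, "truth": 1.0},
    {"estimate": 0.9, "ci_low": 0.6, "ci_high": 1.3, "truth": 1.0},
    {"estimate": 1.8, "ci_low": 1.3, "ci_high": 2.5, "truth": 2.0},
    {"estimate": 2.6, "ci_low": 1.9, "ci_high": 3.5, "truth": 4.0},
]
print(benchmark_summary(constant))
print(benchmark_summary(varied))
```

Only the first flag is clearly a software validity check; how to interpret the coverage number is the open question.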

What do other people think?

On expanding section 6: I agree, but I’ve already tried to be as long-winded as I can. Any suggestions on how to be more verbose?

Christian_Reich · March 30, 2018, 11:40am

I think you hit the nail on the head, @schuemie. This is the crux, here. How about this:

We declare that a clinically correct result currently is not possible to achieve. OMOP showed the massive heterogeneity of results depending on a number of parameters and design choices. Currently, we have no way of setting these parameters objectively. More research is necessary. So:

Validation is promising that the result is computationally correct.
Validation is not promising that the result is clinically correct. More research is necessary, and it is not part of this paper.
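A “computationally correct” promise can be backed by known-answer tests: the code is run on an input small enough to verify by hand. A minimal sketch, assuming a plain odds ratio with a Woolf (log-scale) 95% interval as the stand-in method; the function name and the 2x2 counts are made up for illustration.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf confidence interval for a 2x2 table.

    a, b: exposed cases and exposed non-cases
    c, d: unexposed cases and unexposed non-cases
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log_or)
    high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, low, high


# Hand calculation for a=20, b=80, c=10, d=90:
#   OR = (20 * 90) / (80 * 10) = 2.25
#   SE of log(OR) = sqrt(1/20 + 1/80 + 1/10 + 1/90) = sqrt(25/144) = 5/12
estimate, low, high = odds_ratio_with_ci(20, 80, 10, 90)
assert math.isclose(estimate, 2.25)
assert math.isclose(math.log(high / low), 2 * 1.96 * 5 / 12)
```

Such a test says nothing about whether 2.25 is the clinically right answer for the underlying question, which is the distinction drawn above.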

Rijnbeek · May 20, 2018, 7:50pm

Hi Martijn,

Sorry for the delay. It has been on my to-do list for a while, but a week has only 7 days…

Find some discussion points in this document: MethodsLibraryValidity_pr.docx (1.0 MB)

A big question for me is: How do the big software companies that develop analytical software such as SAS, SPSS, STATA guarantee validity? Can we learn anything from them?

Another interesting development is that, as of very recently, the FDA is approving prediction algorithms such as:

I know that regulatory requirements for prediction algorithms and AI in general will be enforced more and more in the upcoming years, and we have put this as a task in the European Health Data and Evidence Network (EHDEN) project to look into. I would like to understand what was needed for the FDA to accept this example.

NICE (https://www.nice.org.uk) will be involved in this (and we hope to involve the EMA if possible). They will also be involved in assessing the full OHDSI pipeline from a regulatory perspective in one of EHDEN’s deliverables.

Finally, the document you have drafted is very important to move forward because we need to have a fully worked out and broadly supported validation framework for all parts of the analytical pipeline (CDM->study results) to gain the trust of the community including the regulatory bodies. I am convinced we can get there!

To be continued.

schuemie · May 22, 2018, 11:53am

Thanks @Rijnbeek!

I found this interesting white paper detailing SAS’s approach to software validity. I haven’t yet read it in detail, but it seems to follow the same broad outline we have in our document (but with lots more text). They too emphasize unit tests (claiming to have ‘275,000 unique tests’!).

I’ll respond to your comments in detail later.

Rijnbeek · May 22, 2018, 1:38pm

Yes, nice. I wonder if we can have a chat with someone from SAS or another big company to get more details. If we can follow and reference some of this, we have a strong point, since these tools are accepted by regulators…

schuemie · May 23, 2018, 9:03am

Interestingly, AETION appears to have a very different take on validity. They do not mention the more software-engineering oriented approach to validity, but instead focus solely on validity as the ability to reproduce known study results. They show they have reproduced existing observational studies (not sure why those should be considered a gold standard), and claim to have ‘predicted’ the result of an ongoing RCT.

I understand why they stress this aspect of validity, and it also seems to argue we definitely should keep the results of our method evaluation against the Method Benchmark in our document. We probably even want to extend it to PLP.

Gowtham_Rao · May 23, 2018, 11:07am

Hopefully this contributes to the conversation about software validity. NCQA has a software certification process for HEDIS quality measures. The certification process is detailed here: http://www.ncqa.org/hedis-quality-measurement/data-reporting-services/quality-measure-certification. This is an approach I have seen.

Andrew · May 23, 2018, 12:10pm

I am aware that the VA has not yet approved OHDSI tools including Atlas and the major analytics packages because they are open source. It might be good to investigate what criteria are part of their approval decision, unless these are already well known.

Andrew · June 5, 2018, 5:18pm

There is a document that was issued by the FDA just recently on the use of RWE to support regulatory decisions that is worth reading. It does cover the “reliability” and quality of the RWD as well.

https://www.fda.gov/downloads/medicaldevices/deviceregulationandguidance/guidancedocuments/ucm513027.pdf

Does anyone know what, if anything, this document implies regarding FDA oversight/approval of predictive algorithms used in medical devices?

Andrew · July 18, 2018, 7:15pm

Here is a summary of what I’ve discovered re new FDA regulations for software. Some of this might be useful to update or reference in the OHDSI Software Validity and Regulatory Requirements document.
The FDA’s Center for Devices and Radiological Health has a new program focusing on Digital Health that covers software that it classifies as a medical device: https://www.fda.gov/medicaldevices/digitalhealth/
As of summer 2018 it has issued some guidance on software that is particularly relevant to PLP @Rijnbeek and @jreps. One was developed in collaboration with the Software as a Medical Device Working Group of the International Medical Device Regulators Forum. So it is relevant to OHDSI members outside the US.
Here is a link to that one: https://www.fda.gov/downloads/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/UCM524904.pdf
I assume this will replace or supersede the 2002 General Principles of Software Validation document referenced in @schuemie’s splendid document, which covers some of the same territory.
There is also a draft guidance document that is US-only and focused on clinical decision support software:
https://www.fda.gov/downloads/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/UCM587819.pdf

This guidance on clinical decision support software defines when an algorithm implemented as software is classified as a device and is therefore subject to oversight, evidence requirements for approval, etc.
The criteria for classifying an algorithm as a device hinge on whether it processes information that affects medical decisions in ways the clinician could not do without the algorithm. For example, software that delivers reminders to do something according to a practice guideline doesn’t meet the definition. Software that takes data and processes it to predict something relevant to a patient’s care does meet the definition. But that’s just my gloss. People should read for themselves rather than relying on my take.

The Software as a Medical Device document also includes sections on clinical and technical validation that might be useful to reflect or mention somewhere in the OHDSI software document.

schuemie · July 20, 2018, 8:31am

Thanks @Andrew! I will thoroughly revise our document before the symposium (hopefully much sooner than that), and will incorporate these new developments you’ve highlighted.

The idea is to have a ‘formal’ first version of the document by the symposium.

Christian_Reich · July 21, 2018, 1:04pm

@Andrew, @schuemie, friends:

Before we stray too far from the path: Apart from this not being a regulation but just a guideline, and it not even being issued by the FDA but IMDRF, it pertains to Software as a Medical Device. And that is defined as software that does one of the following:

diagnosis, prevention, monitoring, treatment or alleviation of disease,
diagnosis, monitoring, treatment, alleviation of or compensation for an injury,
investigation, replacement, modification, or support of the anatomy or of a physiological process,
supporting or sustaining life,
control of conception.

So, patient care. We do none of that. At least not now. When we start pushing our stuff out to individual patients or providers handling individual patients, we may be getting closer to what the intention of these guidelines is. Of course, it’s always good to get good ideas and hints about how to do software validation right, if that’s what you had in mind.

Rijnbeek · July 22, 2018, 12:26pm

Thanks Andrew,

In the EHDEN project I have created a task related to the regulatory requirements for PLP since this is indeed getting in the area of medical devices with the new upcoming regulations. This will also happen in Europe. Thanks for sharing and we will come back to this later once the EHDEN project is started.

Peter