
Software validity and meeting regulatory requirements

Whenever we perform an observational study, one important consideration is the validity of our analysis software: does our analysis code do what it is supposed to do? Although we have gone to great lengths to ensure the validity of the OHDSI Methods Library, we haven't done a very good job of documenting what we did.

Google doc: OHDSI METHODS LIBRARY: REGULATORY COMPLIANCE AND VALIDATION ISSUES

I've drafted this document to address possible questions about validity. Although the document is focused on validity in the context of meeting regulatory requirements, the question of validity obviously extends beyond studies done at the behest of regulators.

I hereby invite any and all comments on this document, including whether or not you think such a document is a good idea to begin with.

(@msuchard: as the other half of the Estimation Methods Workgroup leadership I’m especially looking forward to your feedback)

This is a tremendous effort, Martijn … wonderful!

To kick off the discussion, I'll lead with one comment. The document describes very well the Methods Library's current capabilities (CohortMethod, SCCS, etc., written in R, C++, SQL, and Java), but does not discuss the introduction of future tools and languages.

A question for the community: to what procedures should we adhere when introducing new tools and languages?

Hello Martijn,

This is a great effort indeed. The scope of the document is currently the validity of the analysis through the Methods Library in the context of GxP regulatory requirements. This seems like a good place to start, as it is the most closely tied to potential regulatory submissions. I think this document is a very helpful starting point for any organization (pharma company or university/hospital) that intends to use OHDSI methods for studies that would fall under this type of compliance. One question I was wondering about: to what extent, and how, do you address the validity of the source data in these kinds of studies?

In addition, yesterday in the meeting a number of other topics were brought up as potential topics to look at, including the legal status of the code. We have been through some very similar discussions and legal consults in, among others, the tranSMART community, and I'd like to define a set of actions that we can take to help organizations overcome these hurdles as much as possible. As Pia Mancini from OpenCollective eloquently put it, "Asking our generation to create a command and control hierarchical entity in order to operate is like asking, 'Who's the president of the internet?'" Yet we live in this world under its current laws and organizational boundaries, and that's the reality we have to make OHDSI work in to fulfill our mission. I will try to raise this topic tomorrow in the Architecture workshop if it fits in the agenda, and see who else would like to work on this.

Greetings,

Kees


Thanks Kees!

Source data validity is indeed another interesting topic. There is of course @Vojtech_Huser 's paper that makes a good first attempt to address this, and in general chart review is seen as a way to show source data validity (although not very convincing to me, partly because it typically only addresses the false positive rate and not, for example, (differential) sensitivity).

Could you elaborate a bit more on what you mean by ‘the legal status of the code’? Do you mean who owns the code? Or who can make decisions about the code?

Just to have it on everyone’s agenda: we will discuss the topic of software validity at the next meeting of the population-level estimation workgroup (Thursday October 26 at noon Eastern Time, 6pm Central European time, 9am Pacific Time).

@Rijnbeek, @jennareps: maybe you could join to discuss how this applies to the patient-level prediction tools as well?

Friends:

I would like to rain on your parade here a little bit. Computerized System Validation, as it is called by the Regulations, pertains to activities that are regulated by Title 21 of the Code of Federal Regulations under the Food, Drug and Cosmetic Act. Usually it is about records of Clinical Trials, consent forms, qualifying investigator sites, analytical records of producing the drug substance, and quality systems in any of the other processes whose results the Regulations want you to submit to get marketing approval. In particular, Part 11 explicitly regulates how you create, modify, maintain, or transmit electronic records that are submitted under the Regulation in lieu of paper records, and electronic signatures that you use to control these records in lieu of paper signatures. Read the regulation; it is pretty clear (and I have been in that business in my prior lives for too long, I can sing this in my sleep).

No observational research is regulated by the Regulation. The creation, modification, maintenance and transmission of the records (databases) are done by entities not regulated by the FDA (EMR vendors, insurance companies) for purposes not regulated by the FDA. If you want to use them as records for Clinical Trials - that’s a different situation. But we are not talking about that. There are also no electronic signatures whatsoever. So, observational research has the same status as general drug research - not covered.

So, I am not saying we should not validate the system in the sense of having a robust QA which tests all aspects of what we put out and properly document that. But let’s keep Part 11 and all that stuff out. We would never be able to do it for the above reasons anyway, because we don’t control the records.

I would declare this exact thing in the document Martijn started, which will end up on the website somewhere. Let me know, and I'll do it.

Hi Christian,

You are right in that observational studies are currently not regulated with respect to the validity of the analysis software. However, that does not stop regulators from asking for artifacts supporting said validity (and recently they have been doing so more and more). The document I drafted is intended to be such an artifact, and in the absence of regulations specific to observational studies I propose we take the regulations specific to RCTs as a starting point. Also, in general it is good to think about software validity in a more structured way, even outside of any regulatory requirements.

You are also right that Part 11 does not precisely apply to observational studies, and this is apparent everywhere in the document where it says

The Methods Library is not intended to create, maintain, modify or delete Part 11 relevant records but to perform population-level effect estimation, including relevant output necessary for understanding the estimate and the estimation characteristics.

Quite honestly, I simply copied this part from this document by the R Foundation. We do create some records, mostly metadata about the execution of the study and the actual results, and the document therefore simply checks whether we would meet regulatory requirements if Part 11 did apply to those records.

There is a document that was issued by the FDA just recently on the use of RWE to support regulatory decisions that is worth reading. It covers the "reliability" and quality of the RWD as well.

Thanks Greg!

Yes, that document is also relevant, although in it they almost exclusively deal with reliability of the data, not the software used to analyse the data.

Another document to mention is Best Practices for Conducting and Reporting Pharmacoepidemiologic Safety Studies Using Electronic Healthcare Data. They have one short paragraph on validity of the analysis:

H. Procedures To Ensure Accuracy of Data Management and Analysis Process

The 2008 ISPE guidelines highlight the importance of describing “data management and statistical software programs and hardware to be used in the study” and “data preparation and analytical procedures as well as the methods for data retrieval and collection” (ISPE guidelines 2008). FDA encourages investigators to describe these processes used for managing and preparing their data. It is important that analysts performing and reviewing data management and analysis have appropriate training or prior experience in the use of the particular analytic software that is being used. FDA considers documentation a very important component of the analytic process and recommends that all analytic programs be thoroughly annotated with comments that clearly describe the intent or purpose of each step.

Both this paragraph and the ISPE guidelines they reference stress the need to document what was done. I agree with the need to document and even believe we should go further and always provide the analysis code as open source to allow full replication of what was done. However, I believe validity goes far beyond documentation.
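
To make the annotation point concrete, here is a minimal, hypothetical sketch of the kind of commented analysis step the FDA paragraph above asks for; the file, column, and variable names are placeholders for illustration, not anything from the Methods Library:

```r
# Hypothetical, annotated analysis step (illustration only, not Methods Library code)
library(survival)

# Step 1: Load the analytic dataset extracted from the CDM.
# One row per subject; 'exposed' is 1 for the target cohort, 0 for the comparator.
cohort <- read.csv("cohort.csv")

# Step 2: Fit a Cox proportional hazards model for the outcome.
# The coefficient of 'exposed' is the log hazard ratio of interest.
fit <- coxph(Surv(timeAtRisk, outcome) ~ exposed, data = cohort)

# Step 3: Report the hazard ratio and its 95% confidence interval,
# so the reported estimate can be traced back to this exact step.
summary(fit)$conf.int
```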


@schuemie

Agree with you. Validation according to FDA and other sources is "documented proof that the system works according to pre-defined specifications". Documentation of what it does is not enough, but of course it is all most folks in the area are capable of doing, and hence it goes that way into the "guidelines".

How about this: We explain how we see validation, how GCP does not apply (which is important, because folks keep bringing it up, mostly due to ignorance) and what we do. Then we do it. Happy to write the intro if you like.

@Christian_Reich: Yes, I agree with your points, and yes, please draft an introduction.

Hi all,

I agree with the other comments about applicability of Part 11.

Just wanted to clarify my comment on 'legal status of the code'. In principle, the code base seems to be licensed under Apache 2.0. However, several files have copyright statements of the form 'Copyright Observational Health Data Sciences and Informatics'. Since OHDSI isn't actually a legal entity or copyright collective as far as I know, the meaning of this statement is unclear. In many cases this is not a problem, as the actual authors are also listed further down in the header using an 'author' tag. Clarity on the authors / rights holders is important so you can verify (and even prove, for GitHub commits) that they actually had the rights to grant the license on that code. There's also the issue of (especially frontend) libraries, which may carry various licenses that need to be checked for compatibility with Apache 2.0. And finally there's the provenance of the OMOP model itself: it seems that prior to version 4.0 it was licensed by FNIH under Apache 2.0, with subsequent additions licensed under CC0.

It all looks good already, but it might be an idea to do a comprehensive license analysis on the code and publish it, to alleviate concerns such as those voiced by @Gowtham_Rao in the stakeholder forum last week. Recently we undertook a very similar effort at The Hyve with the tranSMART codebases as part of the tranSMART 17.1 project, but it takes some time to sort this out, so ideally I'd find some funding for that or some other way to prioritize it as part of our ongoing projects.

Greetings,

Kees

Hi All,

Nice discussion and Martijn thanks for drafting this.

The current document focuses on the Methods Library (I think this should include PLP). Do we foresee more documents like this describing other parts of the full process? For example, reproducibility of previous studies also depends heavily on archiving of the vocabulary version, etc. Personally, I am more 'worried' about all the decisions made before and after our methods, but I may be completely wrong about this.

I understood the focus on the current FDA regulation, but I would also be interested to get some European input about software validity requirements. We do have good ideas on this side of the planet :smile: For example, I know that under an upcoming change in regulations, prediction models will be seen as a "device" and would therefore fall under additional regulatory pressure. I also know that EMA has a 'Qualification of Novel Methodologies' procedure to get new tools certified, and we will explore this together with NICE (https://www.nice.org.uk), who have a lot of experience in the area. I will ask them if there are specific requirements for software as we use it, or if there are ongoing initiatives to create them. Ideally, we would like to see the OMOP-CDM and analytical tools certified for post-authorisation safety and efficacy studies by going through such a procedure at some point.

I will join the meeting today.

Peter

Friends:

Take a look. I rewrote the Intro to cover how computer systems for RCTs are regulated, and how and why observational studies are not.

What we now need is a much more robust discussion of how we want to ensure quality, in particular in the statistical methods. It's not easy there, since it is hard to create test cases, and the non-deterministic behavior of some of the employed methods is at odds with how QA is usually done. I am sure there is literature about this, and we need to refer to that as well. I wouldn't be the best author for that part.
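
One way to square unit testing with stochastic methods is to fix the random seed and check that a known, simulated effect size is recovered within a statistical tolerance. The following is a sketch only, using the testthat package and simulated data rather than any actual Methods Library test:

```r
library(testthat)

test_that("logistic regression recovers a known odds ratio on simulated data", {
  set.seed(123)  # fix the seed so the stochastic simulation is reproducible

  # Simulate an exposure and an outcome with a true odds ratio of 2.0
  n <- 10000
  exposure <- rbinom(n, 1, 0.5)
  trueLogOr <- log(2.0)
  outcome <- rbinom(n, 1, plogis(-2 + trueLogOr * exposure))

  # Fit the model and compare the estimate against the known truth
  fit <- glm(outcome ~ exposure, family = binomial())
  estimate <- coef(fit)["exposure"]

  # Use a tolerance that allows for sampling variability rather than exact equality
  expect_lt(abs(estimate - trueLogOr), 0.2)
})
```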

Let me know what you think.

Not sure how the Europeans do it, but the FDA will only make you fall under the Quality System Regulation for medical devices if your software detects or manages a disease in a patient. The estimation methods should be fine; patient-level prediction could be an entirely different matter.

This had been lingering on my to-do list for a while now. I finally got around to drafting a new version, based heavily on @Christian_Reich's input.

Could I ask everyone to review this new version? I would especially like to invite @msuchard, @Rijnbeek, and @jennareps, who are mentioned by name in the document.

Two questions on the wonderful and well-written software validity document.

  1. Should we really consider calibration to be a software validity issue? (In the "Requirements for the Methods Library" section.) That's analogous to saying "don't use case-control any more." Perhaps right, but it belongs in a document about proper methods rather than this one, especially given that the assumptions about positive controls are not a slam dunk.

  2. Section 6 on testing seems critically important and perhaps worth more details.

Also, you say observational data contain a "small amount of inaccuracies." We wish so. Maybe not so small.

I think by ‘calibration’ you mean the empirical method evaluation? I’m a bit on the fence about whether it should be included.

On the one hand, if you're going to fly in a plane, you would appreciate knowing not just that the plane was built by an ISO-9000-compliant company, but also that someone made a test flight in the plane before you get on board. So running the Methods Library through the Methods Benchmark is informative about the validity of the software. Imagine that a method just produced the same estimate for all controls in the Benchmark; that would be considered to invalidate the software. Getting the right answer somehow feels like it should be part of our definition of validity.

On the other hand, there are the inherent strengths and weaknesses of the methods, which are independent of whether they have been implemented correctly. Even the best implementation of case-control will get the answer wrong most of the time, so that should not count as bad software validity, only as bad method validity.
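
To make that distinction operational, a hypothetical sanity check on the software side (not the actual Methods Benchmark tooling; the file and column names are placeholders) might flag a run where all control estimates collapse to a single value, while method performance would be judged separately, for example by coverage of the true effect sizes:

```r
# Hypothetical check: 'results' has one row per control, with columns
# 'logRr' (estimated log relative risk), 'ci95Lb', 'ci95Ub', and 'trueLogRr'.
results <- read.csv("benchmarkResults.csv")

# Software-level red flag: every control produced an identical estimate
if (length(unique(results$logRr)) == 1) {
  stop("All controls produced the same estimate; suspect a software problem")
}

# Method-level performance: how often does the confidence interval cover the truth?
coverage <- mean(results$ci95Lb <= results$trueLogRr & results$trueLogRr <= results$ci95Ub)
coverage
```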

What do other people think?

On expanding section 6: I agree, but I’ve already tried to be as long-winded as I can. Any suggestions on how to be more verbose?

I think you hit the nail on the head, @schuemie. This is the crux, here. How about this:

We declare that a clinically correct result is currently not possible to achieve. OMOP showed the massive heterogeneity of results depending on a number of parameters and design choices. Currently, we have no way of setting these parameters objectively. More research is necessary. So:

  1. Validation promises that the result is computationally correct.
  2. Validation does not promise that the result is clinically correct. More research is necessary, and it is not part of this paper.

Hi Martijn,

Sorry for the delay; it has been on my to-do list for a while, but a week only has 7 days…

Find some discussion points in this document: MethodsLibraryValidity_pr.docx (1.0 MB)

A big question for me is: how do the big companies that develop analytical software, such as SAS, SPSS, and Stata, guarantee validity? Can we learn anything from them?

Another interesting development is that, as of very recently, the FDA is approving prediction algorithms such as:

I know that regulatory requirements for prediction algorithms and AI in general will be enforced more and more in the upcoming years, and we have included a task in the European Health Data and Evidence Network (EHDEN) project to look into this. I would like to understand what was needed for the FDA to accept this example.

NICE (https://www.nice.org.uk) will be involved in this (and we hope to involve the EMA if possible). They will also be involved in assessing the full OHDSI pipeline from a regulatory perspective in one of EHDEN's deliverables.

Finally, the document you have drafted is very important for moving forward, because we need a fully worked out and broadly supported validation framework for all parts of the analytical pipeline (CDM -> study results) to gain the trust of the community, including the regulatory bodies. I am convinced we can get there!

To be continued.

Thanks @Rijnbeek!

I found this interesting white paper detailing SAS's approach to software validity. I haven't yet read it in detail, but it seems to follow the same broad outline we have in our document (with lots more text). They too emphasize unit tests (claiming to have '275,000 unique tests'!).

I’ll respond to your comments in detail later.
