Interestingly, AETION appears to take a very different view of validity. They do not mention the more software-engineering-oriented approach to validity, and instead focus solely on validity as the ability to reproduce known study results. They show that they have reproduced existing observational studies (I am not sure why those should be considered a gold standard), and claim to have 'predicted' the result of an ongoing RCT.
I understand why they stress this aspect of validity, and it also seems to argue that we should definitely keep the results of our method evaluation against the Method Benchmark in our document. We probably even want to extend that evaluation to PLP.