This is really exciting work by a team at Columbia. The question is: how do you guarantee that the data this model generates does not leak any real information about a person? Suppose the model generates a fake patient record that exactly matches a real patient record; that is bound to happen for short sequences of events. There is a concept called differential privacy that could help here.
"Roughly, an algorithm is differentially private if an observer seeing its output cannot tell whether a particular individual’s information was used in the computation. Differential privacy is often discussed in the context of identifying individuals whose information may be in a database. Although it does not directly refer to identification and reidentification attacks, differentially private algorithms provably resist such attacks." (Differential privacy - Wikipedia)
So I think the key property is this: if the model is (almost) equally likely to produce any given output whether or not any single individual’s data was included in the training process, then the training algorithm is differentially private. It does not have to produce literally the same sequence; it is the distribution over outputs that has to stay nearly unchanged, within a quantifiable bound.
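For what it’s worth, the formal version of "cannot tell" is the standard (ε, δ) definition. For a randomized training algorithm M, any two training sets D and D′ that differ in a single person’s records, and any set S of possible outputs:

$$\Pr[M(D) \in S] \le e^{\varepsilon} \Pr[M(D') \in S] + \delta$$

Small ε and δ mean that including or excluding any one patient barely changes the probability of any outcome you could observe, and by the post-processing property anything the trained model later generates inherits the same guarantee.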
I have no idea how to actually implement this for LLM training, but if it is possible we could have provable privacy guarantees.
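One known approach for neural nets in general is DP-SGD (Abadi et al., 2016): clip each training example’s gradient and add calibrated Gaussian noise before the update, so no single record can move the model much. Libraries like Opacus and TensorFlow Privacy implement this and track the resulting (ε, δ) budget. Below is a rough PyTorch sketch of the core step just to show the idea; the model, loss function, and hyperparameters are placeholders, and nothing here is specific to the Columbia team’s setup.

```python
import torch

def dp_sgd_step(model, loss_fn, batch_x, batch_y, lr=1e-3,
                clip_norm=1.0, noise_multiplier=1.0):
    """One DP-SGD update: clip each example's gradient, sum, add noise, step."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed_grads = [torch.zeros_like(p) for p in params]

    # Per-example gradients via a microbatching loop (libraries vectorize this).
    for x, y in zip(batch_x, batch_y):
        model.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()

        # Clip this example's total gradient to L2 norm <= clip_norm.
        grads = [p.grad.detach().clone() for p in params]
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = min(1.0, clip_norm / (total_norm + 1e-12))
        for s, g in zip(summed_grads, grads):
            s.add_(g * scale)

    batch_size = len(batch_x)
    with torch.no_grad():
        for p, s in zip(params, summed_grads):
            # Gaussian noise calibrated to the clipping norm masks any one
            # individual's contribution to the update.
            noise = torch.randn_like(p) * (noise_multiplier * clip_norm)
            p -= lr * (s + noise) / batch_size
```

The noise_multiplier and clip_norm control the privacy/utility trade-off; a privacy accountant converts them, together with the number of steps and the sampling rate, into a concrete (ε, δ) guarantee for the trained model.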
For those interested, check out the work of Thomas Strohmer at the University of California, Davis, who has been working on the theoretical foundations for provable privacy guarantees: Thomas Strohmer - Publications