Over the last few weeks I have been exploring the OHDSI CDM and its associated tools, and comparing them with Computational Healthcare (a software stack for AHRQ HCUP data that I developed). From my initial exploration, it's apparent to me that the OHDSI CDM might benefit significantly from a serialization format defined in an interface definition language such as Protocol Buffers, Thrift, or Avro.
Such a format would allow information about Persons, Visits, Condition occurrences, etc. to be processed in chunks outside a database, in any programming language (Python, C++, Java), while maintaining its logical integrity. Further, it would enable the creation of servers and clients for remote procedure calls using frameworks such as gRPC, and efficient processing on Spark/Hadoop using columnar data formats such as Apache Parquet.
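To make the chunking idea a bit more concrete, here is a rough Python sketch. It is not the proto file and not an official CDM artifact: it uses plain dataclasses and JSON lines as a stand-in for a real serialization framework, and the nesting of visits under a person and conditions under a visit is my own assumption about how a record could be laid out. The field names are borrowed from the CDM tables, while the helper names (`to_line`, `from_line`, `chunks`) and the sample values are made up for illustration.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Iterator, List


@dataclass
class ConditionOccurrence:
    condition_concept_id: int
    condition_start_date: str  # ISO date string, e.g. "2008-01-01"


@dataclass
class VisitOccurrence:
    visit_occurrence_id: int
    visit_start_date: str
    # Assumption: conditions are nested under the visit they belong to.
    conditions: List[ConditionOccurrence] = field(default_factory=list)


@dataclass
class Person:
    person_id: int
    year_of_birth: int
    # Assumption: each person record carries all of its visits.
    visits: List[VisitOccurrence] = field(default_factory=list)


def to_line(person: Person) -> str:
    """Serialize one self-contained person record to a single JSON line."""
    return json.dumps(asdict(person), sort_keys=True)


def from_line(line: str) -> Person:
    """Rebuild the nested record from one JSON line."""
    raw = json.loads(line)
    visits = [
        VisitOccurrence(
            visit_occurrence_id=v["visit_occurrence_id"],
            visit_start_date=v["visit_start_date"],
            conditions=[ConditionOccurrence(**c) for c in v["conditions"]],
        )
        for v in raw["visits"]
    ]
    return Person(raw["person_id"], raw["year_of_birth"], visits)


def chunks(lines: List[str], size: int) -> Iterator[List[Person]]:
    """Yield batches of complete person records; no joins are needed."""
    for start in range(0, len(lines), size):
        yield [from_line(line) for line in lines[start:start + size]]


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    visit = VisitOccurrence(10, "2008-01-01", [ConditionOccurrence(123, "2008-01-01")])
    people = [Person(1, 1950, [visit]), Person(2, 1960), Person(3, 1970)]
    lines = [to_line(p) for p in people]
    for batch in chunks(lines, 2):
        print(len(batch), [p.person_id for p in batch])
```

The point is that each line holds everything known about one person, so a worker can process any batch without going back to the database. With protobuf, Thrift, or Avro the record definitions would live in a schema file instead, and the equivalent classes would be generated for each language.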
I am currently trying to translate the CDM specification into protobuf, using SynPUF. It's very much a work in progress, but you can take a look here:
https://github.com/AKSHAYUBHAT/ComputationalHealthcare/blob/master/blog/ohdsi.proto
A good introduction to serialization frameworks:
http://ganges.usc.edu/pgroupW/images/a/a9/Serializarion_Framework.pdf