I know this might be better asked on Stack Overflow, but it seems friendlier here and I get more OMOP-related information.
There are some 800 files, with roughly 1 million lines in each; they are text files in a fixed-width format (despite the .csv extension).
How do you process data at that size?
I know a bit of programming, but I am terrible at C++ (and it is probably not worth my time to learn the whole thing rather than just letting Python run overnight).
I used only very basic vanilla Python: open(..., "r"), loop over each line and edit it, then write the lines to a new .csv. The program is very slooooooow; a run likely takes 6 hours (and who knows whether I will wake up to an error in the morning).
And of course every line has to be edited (mapped to OMOP) before being fed into SQL.
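To make it concrete, here is roughly what my current per-line approach looks like. The column names and widths are made up for illustration; the real layout depends on the source files:

```python
import io

# Hypothetical fixed-width layout: (name, start, end) character offsets.
# The actual widths come from the file specification.
FIELDS = [("person_id", 0, 8), ("code", 8, 16), ("event_date", 16, 26)]

def parse_line(line):
    """Slice one fixed-width line into stripped field values."""
    return [line[start:end].strip() for _, start, end in FIELDS]

def convert(infile, outfile):
    """Read fixed-width lines one by one, edit, write CSV rows."""
    for line in infile:
        row = parse_line(line)
        # ... per-row OMOP mapping would go here ...
        outfile.write(",".join(row) + "\n")

# Small demonstration with in-memory files
src = io.StringIO("12345678ABCD12342020-01-01\n")
dst = io.StringIO()
convert(src, dst)
print(dst.getvalue())  # 12345678,ABCD1234,2020-01-01
```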
So the real question is: what kind of Python package do you use to read and manipulate files this big?
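One thing I have seen suggested is pandas read_fwf with a chunksize, so the whole file never has to fit in memory. Would something like this sketch work (column offsets again made up)?

```python
import io
import pandas as pd

# Hypothetical column layout: (start, end) character offsets per field.
colspecs = [(0, 8), (8, 16), (16, 26)]
names = ["person_id", "code", "event_date"]

data = io.StringIO(
    "12345678ABCD12342020-01-01\n"
    "87654321EFGH56782020-02-02\n"
)

# With chunksize, read_fwf returns an iterator of DataFrames, so each
# chunk can be edited with vectorised operations and written out.
for chunk in pd.read_fwf(data, colspecs=colspecs, names=names, chunksize=1):
    # vectorised edits on the whole chunk would go here,
    # e.g. chunk["code"] = chunk["code"].map(some_omop_lookup)
    print(len(chunk))  # 1
```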
Or could I load the raw data into SQL first and manipulate it from there? Would that be a lot faster, given that it all runs on an HPC?
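For the SQL route, I imagine something like staging the raw rows first and then doing the OMOP mapping in SQL. A toy sketch with the stdlib sqlite3 module (table and column names are invented; the real target would be the HPC database):

```python
import sqlite3

# Sketch: bulk-load parsed rows into a staging table, then transform
# with set-based SQL instead of per-line Python.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staging (person_id TEXT, code TEXT, event_date TEXT)"
)

rows = [
    ("12345678", "ABCD1234", "2020-01-01"),
    ("87654321", "EFGH5678", "2020-02-02"),
]
# executemany batches the inserts; the with-block wraps them in one
# transaction, which is much faster than committing row by row.
with conn:
    conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0]
print(count)  # 2
```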
Thanks for helping a programming noob.