Question about importing large OHDSI files into Databricks

Hi Everybody,

Just wondering what folks are using to import the large OHDSI terminology tables into Databricks from the tab-delimited text files downloaded from Athena.

We’re currently using the API, but it’s taking a long time for the larger files: hours for concept_relationship (~2 GB) and concept_ancestor (~1.3 GB).

What sort of times are you seeing to upload these files and what are you using to upload them to Databricks?

I’m not an expert on this, as our ETL is handled by a different team in my company, but I’d imagine a multipart S3 upload shouldn’t be too bad for those files, no?
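For what it's worth, here is a rough sketch of what I mean, assuming the workspace reads from an S3 bucket and that boto3 is available. The function names, the 64 MiB part size, and the thread count are my own placeholders, not anything from our actual ETL. `plan_parts` only shows how a file would be split under the S3 limits (parts of at least 5 MiB except the last, at most 10,000 parts); `upload_vocabulary_file` hands the real work to boto3, which switches to a parallel multipart upload once a file exceeds the threshold.

```python
import os

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # S3 minimum size for every part except the last
MAX_PARTS = 10_000        # S3 maximum number of parts per multipart upload


def plan_parts(file_size, part_size=64 * MIB):
    """Return (part_number, offset, length) tuples covering file_size bytes."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB S3 minimum")
    # Grow the part size if the file would otherwise need more than 10,000 parts.
    part_size = max(part_size, -(-file_size // MAX_PARTS))
    parts = []
    offset = 0
    number = 1
    while offset < file_size:
        length = min(part_size, file_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


def upload_vocabulary_file(path, bucket, key, part_size=64 * MIB, threads=8):
    """Upload one file to S3, using multipart transfer for large files."""
    # Third-party dependency; imported here so plan_parts works without it.
    import boto3
    from boto3.s3.transfer import TransferConfig

    config = TransferConfig(
        multipart_threshold=part_size,   # files larger than this go multipart
        multipart_chunksize=part_size,   # size of each uploaded part
        max_concurrency=threads,         # parts uploaded in parallel
    )
    boto3.client("s3").upload_file(path, bucket, key, Config=config)


if __name__ == "__main__":
    # Illustration only: how a ~2 GiB file splits at the default part size.
    plan = plan_parts(2 * 1024 * MIB)
    print(len(plan), "parts, first:", plan[0], "last:", plan[-1])
```

Usage would be something like `upload_vocabulary_file("CONCEPT_RELATIONSHIP.csv", "my-bucket", "vocab/CONCEPT_RELATIONSHIP.csv")`, where the bucket and key are made up. Once the file is in the bucket, Databricks can read it from there instead of receiving it through the API.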
