
How do you build a 100TB SQL database for OHDSI?

Hello everyone! I am a somewhat green developer but have been in the medical field for quite some time. A lady gave me the job of building a SQL database (and the related infrastructure, and installing the software) for OHDSI with 100 million entries. I do not know the details of the data yet, but let's assume each entry is about 1 MB, so we would be looking at something like a 100 TB database, right?

I am mildly interested in this problem, so I am doing some high-level discovery. How do you build a SQL database of 100 TB?
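My very rough back-of-envelope (the 1 MB per entry is purely my own assumption, I have not checked it against the actual data):

```python
# Back-of-envelope sizing, assuming (big assumption) ~1 MB per entry.
entries = 100_000_000          # 100 million entries
bytes_per_entry = 1_000_000    # ~1 MB each, my guess
total_tb = entries * bytes_per_entry / 1e12
print(f"~{total_tb:.0f} TB")   # ~100 TB
```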

Also, she has clearly stated that she is not interested in using cloud services (which would make this very easy), since she does not want any 'accidents', even though most of the data is de-identified.

Can anybody suggest a Google keyword for building big SQL databases like this, or point me to a tutorial video?

I have lately discovered NAS and DAS storage options, which can easily reach a few hundred TB. Using those RAID arrays, is it as simple as hooking one up to a computer, hitting install, following the guides, and everything works wonderfully?

Thanks! Have a nice weekend

Hi, it won't be 100 TB.

I have a number of databases - one in particular, with 2 million patients and 500 million rows of clinical data, is only about 500 GB. You will need space for the CDM database and then a working database where you transform your data into the correct format. The main constraints are unlikely to be disk space; they are more likely to be RAM and CPU.
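As a rough sanity check (treating my database as representative of clinical data in general, which yours may or may not be):

```python
# Implied average row size from one of my databases (rough numbers).
rows = 500_000_000                  # 500 million rows of clinical data
size_bytes = 500 * 1024**3          # ~500 GB on disk
bytes_per_row = size_bytes / rows   # works out to roughly 1 KB per row
# Scaling 100 million entries at a similar per-row size:
estimate_gb = 100_000_000 * bytes_per_row / 1024**3
print(f"~{bytes_per_row:.0f} bytes/row, ~{estimate_gb:.0f} GB total")
```

So unless each 'entry' really is around a megabyte, a few hundred GB is a much more likely ballpark than 100 TB.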

ALL my databases sit on SQL Server 2016 with 1.5 TB of storage - for everything.
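If RAM does become a pain point during the transform step, the usual approach is to process the data in batches rather than in one enormous statement. A minimal sketch, assuming a SQL Server instance reachable via pyodbc, with made-up server, table, and column names:

```python
# A minimal sketch of a batched transform on SQL Server via pyodbc.
# Server, table, and column names below are hypothetical; the point is
# to move data in key-range batches so neither the client nor the
# server has to hold the whole transform in memory at once.
import pyodbc

BATCH = 500_000  # rows per batch; tune to your RAM and transaction log

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=staging;Trusted_Connection=yes;"
)
cur = conn.cursor()

# Key range of the (hypothetical) raw source table.
cur.execute("SELECT MIN(row_id), MAX(row_id) FROM raw_clinical_events")
lo, hi = cur.fetchone()

if lo is not None:
    for start in range(lo, hi + 1, BATCH):
        cur.execute(
            """
            INSERT INTO cdm_staging.clinical_events
                (person_id, concept_id, event_date)
            SELECT person_id, mapped_concept_id, event_date
            FROM raw_clinical_events
            WHERE row_id >= ? AND row_id < ?
            """,
            start, start + BATCH,
        )
        conn.commit()  # committing per batch keeps the transaction log small

conn.close()
```

The same idea works as a plain T-SQL WHILE loop; what matters is batching and committing as you go, not the client language.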

Hi @lychenus

I suggest you take a look at Amazon Redshift or Google BigQuery, among other high-performing options.

Best,

Jose

Hi John,

Thank you very much. I feel stupid now, haha.

Just out of curiosity again, as a high-level question: how do you mitigate issues with RAM or CPU when transforming data in a database at that size? Is there anything involved like gradient descent, or am I worrying too much and it is really only a few clicks in a GUI?

Thanks
