Now, I’m almost done importing data for 233M patients into PostgreSQL, using data I found on GitHub. One table is left to import, and it has been running for two and a half days.
I know this query derives its data from condition_occurrence, but I don’t feel it is working well.
Is it normal for the condition_era import to take two and a half days?
When you say ‘import’, do you mean you are executing the query that populates the condition_era table from the condition_occurrence records, or are you importing some file into the condition_era table?
If it is the condition_era building script, it is absolutely imperative that you have proper indexes on the condition_occurrence table. You should find the suggested indexes in the OMOP CDM repository, but simply put:
Index on person_id, condition_concept_id, condition_start_date.
This will make finding the eras for each condition much faster.
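For example, a composite index along those lines might look like the following (the index name here is illustrative, not the official OMOP one; check the DDL scripts in the OHDSI CommonDataModel repository for the exact definitions):

```sql
-- Composite index to speed up the era-building scan over condition_occurrence.
-- The index name is just an example; use the one from the OMOP CDM DDL scripts.
CREATE INDEX idx_condition_occurrence_era
    ON condition_occurrence (person_id, condition_concept_id, condition_start_date);
```

With this in place, the era script can walk each person's condition records in date order instead of sequentially scanning the whole table.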
As I said, the condition_era and drug_era tables in particular are derived from other tables such as drug_exposure and condition_occurrence.
I already created the suggested indexes from the GitHub repository on my database, and I have been executing the query. When the condition_era query runs, does it actually use the indexes?
If you created the indexes, then the query should be using them. There is a way to check with the PostgreSQL (pgAdmin) tool: if you execute the statement via Query -> Explain, it should tell you whether it’s hitting the indexes on the tables.
This function is described here: https://makandracards.com/makandra/35969-pgadmin-has-a-graphical-explain-feature
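You can also check from plain SQL without pgAdmin by prefixing a query with `EXPLAIN` (the `WHERE` clause below is just a hypothetical probe to see what the planner chooses):

```sql
-- EXPLAIN prints the planner's chosen plan without executing the query.
-- In the output, look for "Index Scan" or "Bitmap Index Scan" on
-- condition_occurrence rather than "Seq Scan".
EXPLAIN
SELECT person_id, condition_concept_id, condition_start_date
FROM condition_occurrence
WHERE person_id = 42;
```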
You do have a really big dataset, so it may genuinely need that sort of time, but it would be good to confirm that everything in the environment is tuned properly (perhaps it’s a matter of total RAM available).
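One quick sanity check on tuning is to look at PostgreSQL's memory settings, since the stock defaults are far too small for a dataset of this size (these are standard PostgreSQL parameters; the right values depend on your machine):

```sql
-- Inspect current memory-related settings; the shipped defaults
-- (e.g. 128MB shared_buffers, 4MB work_mem) are often too low
-- for building era tables over hundreds of millions of rows.
SHOW shared_buffers;
SHOW work_mem;
SHOW maintenance_work_mem;
```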
Yeah, you’re right.
I also wanted to understand how the data processing works in PostgreSQL, and your explanation cleared that up for me.
(Additionally, my PC has 64 GB of RAM now…)
Thank you, Chris!