I recently finished standing up our OHDSIonAWS stack in our AWS account, and I opted to load all of the synthetic datasets referenced in the CloudFormation template. Everything works great for those datasets; I can query them and browse the data in Atlas without issue. Now the organization I work for is about to start generating a lot of data, and I’m trying to stand up an automated pipeline for it. My question is: what is the process for adding new datasets to the Redshift cluster on AWS?
I’m a little stuck on what my next steps should be. I’ve written SQL scripts that COPY the data from CSV files in S3 (a sketch is below), populated the person table, loaded all the vocabularies through Redshift’s query editor, and verified that the data exists in Redshift. My problem now is registering the new dataset with WebAPI through the configuration panel in Atlas 2.7.5. Every time I submit the dataset (using the schema name I created in Redshift), Atlas responds with this error:
The Source was not saved. An exception ocurred: javax.persistence.EntityExistsException
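For context, here’s roughly what my load scripts look like. This is just a sketch; the schema, bucket, and IAM role names are placeholders, not my real values:

```sql
-- Illustrative COPY of the person table from a CSV in S3.
-- mycdm, my-etl-bucket, and the role ARN are placeholder names.
COPY mycdm.person
FROM 's3://my-etl-bucket/cdm/person.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole'
FORMAT AS CSV
IGNOREHEADER 1;
```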
A few more details that may be helpful:
- I have the internet gateway restricted to only allow traffic from inside our internal network’s CIDR range.
- I’m using the JDBC connection string that the Redshift console provides, along with the master user’s password (example format below this list).
- I don’t have a results schema at this point (this is all synthetic data, and I’m just trying to get familiar with loading data into the system).
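For reference, the connection string follows the standard Redshift JDBC format; the cluster endpoint and database name here are placeholders:

```
jdbc:redshift://my-cluster.abc123xyz789.us-east-1.redshift.amazonaws.com:5439/mycdmdb
```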
Am I missing something glaringly obvious? To me it seems like Atlas can’t find the dataset, but I know it exists in Redshift.
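One thing I haven’t ruled out, given the EntityExistsException: maybe a source with the same key or name is already registered in WebAPI from one of my earlier attempts. If so, I’d expect a query like this against the WebAPI application database to show it (assuming the tables live in the default webapi schema):

```sql
-- List the sources WebAPI already knows about.
SELECT source_id, source_name, source_key, source_connection
FROM webapi.source
ORDER BY source_id;
```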