OHDSI Home | Forums | Wiki | Github

Migrate from GenerationStrategy.AUTO to GenerationStrategy.TABLE for identity columns

@alfranke, et al:
We’re faced with moving our infrastructure into a parallel data warehouse tech that doesn’t support identity columns. I’ve looked at it seems that we can move to a @GeneratedValue(strategy = GenerationnType.TABLE) on our @Id fields, and add databaes tables to store the sequence key-value pairs.

One question I have is how Spring Batch defines how the IDs are generated in the job repository. From the DDL script, oracle uses sequeneces, MS SQL ddl uses a table with IDENTITY columns, but neither sequencers nor identity columns available in the platform (this is Microsoft PDW: it supports a subset of MS SQL 2012 functions).

Does anyone know how to direct the spring batch repository persistence to use the GenerationType.TABLE for identity columns? Is there an xml.config file that we can provide that will set the persistence directives to use an table-generated sequencers?

-Chris

So, does this mean OHDSI (SqlRender, etc.) will officially support Microsoft PDW? Seems like there may be other issues given that this is a subset of sql server.

See the following note regarding the work-around for mysql (no sequence support) (B.1.3 Identity). My guess is there is no super simple configuration option to change sqlserver support. However, you may be able to wire spring batch with a PDW-specific implementation, similar to that of MySql.
http://docs.spring.io/spring-batch/reference/html/metaDataSchema.html

To wire WebAPI, looks like you could create an extension of DefaultDataFieldMaxValueIncrementerFactory and override getIncrementer to supply another implementation for sql server (not sure what would be involved in adding a PDW database type along side). Then in our JobConfig class inject this impl into the JobRepositoryFactoryBean.

        protected JobRepository createJobRepository() throws Exception {
            final JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
            factory.setDataSource(this.dataSource);
            //something like this
            factory.setIncrementerFactory(new WebApiIncrementerFactory());
            //lines omitted
            factory.afterPropertiesSet();
            return factory.getObject();
        }

Thanks for the info Alex, that looks like that’s exactly what we need to do.

To answer the question: Does OHDSI support PDW? Yes, Sql Render has a dialect option for PDW.

So the next step is to make a branch that will migrate the existing tables to not use sequencers, unless you also know of a way where I can specify a .xml file that would override the annotations that we set in code for the persistence options. I seem to recall reading about spring that if the container identifies a file with settings for a bean, it will defer to the values in the config file over what is found in the annotations. If this is true, getting this set up on PDW could be just a matter of providing a file that can be specified in a property field. This way, oracle/sql/postgres can work as-is, and on the pdw side we can point the application settings to a config file that will override the necessary hibernate/jpa settings.

Are you familiar with specifying files for configuration in spring? I’m not sure how to implement it, if such a thing can be done.

-Chris

Yeah, it would be nice not to have to go “no sequences” across the board. I think you are talking about *.hbm.xml Hibernate config files and not spring config. I have not used this much be I have read that hibernate should favor xml over annotations.

Hi, Alex,
I’m still pretty new to using Spring, especially the Spring Batch, and somewhat Spring Hibernate, so forgive me but I’ve got a lot of questions. But for starters, I think these two are the first challenges:

  1. What would the *.xml files that I’d need to define in order to specify the persistence attributes for our JPA entities? What schema do these xml files follow?
  2. How do you direct the spring application to load and use those .xml configs? Startup paramaters in the app?

I’m getting a blank PDW instance set up and I’m going to migrate our existing tables into the PDW syntax (minus the IDENTITY and SEQUENCE DDL), and from that I’m hoping to get a better understanding about what the framework is going to try to do with the configured datasource.

My first suspicion is that the JDBC driver for MSSql 2014 is the same driver that we can use for PDW (I believe) and if so, I’m affraid the PDW will be incorrectly identified as MSSQL and it will start treating it like a MSSQL 2014 database. But I won’t know until I try so I’m going to first get this set up to point to our internal host.

Hibernate is not part of spring. Hibernate is the ORM framework. There is JPA and Spring Data on top of that. The *.hbm.xml files are specific to Hibernate and I have not used them before. However, it looks like you can specify those in persistentUnitPostProcessors config of the entityManagerFactory (WebAPIs DataAccessConfig.java).

  <property name="persistenceUnitPostProcessors">
    <list>
      <bean class="org.springframework.data.jpa.support.ClasspathScanningPersistenceUnitPostProcessor">
        <constructor-arg value="com.acme.domain" />
        <property name="mappingFileNamePattern" value="**/*hbm.xml" />
      </bean>
    </list>
    @Bean
    public EntityManagerFactory entityManagerFactory() {
        
        HibernateJpaVendorAdapter vendorAdapter = new HibernateJpaVendorAdapter();
        vendorAdapter.setGenerateDdl(false);
        vendorAdapter.setShowSql(Boolean.valueOf(this.env.getRequiredProperty("spring.jpa.show-sql")));
        //hibernate.dialect is resolved based on driver
        //vendorAdapter.setDatabasePlatform(hibernateDialect);
        
        LocalContainerEntityManagerFactoryBean factory = new LocalContainerEntityManagerFactoryBean();
        factory.setJpaVendorAdapter(vendorAdapter);
        factory.setPackagesToScan("org.ohdsi.webapi");
        factory.setDataSource(primaryDataSource());
        //something like this...
        factory.setPersistenceUnitPostProcessors(...);
        
        factory.afterPropertiesSet();   
        return factory.getObject();
    }

http://docs.jboss.org/hibernate/orm/4.3/manual/en-US/html/ch01.html#tutorial-firstapp-mapping

I’m not sure if the mapping files will do anything for us that annotations won’t.
I also think your suspicion is correct in that PDW may not be supported at various levels (Jdbc driver, Spring Jdbc, Hibernate, etc.), which is what I meant when I assumed there would be “other issues”.
Looks like if it is necessary to support PDW, then the change to not use sequences across the board may be something to pursue (and then wire up spring batch with custom incrementers).

Thanks for looking into this, Alex. You’re right there are other issues I’ve encountered (example, you can’t have an insert statement with getDate() in it, which is something Flyway does when logging a schema migration). So the plan is to look into having a logical separation between the cdm datasource and our ohdsi results. We could possibly support the option where our cdm datasource is in a highly optimized for read structure (hadoop? Polybase? Some other non relational form) where we do the analysis and then the the results datasource is used to persist analytic results, or define research studies, or other basic research activities that require application support tables. Frank started the discussion over here:

I don’t have any plans to change anything in the WebAPI; I’m going to try to work on a way of integrating with it from an app specifically geared towards dealing with APS envornments. But I’m hoping some of the lessons learned there will drive some enhancements in the OHDSI stack.

-Chris

t