Speeding up an Excel-to-database import: reading with Apache POI and persisting with Hibernate and JPA

Date: 2013-04-16 14:42:06

Tags: hibernate jpa ejb-3.0 apache-poi jboss7.x

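I'm trying to speed up an import from Excel files into a database. The files are read with Apache POI and persisted through Hibernate and JPA in JBoss 7.1 (a specific requirement, using a JTA data source). At the moment the import is far too slow: about 3 minutes for 30,000 records, and I need to bring that down to roughly 30 seconds. I'm looking for help setting up batch inserts, which I haven't tried in my present work.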

My persistence.xml is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<persistence version="2.0"
   xmlns="http://java.sun.com/xml/ns/persistence" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="
        http://java.sun.com/xml/ns/persistence
        http://java.sun.com/xml/ns/persistence/persistence_2_0.xsd">
   <persistence-unit name="primary" transaction-type="JTA">
      <jta-data-source>java:jboss/datasources/MySqlDS</jta-data-source>
      <properties>
         <!-- Properties for Hibernate -->
         <property name="hibernate.hbm2ddl.auto" value="update" />
         <property name="hibernate.default_catalog" value="myDatabase"/>
         <property name="hibernate.dialect" value="org.hibernate.dialect.MySQL5InnoDBDialect"/>
         <property name="hibernate.show_sql" value="false" />
         <property name="hibernate.format_sql" value="false" />
         <!-- Statement ordering, JDBC batching and fetching -->
         <property name="hibernate.order_updates" value="true"/>
         <property name="hibernate.order_inserts" value="true"/>
         <property name="hibernate.jdbc.batch_versioned_data" value="true"/>
         <property name="hibernate.jdbc.fetch_size" value="500"/>
         <property name="hibernate.jdbc.batch_size" value="500"/>
         <property name="hibernate.default_batch_fetch_size" value="16"/>
         <property name="hibernate.connection.release_mode" value="auto"/>
         <!-- Second-level cache -->
         <property name="hibernate.cache.region.jbc2.cachefactory" value="java:CacheManager"/>
         <property name="hibernate.cache.use_second_level_cache" value="true"/>
         <property name="hibernate.cache.use_query_cache" value="false"/>
         <property name="hibernate.cache.use_minimal_puts" value="true"/>
         <property name="hibernate.cache.region.jbc2.cfg.entity" value="mvcc-entity"/>
         <property name="hibernate.cache.region_prefix" value="services"/>
         <!-- Not used when a jta-data-source is configured: connections come from the data source -->
         <property name="hibernate.connection.driver_class" value="com.mysql.jdbc.Driver"/>
         <property name="hibernate.connection.url" value="jdbc:mysql://localhost:3306/myDatabase"/>
         <property name="hibernate.connection.username" value="root"/>
      </properties>
   </persistence-unit>
</persistence>

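I have an EJB Timer class, deployed when JBoss starts, that looks for new Excel files and, if it finds any, imports them into the database. This all works fine; it's just slow...

// Listener class
                    excelReader.loadDatabase(child.getPath());
                    // This all works ok
                }
            }
        }
    }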

**This is the class that actually saves the file through JPA:**
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import javax.ejb.LocalBean;
import javax.ejb.Stateless;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

@Stateless
@LocalBean
public class ExcelReader implements TableDao {

    @PersistenceContext
    private EntityManager em;

    private HSSFWorkbook wb;

    public void loadDatabase(String path)
    {
        // Close the file once POI has read it, and stop instead of
        // continuing with a null workbook if it could not be read.
        try (InputStream latestExcelFile = new FileInputStream(path))
        {
            wb = new HSSFWorkbook(latestExcelFile);
        } catch (IOException ex) {
            System.out.println("Could not read excel file " + path + ": " + ex);
            return;
        }

        importTheTable();
    }

    public ExcelReader() {}

    public void importTheTable() {

        HSSFSheet baseDataTableSheet = wb.getSheetAt(0);

        // Start at 1: the first row (row 0) is skipped
        for (int i = 1; i <= baseDataTableSheet.getLastRowNum(); i++)
        {
            HSSFRow row = baseDataTableSheet.getRow(i);
            BaseDataTable baseDataTable = new BaseDataTable();
            try
            {
                baseDataTable.setDateTime(row.getCell(0).getDateCellValue());
                baseDataTable.setEventId((int) row.getCell(1).getNumericCellValue());
                baseDataTable.setCauseClass(parseCauseClass(row.getCell(2).toString()));
                baseDataTable.setUeType((int) row.getCell(3).getNumericCellValue());
                baseDataTable.setMarket((int) row.getCell(4).getNumericCellValue());
                baseDataTable.setOperator((int) row.getCell(5).getNumericCellValue());
                baseDataTable.setCellId((int) row.getCell(6).getNumericCellValue());
                baseDataTable.setDuration((int) row.getCell(7).getNumericCellValue());
                baseDataTable.setCauseCode((int) row.getCell(8).getNumericCellValue());
                baseDataTable.setNeVersion(row.getCell(9).toString());
                baseDataTable.setImsi(row.getCell(10).getNumericCellValue());
                baseDataTable.setHier3Id(row.getCell(11).toString());
                baseDataTable.setHier32Id(row.getCell(12).toString());
                baseDataTable.setHier321Id(row.getCell(13).toString());

                addBaseTableEntry(baseDataTable);

            } catch (Exception ex) {
                System.out.println("Error in excel file at row " + i + ": " + ex);
            }

            // Every 1000 rows, send the pending INSERTs and detach the entities
            if (i % 1000 == 0)
            {
                em.flush();
                em.clear();
            }
        }
    }

    // addBaseTableEntry(...) and parseCauseClass(...) are not shown
}
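The flush()/clear() loop in importTheTable is the pattern the Hibernate batch-processing chapter recommends for bulk inserts. Here is a minimal sketch of it, assuming a hypothetical PersistenceSink interface that stands in for the EntityManager so the logic runs without a container; ChunkedImporter and persistInChunks are illustrative names, not part of any library. It flushes after every full chunk and once more for a trailing partial chunk.

```java
/**
 * Hypothetical stand-in for the three EntityManager calls the import loop uses.
 * In the EJB, an implementation would delegate to em.persist, em.flush and em.clear.
 */
interface PersistenceSink<T> {
    void persist(T entity);
    void flush();
    void clear();
}

final class ChunkedImporter {

    private ChunkedImporter() {}

    /**
     * Persists every entity, calling flush() and clear() after each full chunk
     * and once more for a trailing partial chunk. Returns how many flushes ran.
     */
    static <T> int persistInChunks(Iterable<T> entities, PersistenceSink<T> sink, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int persisted = 0;
        int flushes = 0;
        for (T entity : entities) {
            sink.persist(entity);
            persisted++;
            if (persisted % chunkSize == 0) {
                sink.flush();  // send the pending INSERTs (batched if JDBC batching is active)
                sink.clear();  // detach flushed entities so the persistence context stays small
                flushes++;
            }
        }
        if (persisted % chunkSize != 0) {
            // Trailing partial chunk; a commit would flush it too, this just makes it explicit.
            sink.flush();
            sink.clear();
            flushes++;
        }
        return flushes;
    }
}
```

In the bean, a PersistenceSink implementation would simply delegate to em.persist, em.flush and em.clear. The Hibernate documentation's example uses the same value as hibernate.jdbc.batch_size for the flush interval (500 in this persistence.xml); the 1000 used in the question is a multiple of 500, so each flush should still map onto whole JDBC batches, provided the id generator allows batching at all.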

**This is how the EntityManager is created:**

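import javax.ejb.Stateful;
import javax.enterprise.context.RequestScoped;
import javax.enterprise.inject.Produces;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import javax.persistence.PersistenceContextType;

@Stateful
@RequestScoped
public class Resources {

    // Extended persistence context, exposed to CDI through the producer method
    @PersistenceContext(type = PersistenceContextType.EXTENDED)
    private EntityManager em;

    @Produces
    public EntityManager getEm() {
        return em;
    }
}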

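This all works, but it's far too slow. I've searched the web endlessly and tried using a UserTransaction to speed up the import, to no avail. Any help pointing me in the right direction would be much appreciated.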

Cheers

1 answer:

Answer 0 (score: 1)

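I don't see any transaction-related annotations. It looks like each insert (that's the addBaseTableEntry method, right?) runs in its own transaction, which would be very slow.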

Try adding

@TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)

to your loadDatabase method.

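Edit: Change the id generation strategy to GenerationType.SEQUENCE or TABLE (whichever suits you). The IDENTITY generation strategy makes batch inserts impossible, because every insert has to return the newly generated id. For details see http://docs.jboss.org/hibernate/core/3.6/reference/en-US/html/batch.html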
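Why the generator matters: with IDENTITY the id only exists once MySQL has inserted the row, so Hibernate has to execute each INSERT as soon as the entity is persisted, on its own, to read the key back, and it disables JDBC insert batching. A table or sequence generator lets Hibernate reserve ids up front and assign them before the flush, so the INSERTs can be grouped. The toy simulation below is plain Java, not Hibernate's code, and BlockIdAllocator is an illustrative name; it only shows the block-reservation idea. On the entity, the JPA counterpart would be something like @GeneratedValue(strategy = GenerationType.TABLE, generator = ...) together with a @TableGenerator whose allocationSize sets the block size. Since MySQL has no native sequences, TABLE is the likelier choice for this setup.

```java
/**
 * Toy simulation, not Hibernate's implementation: a table-backed id generator
 * with an allocation size. Each trip to the "generator table" (a single long
 * here) reserves a whole block of ids, which are then handed out from memory,
 * so ids are known before the rows are inserted.
 */
final class BlockIdAllocator {

    private final int allocationSize;
    private long tableValue;  // stands in for the value stored in the generator table
    private long next;        // next id to hand out
    private long blockEnd;    // exclusive end of the current block
    private int tableTrips;   // how many times the "table" was read and updated

    BlockIdAllocator(int allocationSize) {
        if (allocationSize <= 0) {
            throw new IllegalArgumentException("allocationSize must be positive");
        }
        this.allocationSize = allocationSize;
    }

    synchronized long nextId() {
        if (next == blockEnd) {
            // Reserve the next block: in a real generator this would be one
            // read-and-update of the generator table row.
            long start = tableValue + 1;
            tableValue += allocationSize;
            next = start;
            blockEnd = start + allocationSize;
            tableTrips++;
        }
        return next++;
    }

    synchronized int tableTrips() {
        return tableTrips;
    }
}
```

With a block size of 500, the 30,000-row import would reserve its ids in 60 trips to the generator table rather than one per row.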