How can I improve the performance of inserting data into the database?

Time: 2016-08-18 20:57:32

Tags: performance postgresql hibernate jpa ejb

I'm using PostgreSQL 9.5 (with the latest JDBC driver, 9.4.1209), JPA 2.1 (Hibernate), EJB 3.2, CDI, JSF 2.2 and WildFly 10. I insert a large amount of data into the database (about 1 to 1.7 million entities). The number of entities depends on the file the user uploads through a form on the page.

What's the problem?

The problem is that inserting the data into the database is very slow, and the execution time grows with every call to the flush() method. I used println(...) calls to measure how fast the flush method runs. For the first ~4 flushes (400,000 entities), I got a println(...) result roughly every 20 s. After that, the flush method became very slow, and it kept getting slower.

Of course, if I remove the flush() and clear() calls, I get a println(...) result every second, but when I get close to 3 million entities I also get an exception:

java.lang.OutOfMemoryError: GC overhead limit exceeded

What have I done so far?

  • I tried both container-managed and bean-managed transactions (see the code below).
  • I don't use auto_increment for the PK id. I assign the ids manually in the bean code.
  • I tried changing the number of entities flushed at a time (currently 100,000).
  • I tried setting the same number of entities in the hibernate.jdbc.batch_size property. It didn't help; execution was much slower.
  • I experimented with properties in the persistence.xml file. For example, I added the reWriteBatchedInserts property, but honestly I don't know whether it helps.
  • PostgreSQL runs on an SSD, but the data is stored on an HDD because the data may become large in the future. I did try moving my PostgreSQL data to the SSD, but the result was the same; nothing changed.
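As a side note on the batch_size experiments above: one way to verify that JDBC batching actually takes effect is to enable Hibernate's statistics, which log per-session metrics including the number of JDBC batches executed (a minimal persistence.xml fragment; the rest of the unit stays unchanged):

```xml
<property name="hibernate.generate_statistics" value="true"/>
```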

The question is: how can I improve the performance of inserting data into the database?

Here is the structure of my table:

  column_name  |   udt_name  | length | is_nullable |  key
---------------+-------------+--------+-------------+--------
id             |    int8     |        |     NO      |   PK
id_user_table  |    int4     |        |     NO      |   FK
starttime      | timestamptz |        |     NO      |
time           |   float8    |        |     NO      |
sip            |   varchar   |  100   |     NO      |
dip            |   varchar   |  100   |     NO      |
sport          |    int4     |        |     YES     |
dport          |    int4     |        |     YES     |
proto          |   varchar   |   50   |     NO      |
totbytes       |    int8     |        |     YES     |
info           |    text     |        |     YES     |
label          |   varchar   |   10   |     NO      |
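For reference, a DDL reconstruction of that structure (a sketch only: the table name data_store_all is taken from the INSERT statement in Update 2, while the composite primary key and the referenced user table name are assumptions based on the DataStoreAllId(id, idFK) entity id):

```sql
CREATE TABLE data_store_all (
    id            bigint           NOT NULL,
    id_user_table integer          NOT NULL REFERENCES user_table (id),
    starttime     timestamptz      NOT NULL,
    time          double precision NOT NULL,
    sip           varchar(100)     NOT NULL,
    dip           varchar(100)     NOT NULL,
    sport         integer,
    dport         integer,
    proto         varchar(50)      NOT NULL,
    totbytes      bigint,
    info          text,
    label         varchar(10)      NOT NULL,
    PRIMARY KEY (id, id_user_table)
);
```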

Here is the part of my EJB bean (first version) that inserts the data into the database:

@Stateless
public class DataDaoImpl extends GenericDaoImpl<Data> implements DataDao {

    /**
     * This is the first method that is executed. 
     * The CDI bean (controller) calls this method.
     * @param list - data from the file.
     * @param idFK - foreign key.
     */
    public void send(List<String> list, int idFK) {

        if(handleCSV(list,idFK)){
            //...
        }
        else{
            //...
        }
    }

    /**
     * The method inserts data into the database.
     */
    @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
    private boolean handleCSV(List<String> list, int idFK){

        try{

            long start=0;
            Pattern patternRow=Pattern.compile(",");

            for (String s : list) {

                if(start!=0){

                    String[] data=patternRow.split(s);                  

                    //Preparing data...

                    DataStoreAll dataStore=new DataStoreAll();
                    DataStoreAllId dataId=new DataStoreAllId(start++, idFK);                    
                    dataStore.setId(dataId);

                    //Setting the other object fields...                                                    

                    entityManager.persist(dataStore);               

                    if(start%100000==0){
                        System.out.println("Number of entities: "+start);
                        entityManager.flush();
                        entityManager.clear();                      
                    }
                }
                else start++;
            }                       

        } catch(Throwable t){

            CustomExceptionHandler exception=new CustomExceptionHandler(t);
            return exception.persist("DDI", "handleCSV");
        }

        return true;
    }

    @Inject
    private EntityManager entityManager;
}

Instead of container-managed transactions, I also tried bean-managed transactions (second version):

@Stateless
@TransactionManagement(TransactionManagementType.BEAN)
public class DataDaoImpl extends GenericDaoImpl<Data> {
    /**
     * This is the first method that is executed. 
     * The CDI bean (controller) calls this method.
     * @param list - data from the file.
     * @param idFK - foreign key.
     */
    public void send(List<String> list, int idFK) {

        if(handleCSV(list,idFK)){
            //...
        }
        else{
            //...
        }
    }

    /**
     * The method inserts data into the linkedList collection.
     */
    private boolean handleCSV(List<String> list, int idFK){

        try{

            long start=0;
            Pattern patternRow=Pattern.compile(",");
            List<DataStoreAll> entitiesAll=new LinkedList<>();

            for (String s : list) {

                if(start!=0){

                    String[] data=patternRow.split(s);                  

                    //Preparing data...

                    DataStoreAll dataStore=new DataStoreAll();
                    DataStoreAllId dataId=new DataStoreAllId(start++, idFK);                    
                    dataStore.setId(dataId);

                    //Setting the other object fields...                                                    

                    entitiesAll.add(dataStore);

                    if(start%100000==0){

                        System.out.println("Number of entities: "+start);
                        saveDataStoreAll(entitiesAll);                      
                    }
                }
                else start++;
            }

        } catch(Throwable t){

            CustomExceptionHandler exception=new CustomExceptionHandler(t);
            return exception.persist("DDI", "handleCSV");
        }

        return true;
    }

    /**
     * The method commits the transaction.
     */
    private void saveDataStoreAll(List<DataStoreAll> entities) throws EntityExistsException,IllegalArgumentException,TransactionRequiredException,PersistenceException,Throwable {

        Iterator<DataStoreAll> iter=entities.iterator();

        ut.begin();     

        while(iter.hasNext()){

            entityManager.persist(iter.next());
            iter.remove();
            entityManager.flush();
            entityManager.clear();
        }

        ut.commit();
    }

    @Inject
    private EntityManager entityManager;

    @Inject
    private UserTransaction ut;
}

Here is my persistence.xml:

<?xml version="1.0" encoding="UTF-8"?>
<persistence version="2.1"
   xmlns="http://xmlns.jcp.org/xml/ns/persistence" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="
        http://xmlns.jcp.org/xml/ns/persistence
        http://xmlns.jcp.org/xml/ns/persistence/persistence_2_1.xsd">
   <persistence-unit name="primary">
      <jta-data-source>java:/PostgresDS</jta-data-source>
      <properties>
         <property name="hibernate.show_sql" value="false" />
         <property name="hibernate.jdbc.batch_size" value="50" />         
         <property name="hibernate.order_inserts" value="true" />
         <property name="hibernate.order_updates" value="true" />
         <property name="hibernate.jdbc.batch_versioned_data" value="true"/>
         <property name="reWriteBatchedInserts" value="true"/>         
      </properties>
   </persistence-unit>
</persistence>
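Note that reWriteBatchedInserts is a connection property of the PostgreSQL JDBC driver (supported since 9.4.1209), not a Hibernate property, so setting it in persistence.xml has no effect there. It belongs in the JDBC connection URL of the WildFly datasource, roughly like this (the host and database names are placeholders):

```xml
<datasource jndi-name="java:/PostgresDS" pool-name="PostgresDS">
    <connection-url>jdbc:postgresql://localhost:5432/mydb?reWriteBatchedInserts=true</connection-url>
    <!-- driver, security and pool settings as before -->
</datasource>
```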

If I've forgotten to add something, let me know and I'll update the post.

Update

Here is the controller that calls DataDaoImpl#send(...):

@Named
@ViewScoped
public class DataController implements Serializable {

    @PostConstruct
    private void init(){

        //...
    }

    /**
     * Handle of the uploaded file.
     */
    public void handleFileUpload(FileUploadEvent event){

        uploadFile=event.getFile();

        try(InputStream input = uploadFile.getInputstream()){

            Path folder=Paths.get(System.getProperty("jboss.server.data.dir"),"upload");

            if(!folder.toFile().exists()){
                if(!folder.toFile().mkdirs()){
                    folder=Paths.get(System.getProperty("jboss.server.data.dir"));
                }
            }

            String filename = FilenameUtils.getBaseName(uploadFile.getFileName()); 
            String extension = FilenameUtils.getExtension(uploadFile.getFileName());
            filePath = Files.createTempFile(folder, filename + "-", "." + extension);

            //Save the file on the server.
            Files.copy(input, filePath, StandardCopyOption.REPLACE_EXISTING);

            //Add reference to the unconfirmed uploaded files list.
            userFileManager.addUnconfirmedUploadedFile(filePath.toFile());

            FacesContext.getCurrentInstance().addMessage(null, new FacesMessage(FacesMessage.SEVERITY_INFO, "Success", uploadFile.getFileName() + " was uploaded."));

        } catch (IOException e) {

            //...
        }
    }

    /**
     * Sending data from file to the database.
     */
    public void send(){

        //int idFK=...

        //The model includes the data from the file and other things which I transfer to the EJB bean.
        AddDataModel addDataModel=new AddDataModel();       
        //Setting the addDataModel fields...        

        try{

            if(uploadFile!=null){

                //Each row of the file == 1 entity.
                List<String> list=new ArrayList<String>();

                Stream<String> stream=Files.lines(filePath);
                list=stream.collect(Collectors.toList());

                addDataModel.setList(list);
            }

        } catch (IOException e) {

            //...
        }   

        //Sending data to the DataDaoImpl EJB bean.
        if(dataDao.send(addDataModel,idFK)){

            userFileManager.confirmUploadedFile(filePath.toFile());

            FacesContext.getCurrentInstance().addMessage(null, new FacesMessage(FacesMessage.SEVERITY_INFO, "The data was saved in the database.", ""));
        }       
    }

    private static final long serialVersionUID = -7202741739427929050L;

    @Inject
    private DataDao dataDao;

    private UserFileManager userFileManager;
    private UploadedFile uploadFile;
    private Path filePath;
}

Update 2

Here is the updated EJB bean that inserts the data into the database:

@Stateless
@TransactionManagement(TransactionManagementType.BEAN)
public class DataDaoImpl extends GenericDaoImpl<Data> {

    /**
     * This is the first method that is executed. 
     * The CDI bean (controller) calls this method.
     * @param addDataModel - object which includes path to the uploaded file and other things which are needed.
     */ 
    public void send(AddDataModel addDataModel){

        if(handleCSV(addDataModel)){
            //...
        }
        else{
            //...
        }
    }

    /**
     * The method inserts data into the database.
     */
    private boolean handleCSV(AddDataModel addDataModel){

        PreparedStatement ps=null;
        Connection con=null;

        FileInputStream fileInputStream=null;
        Scanner scanner=null;       

        try{

            con=ds.getConnection();
            con.setAutoCommit(false);

            ps=con.prepareStatement("insert into data_store_all "
                    + "(id,id_user_table,startTime,time,sIP,dIP,sPort,dPort,proto,totBytes,info) "
                    + "values(?,?,?,?,?,?,?,?,?,?,?)");

            long start=0;       

            fileInputStream=new FileInputStream(addDataModel.getPath().toFile());
            scanner=new Scanner(fileInputStream, "UTF-8");

            Pattern patternRow=Pattern.compile(",");            
            Pattern patternPort=Pattern.compile("\\d+");

            while(scanner.hasNextLine()) {

                if(start!=0){

                    //Loading a row from the file into table.

                    String[] data=patternRow.split(scanner.nextLine().replaceAll("[\"]",""));

                    //Preparing datetime.

                    SimpleDateFormat simpleDateFormat=new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");                                                          
                    GregorianCalendar calendar=new GregorianCalendar();
                    calendar.setTime(simpleDateFormat.parse(data[1]));
                    calendar.set(Calendar.MILLISECOND, Integer.parseInt(Pattern.compile("\\.").split(data[1])[1])/1000);

                    //Preparing an entity

                    ps.setLong(1, start++); //id PK
                    ps.setInt(2, addDataModel.getIdFk()); //id FK
                    ps.setTimestamp(3, new Timestamp(calendar.getTime().getTime())); //datetime
                    ps.setDouble(4, Double.parseDouble(data[2])); //time
                    ps.setString(5, data[3]); //sip
                    ps.setString(6, data[4]); //dip

                    if(!data[5].equals("") && patternPort.matcher(data[5]).matches()) ps.setInt(7, Integer.parseInt(data[5])); //sport
                    else ps.setNull(7, java.sql.Types.INTEGER);

                    if(!data[6].equals("") && patternPort.matcher(data[6]).matches()) ps.setInt(8, Integer.parseInt(data[6])); //dport
                    else ps.setNull(8, java.sql.Types.INTEGER);

                    ps.setString(9, data[7]); //proto

                    if(!data[8].trim().equals("")) ps.setLong(10, Long.parseLong(data[8])); //len
                    else ps.setObject(10, null);

                    if(data.length==10 && !data[9].trim().equals("")) ps.setString(11, data[9]); //info
                    else ps.setString(11, null);

                    ps.addBatch();

                    if(start%100000==0){
                        System.out.println("Number of entity: "+start);

                        ps.executeBatch();
                        ps.clearParameters();
                        ps.clearBatch();
                        con.commit();                       
                    }
                }
                else{
                    start++;
                    scanner.nextLine();
                }
            }

            if (scanner.ioException() != null) throw scanner.ioException();

        } catch(Throwable t){

            CustomExceptionHandler exception=new CustomExceptionHandler(t);
            return exception.persist("DDI", "handleCSV");
        } finally{

            if (fileInputStream!=null)
                try {
                    fileInputStream.close();
                } catch (Throwable t2) {
                    CustomExceptionHandler exception=new CustomExceptionHandler(t2);
                    return exception.persist("DDI", "handleCSV.Finally");
                }
            if (scanner != null) scanner.close();
        }

        return true;
    }

    @Inject
    private EntityManager entityManager;

    @Resource(mappedName="java:/PostgresDS") 
    private DataSource ds;
}

2 Answers:

Answer 0 (score: 2)

Your problem isn't necessarily the database, or even Hibernate, but that you are loading too much data into memory at once. That's why you get the out-of-memory error and why you see the JVM struggling.

You read the file from a stream, but push all of it into memory when you build the list of strings. Then you map that list of strings into a linked list of some kind of entity!

Instead, use the stream to process the file in small chunks and insert each chunk into the database as you go.

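A Scanner/reader-based chunked approach might look like the sketch below (my reconstruction, not the answerer's original code, which was lost: the Consumer stands in for the per-chunk JDBC batch insert and commit):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ChunkedImport {

    // Hands lines to the consumer in fixed-size chunks so the whole file
    // never has to sit in memory at once.
    static long processInChunks(BufferedReader reader, int chunkSize,
                                Consumer<List<String>> chunkConsumer) throws IOException {
        List<String> chunk = new ArrayList<>(chunkSize);
        long total = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            chunk.add(line);
            total++;
            if (chunk.size() == chunkSize) {
                chunkConsumer.accept(chunk);        // e.g. run the JDBC batch and commit
                chunk = new ArrayList<>(chunkSize); // drop the old chunk so it can be GC'd
            }
        }
        if (!chunk.isEmpty()) chunkConsumer.accept(chunk); // trailing partial chunk
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Demo with an in-memory "file"; in the real DAO the consumer would
        // add each row to a PreparedStatement batch and commit per chunk.
        BufferedReader reader = new BufferedReader(new StringReader("a\nb\nc\nd\ne"));
        List<Integer> chunkSizes = new ArrayList<>();
        long total = processInChunks(reader, 2, chunk -> chunkSizes.add(chunk.size()));
        System.out.println(total + " lines in " + chunkSizes.size() + " chunks"); // prints "5 lines in 3 chunks"
    }
}
```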

You may find that after making this change, Hibernate/EJB performs well enough. But I think you'll find plain JDBC noticeably faster; people say you can expect roughly a 3x to 4x difference, give or take. With this much data, that adds up.

If you're talking about truly huge amounts of data, you should look at CopyManager, which lets you stream data directly into the database. You can transform the data with the streaming API.
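CopyManager could be used roughly like this (a sketch only: it requires a running PostgreSQL and the pgjdbc driver on the classpath, assumes the CSV columns line up with the column list, and the connection URL, credentials and file path are placeholders):

```java
import java.io.FileReader;
import java.io.Reader;
import java.sql.Connection;
import java.sql.DriverManager;

import org.postgresql.copy.CopyManager;
import org.postgresql.core.BaseConnection;

public class CopyImport {

    // Streams a CSV file into the table via PostgreSQL's COPY protocol,
    // which bypasses per-row INSERT overhead entirely.
    static long copyCsv(Connection con, String csvPath) throws Exception {
        CopyManager copyManager = new CopyManager(con.unwrap(BaseConnection.class));
        try (Reader reader = new FileReader(csvPath)) {
            return copyManager.copyIn(
                "COPY data_store_all (id, id_user_table, starttime, time, sip, dip, "
                    + "sport, dport, proto, totbytes, info) "
                    + "FROM STDIN WITH (FORMAT csv, HEADER true)",
                reader);
        }
    }

    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password")) {
            long rows = copyCsv(con, "/path/to/data.csv");
            System.out.println("Copied " + rows + " rows");
        }
    }
}
```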

Answer 1 (score: 1)

Since you're using WildFly 10, you're in a Java EE 7 environment.

You should therefore consider using JSR-352 batch processing for the file import.

Have a look at An Overview of Batch Processing in Java EE 7.0.

This should take care of all your memory-consumption and transaction problems.
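For illustration, a JSR-352 chunk step for such an import could be declared like this (a sketch; the artifact names csvItemReader, rowProcessor and jdbcItemWriter are hypothetical batch artifacts you would implement yourself):

```xml
<job id="csvImportJob" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
    <step id="importStep">
        <chunk item-count="1000">
            <reader ref="csvItemReader"/>
            <processor ref="rowProcessor"/>
            <writer ref="jdbcItemWriter"/>
        </chunk>
    </step>
</job>
```

Each chunk of item-count items is read, processed and written inside its own transaction, which bounds both memory use and transaction size.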