I use PostgreSQL 9.5 (with the latest JDBC driver, 9.4.1209), JPA 2.1 (Hibernate), EJB 3.2, CDI, JSF 2.2 and WildFly 10. I insert a large amount of data into the database (about 1-1.7 million entities); the exact number depends on the file the user adds through a form on the page.

The problem is that inserting the data into the database is very slow: the execution time grows with every call to the flush() method. I used println(...) calls to check how fast the flush() method executes. For the first ~4 flushes (400,000 entities) I got a println(...) result roughly every 20 s; after that, each flush() became very slow and kept getting slower.

Of course, if I remove the flush() and clear() calls, I get a println(...) result every 1 s, but as I approach 3 million entities I also get this exception:

java.lang.OutOfMemoryError: GC overhead limit exceeded
I don't use the auto_increment feature for the PK id; I add the ids manually in the bean code. I also tried setting the flush interval to the same number of entities as the hibernate.jdbc.batch_size property, but it didn't help; the execution time was much slower. I also added properties to the persistence.xml file; for example, I added the reWriteBatchedInserts property, though honestly I don't know whether it helps. The question is: how can I improve the performance of inserting this data into the database?
Here is the structure of my table:
column_name | udt_name | length | is_nullable | key
---------------+-------------+--------+-------------+--------
id | int8 | | NO | PK
id_user_table | int4 | | NO | FK
starttime | timestamptz | | NO |
time | float8 | | NO |
sip | varchar | 100 | NO |
dip | varchar | 100 | NO |
sport | int4 | | YES |
dport | int4 | | YES |
proto | varchar | 50 | NO |
totbytes | int8 | | YES |
info | text | | YES |
label | varchar | 10 | NO |
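
For reference, an equivalent DDL sketch of that structure (the foreign-key target user_table(id) is an assumption, since the referenced table is not shown here):

CREATE TABLE data_store_all (
    id            int8         NOT NULL PRIMARY KEY,
    id_user_table int4         NOT NULL REFERENCES user_table(id), -- assumed FK target
    starttime     timestamptz  NOT NULL,
    time          float8       NOT NULL,
    sip           varchar(100) NOT NULL,
    dip           varchar(100) NOT NULL,
    sport         int4,
    dport         int4,
    proto         varchar(50)  NOT NULL,
    totbytes      int8,
    info          text,
    label         varchar(10)  NOT NULL
);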
Here is part of the EJB bean (first version) that inserts the data into the database:
@Stateless
public class DataDaoImpl extends GenericDaoImpl<Data> implements DataDao {

    /**
     * This is the first method which is executed.
     * The CDI bean (controller) calls this method.
     * @param list - data from the file.
     * @param idFK - foreign key.
     */
    public void send(List<String> list, int idFK) {
        if(handleCSV(list,idFK)){
            //...
        }
        else{
            //...
        }
    }

    /**
     * The method inserts data into the database.
     */
    @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
    private boolean handleCSV(List<String> list, int idFK){
        try{
            long start=0;
            Pattern patternRow=Pattern.compile(",");
            for (String s : list) {
                if(start!=0){
                    String[] data=patternRow.split(s);
                    //Preparing data...
                    DataStoreAll dataStore=new DataStoreAll();
                    DataStoreAllId dataId=new DataStoreAllId(start++, idFK);
                    dataStore.setId(dataId);
                    //Setting the other object fields...
                    entityManager.persist(dataStore);
                    if(start%100000==0){
                        System.out.println("Number of entities: "+start);
                        entityManager.flush();
                        entityManager.clear();
                    }
                }
                else start++;
            }
        } catch(Throwable t){
            CustomExceptionHandler exception=new CustomExceptionHandler(t);
            return exception.persist("DDI", "handleCSV");
        }
        return true;
    }

    @Inject
    private EntityManager entityManager;
}
Instead of container-managed transactions, I also tried bean-managed transactions (second version):
@Stateless
@TransactionManagement(TransactionManagementType.BEAN)
public class DataDaoImpl extends GenericDaoImpl<Data> {

    /**
     * This is the first method which is executed.
     * The CDI bean (controller) calls this method.
     * @param list - data from the file.
     * @param idFK - foreign key.
     */
    public void send(List<String> list, int idFK) {
        if(handleCSV(list,idFK)){
            //...
        }
        else{
            //...
        }
    }

    /**
     * The method inserts data into the linkedList collection.
     */
    private boolean handleCSV(List<String> list, int idFK){
        try{
            long start=0;
            Pattern patternRow=Pattern.compile(",");
            List<DataStoreAll> entitiesAll=new LinkedList<>();
            for (String s : list) {
                if(start!=0){
                    String[] data=patternRow.split(s);
                    //Preparing data...
                    DataStoreAll dataStore=new DataStoreAll();
                    DataStoreAllId dataId=new DataStoreAllId(start++, idFK);
                    dataStore.setId(dataId);
                    //Setting the other object fields...
                    entitiesAll.add(dataStore);
                    if(start%100000==0){
                        System.out.println("Number of entities: "+start);
                        saveDataStoreAll(entitiesAll);
                    }
                }
                else start++;
            }
        } catch(Throwable t){
            CustomExceptionHandler exception=new CustomExceptionHandler(t);
            return exception.persist("DDI", "handleCSV");
        }
        return true;
    }

    /**
     * The method commits the transaction.
     */
    private void saveDataStoreAll(List<DataStoreAll> entities) throws EntityExistsException,IllegalArgumentException,TransactionRequiredException,PersistenceException,Throwable {
        Iterator<DataStoreAll> iter=entities.iterator();
        ut.begin();
        while(iter.hasNext()){
            entityManager.persist(iter.next());
            iter.remove();
            entityManager.flush();
            entityManager.clear();
        }
        ut.commit();
    }

    @Inject
    private EntityManager entityManager;

    @Inject
    private UserTransaction ut;
}
Here is my persistence.xml:
<?xml version="1.0" encoding="UTF-8"?>
<persistence version="2.1"
    xmlns="http://xmlns.jcp.org/xml/ns/persistence" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="
        http://xmlns.jcp.org/xml/ns/persistence
        http://xmlns.jcp.org/xml/ns/persistence/persistence_2_1.xsd">
    <persistence-unit name="primary">
        <jta-data-source>java:/PostgresDS</jta-data-source>
        <properties>
            <property name="hibernate.show_sql" value="false" />
            <property name="hibernate.jdbc.batch_size" value="50" />
            <property name="hibernate.order_inserts" value="true" />
            <property name="hibernate.order_updates" value="true" />
            <property name="hibernate.jdbc.batch_versioned_data" value="true"/>
            <property name="reWriteBatchedInserts" value="true"/>
        </properties>
    </persistence-unit>
</persistence>
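
Note that reWriteBatchedInserts is a connection property of the PostgreSQL JDBC driver (available since 9.4.1209), so setting it in persistence.xml probably never reaches the driver when connections come from a JTA datasource. A rough sketch of setting it on the WildFly datasource instead (the JNDI name is taken from above; host, port and database name are placeholders):

<datasource jndi-name="java:/PostgresDS" pool-name="PostgresDS">
    <connection-url>jdbc:postgresql://localhost:5432/mydb?reWriteBatchedInserts=true</connection-url>
    <driver>postgresql</driver>
    <!-- pool and security settings omitted -->
</datasource>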
If I forgot to add something, let me know and I will update the post.
Here is the controller that calls DataDaoImpl#send(...):
@Named
@ViewScoped
public class DataController implements Serializable {

    @PostConstruct
    private void init(){
        //...
    }

    /**
     * Handle of the uploaded file.
     */
    public void handleFileUpload(FileUploadEvent event){
        uploadFile=event.getFile();
        try(InputStream input = uploadFile.getInputstream()){
            Path folder=Paths.get(System.getProperty("jboss.server.data.dir"),"upload");
            if(!folder.toFile().exists()){
                if(!folder.toFile().mkdirs()){
                    folder=Paths.get(System.getProperty("jboss.server.data.dir"));
                }
            }
            String filename = FilenameUtils.getBaseName(uploadFile.getFileName());
            String extension = FilenameUtils.getExtension(uploadFile.getFileName());
            filePath = Files.createTempFile(folder, filename + "-", "." + extension);
            //Save the file on the server.
            Files.copy(input, filePath, StandardCopyOption.REPLACE_EXISTING);
            //Add reference to the unconfirmed uploaded files list.
            userFileManager.addUnconfirmedUploadedFile(filePath.toFile());
            FacesContext.getCurrentInstance().addMessage(null, new FacesMessage(FacesMessage.SEVERITY_INFO, "Success", uploadFile.getFileName() + " was uploaded."));
        } catch (IOException e) {
            //...
        }
    }

    /**
     * Sending data from file to the database.
     */
    public void send(){
        //int idFK=...
        //The model includes the data from the file and other things which I transfer to the EJB bean.
        AddDataModel addDataModel=new AddDataModel();
        //Setting the addDataModel fields...
        try{
            if(uploadFile!=null){
                //Each row of the file == 1 entity.
                List<String> list=new ArrayList<String>();
                Stream<String> stream=Files.lines(filePath);
                list=stream.collect(Collectors.toList());
                addDataModel.setList(list);
            }
        } catch (IOException e) {
            //...
        }
        //Sending data to the DataDaoImpl EJB bean.
        if(dataDao.send(addDataModel,idFK)){
            userFileManager.confirmUploadedFile(filePath.toFile());
            FacesContext.getCurrentInstance().addMessage(null, new FacesMessage(FacesMessage.SEVERITY_INFO, "The data was saved in the database.", ""));
        }
    }

    private static final long serialVersionUID = -7202741739427929050L;

    @Inject
    private DataDao dataDao;
    private UserFileManager userFileManager;
    private UploadedFile uploadFile;
    private Path filePath;
}
And here is the updated EJB bean where I insert the data into the database:
@Stateless
@TransactionManagement(TransactionManagementType.BEAN)
public class DataDaoImpl extends GenericDaoImpl<Data> {

    /**
     * This is the first method which is executed.
     * The CDI bean (controller) calls this method.
     * @param addDataModel - object which includes path to the uploaded file and other things which are needed.
     */
    public void send(AddDataModel addDataModel){
        if(handleCSV(addDataModel)){
            //...
        }
        else{
            //...
        }
    }

    /**
     * The method inserts data into the database.
     */
    private boolean handleCSV(AddDataModel addDataModel){
        PreparedStatement ps=null;
        Connection con=null;
        FileInputStream fileInputStream=null;
        Scanner scanner=null;
        try{
            con=ds.getConnection();
            con.setAutoCommit(false);
            ps=con.prepareStatement("insert into data_store_all "
                    + "(id,id_user_table,startTime,time,sIP,dIP,sPort,dPort,proto,totBytes,info) "
                    + "values(?,?,?,?,?,?,?,?,?,?,?)");
            long start=0;
            fileInputStream=new FileInputStream(addDataModel.getPath().toFile());
            scanner=new Scanner(fileInputStream, "UTF-8");
            Pattern patternRow=Pattern.compile(",");
            Pattern patternPort=Pattern.compile("\\d+");
            while(scanner.hasNextLine()) {
                if(start!=0){
                    //Loading a row from the file into table.
                    String[] data=patternRow.split(scanner.nextLine().replaceAll("[\"]",""));
                    //Preparing datetime.
                    SimpleDateFormat simpleDateFormat=new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
                    GregorianCalendar calendar=new GregorianCalendar();
                    calendar.setTime(simpleDateFormat.parse(data[1]));
                    calendar.set(Calendar.MILLISECOND, Integer.parseInt(Pattern.compile("\\.").split(data[1])[1])/1000);
                    //Preparing an entity
                    ps.setLong(1, start++); //id PK
                    ps.setInt(2, addDataModel.getIdFk()); //id FK
                    ps.setTimestamp(3, new Timestamp(calendar.getTime().getTime())); //datetime
                    ps.setDouble(4, Double.parseDouble(data[2])); //time
                    ps.setString(5, data[3]); //sip
                    ps.setString(6, data[4]); //dip
                    if(!data[5].equals("") && patternPort.matcher(data[5]).matches()) ps.setInt(7, Integer.parseInt(data[5])); //sport
                    else ps.setNull(7, java.sql.Types.INTEGER);
                    if(!data[6].equals("") && patternPort.matcher(data[6]).matches()) ps.setInt(8, Integer.parseInt(data[6])); //dport
                    else ps.setNull(8, java.sql.Types.INTEGER);
                    ps.setString(9, data[7]); //proto
                    if(!data[8].trim().equals("")) ps.setLong(10, Long.parseLong(data[8])); //len
                    else ps.setObject(10, null);
                    if(data.length==10 && !data[9].trim().equals("")) ps.setString(11, data[9]); //info
                    else ps.setString(11, null);
                    ps.addBatch();
                    if(start%100000==0){
                        System.out.println("Number of entity: "+start);
                        ps.executeBatch();
                        ps.clearParameters();
                        ps.clearBatch();
                        con.commit();
                    }
                }
                else{
                    start++;
                    scanner.nextLine();
                }
            }
            if (scanner.ioException() != null) throw scanner.ioException();
        } catch(Throwable t){
            CustomExceptionHandler exception=new CustomExceptionHandler(t);
            return exception.persist("DDI", "handleCSV");
        } finally{
            if (fileInputStream!=null)
                try {
                    fileInputStream.close();
                } catch (Throwable t2) {
                    CustomExceptionHandler exception=new CustomExceptionHandler(t2);
                    return exception.persist("DDI", "handleCSV.Finally");
                }
            if (scanner != null) scanner.close();
        }
        return true;
    }

    @Inject
    private EntityManager entityManager;

    @Resource(mappedName="java:/PostgresDS")
    private DataSource ds;
}
Answer 0 (score: 2)
Your problem is not necessarily the database, or even Hibernate, but that you are loading too much data into memory at once. That is why you get the out-of-memory error and why you see the JVM struggling.

You read the file from a stream, but you push all of it into memory when you build the list of strings. Then you map that list of strings into a linked list of entities!

Instead, use the stream to process the file in small chunks and insert each chunk into the database as you go.
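A Scanner-based approach might look something like the sketch below (the 1,000-row chunk size is arbitrary, and bindRow is a hypothetical helper standing in for the parsing and ps.setXxx(...) calls already shown in the question):

private void importInChunks(AddDataModel addDataModel) throws Exception {
    try (Connection con = ds.getConnection();
         Scanner scanner = new Scanner(addDataModel.getPath().toFile(), "UTF-8");
         PreparedStatement ps = con.prepareStatement(
                 "insert into data_store_all (id,id_user_table,starttime,time,sip,dip,sport,dport,proto,totbytes,info) "
                 + "values(?,?,?,?,?,?,?,?,?,?,?)")) {
        con.setAutoCommit(false);
        if (scanner.hasNextLine()) scanner.nextLine(); // skip the CSV header row
        long count = 0;
        while (scanner.hasNextLine()) {
            bindRow(ps, scanner.nextLine()); // hypothetical helper: parse one row, set the 11 parameters
            ps.addBatch();
            if (++count % 1000 == 0) { // insert and commit in small chunks so the heap stays flat
                ps.executeBatch();
                con.commit();
            }
        }
        ps.executeBatch(); // flush the final partial chunk
        con.commit();
    }
}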
You may find that Hibernate/EJB performs well enough after making this change, but I think you will find plain JDBC to be considerably faster; the usual claim is a 3x-4x difference, depending on circumstances, and over this much data that matters a lot.
If you are talking about truly huge amounts of data, you should look at CopyManager, which lets you stream data directly into the database. You can transform the data with the streaming API along the way.
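
A minimal CopyManager sketch (it assumes the pooled connection can be unwrapped to the driver's PGConnection, and that the stream you hand it already matches the COPY column list, e.g. after transforming each line to add the id columns):

try (Connection con = ds.getConnection();
     Reader reader = Files.newBufferedReader(addDataModel.getPath(), StandardCharsets.UTF_8)) {
    CopyManager copyManager = con.unwrap(PGConnection.class).getCopyAPI();
    // COPY parses the CSV on the server side; HEADER skips the first line.
    long rows = copyManager.copyIn(
            "COPY data_store_all (id,id_user_table,starttime,time,sip,dip,sport,dport,proto,totbytes,info) "
            + "FROM STDIN WITH (FORMAT csv, HEADER true)",
            reader);
    System.out.println("Rows copied: " + rows);
}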
Answer 1 (score: 1)
Since you are using WildFly 10, you are in a Java EE 7 environment.
You should therefore consider using JSR-352 Batch Processing to perform the file import.
Take a look at An Overview of Batch Processing in Java EE 7.0.
This should take care of all your memory-consumption and transaction issues.
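
A rough sketch of the chunk step such a job could use (the artifact names csvItemReader and dataItemWriter are illustrative; the batch runtime commits a transaction at every checkpoint, i.e. every item-count items):

<?xml version="1.0" encoding="UTF-8"?>
<job id="csvImportJob" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
    <step id="importStep">
        <chunk item-count="1000">
            <reader ref="csvItemReader"/>   <!-- ItemReader: returns one CSV row per readItem() -->
            <writer ref="dataItemWriter"/>  <!-- ItemWriter: batch-inserts each chunk -->
        </chunk>
    </step>
</job>

The job would then be started from the controller with something like BatchRuntime.getJobOperator().start("csvImportJob", new Properties()).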