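I'm trying to add Hibernate Search to my project to improve search performance, but I'm having trouble indexing a huge table. I've added the Hibernate Search dependency, and I have a simple servlet from which I trigger the indexing process: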
FullTextEntityManager ftem = Search.getFullTextEntityManager(em);
try {
    ftem
        .createIndexer(MyEntity.class)
        .batchSizeToLoadObjects(25)
        .cacheMode(CacheMode.NORMAL)
        .threadsToLoadObjects(5)
        .startAndWait();
} catch (InterruptedException e) {
    e.printStackTrace();
}
And in my persistence.xml:
<property name="hibernate.show_sql" value="false" />
<property name="hibernate.dialect" value="org.hibernate.dialect.MySQL5InnoDBDialect" />
<property name="hibernate.archive.autodetection" value="class" />
<property name="hibernate.search.default.directory_provider" value="filesystem" />
<property name="hibernate.search.default.indexBase" value="/var/lucene/indexes" />
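The problem is that the MyEntity table has about 22.6 million rows (see the log below), and within a few minutes I get an out-of-memory error: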
2015-07-28 21:16:50,168 INFO [stdout] (default task-60) Building index
2015-07-28 21:16:55,180 INFO [org.hibernate.search.impl.SimpleIndexingProgressMonitor] (Hibernate Search: identifierloader-1) HSEARCH000027: Going to reindex 22593085 entities
2015-07-28 21:19:47,186 ERROR [org.jboss.as.controller.management-operation] (DeploymentScanner-threads - 2) WFLYCTL0013: Operation ("read-children-resources") failed - address: ([]): java.lang.OutOfMemoryError: GC overhead limit exceeded
2015-07-28 21:19:58,506 WARN [org.jboss.jca.core.connectionmanager.listener.TxConnectionListener] (Hibernate Search: identifierloader-1) IJ000305: Connection error occured: org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@15a020a3[state=NORMAL managed connection=org.jboss.jca.adapters.jdbc.local.LocalManagedConnection@446189fe connection handles=1 lastReturned=1438110947536 lastValidated=1438108373971 lastCheckedOut=1438111010224 trackByTx=true pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool@3fb3ab95 mcp=SemaphoreArrayListManagedConnectionPool@496e4f29[pool=MyProjectApiDS] xaResource=LocalXAResourceImpl@4f676ce7[connectionListener=15a020a3 connectionManager=798378ab warned=false currentXid=< formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffc0a8010b:537a5b28:55b7cad0:167, node_name=1, branch_uid=0:ffffc0a8010b:537a5b28:55b7cad0:169, subordinatenodename=null, eis_name=java:/MyProjectApiDS > productName=MySQL productVersion=5.6.25-log jndiName=java:/MyProjectApiDS] txSync=null]: javax.resource.spi.ResourceAdapterInternalException: Unexpected error
at org.jboss.jca.adapters.jdbc.BaseWrapperManagedConnection.broadcastConnectionError(BaseWrapperManagedConnection.java:699)
at org.jboss.jca.adapters.jdbc.BaseWrapperManagedConnection.connectionError(BaseWrapperManagedConnection.java:665)
at org.jboss.jca.adapters.jdbc.WrappedConnection.checkException(WrappedConnection.java:1669)
at org.jboss.jca.adapters.jdbc.WrappedStatement.checkException(WrappedStatement.java:1267)
at org.jboss.jca.adapters.jdbc.WrappedPreparedStatement.executeQuery(WrappedPreparedStatement.java:467)
at org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.extract(ResultSetReturnImpl.java:82)
at org.hibernate.loader.Loader.getResultSet(Loader.java:2066)
at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:1863)
at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:1839)
at org.hibernate.loader.Loader.scroll(Loader.java:2627)
at org.hibernate.loader.criteria.CriteriaLoader.scroll(CriteriaLoader.java:121)
at org.hibernate.internal.StatelessSessionImpl.scroll(StatelessSessionImpl.java:682)
at org.hibernate.internal.CriteriaImpl.scroll(CriteriaImpl.java:394)
at org.hibernate.search.batchindexing.impl.IdentifierProducer.loadAllIdentifiers(IdentifierProducer.java:146)
at org.hibernate.search.batchindexing.impl.IdentifierProducer.inTransactionWrapper(IdentifierProducer.java:111)
at org.hibernate.search.batchindexing.impl.IdentifierProducer.run(IdentifierProducer.java:95)
at org.hibernate.search.batchindexing.impl.OptionallyWrapInJTATransaction.runWithErrorHandler(OptionallyWrapInJTATransaction.java:97)
at org.hibernate.search.batchindexing.impl.ErrorHandledRunnable.run(ErrorHandledRunnable.java:49)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
2015-07-28 21:19:58,514 ERROR [org.jboss.remoting.remote.connection] (XNIO-1 I/O-1) JBREM000200: Remote connection failed: java.io.IOException: Istniejące połączenie zostało gwałtownie zamknięte przez zdalnego hosta
2015-07-28 21:19:58,531 INFO [org.jboss.as.server.deployment.scanner] (DeploymentScanner-threads - 2) WFLYDS0019: Deployment mysql-connector-java-5.1.34-bin.jar was previously deployed by this scanner but has been removed from the server deployment list by another management tool. Marker file C:\servers\wildfly-9.0.0.Final\standalone\deployments\mysql-connector-java-5.1.34-bin.jar.undeployed is being added to record this fact.
2015-07-28 21:19:58,620 WARN [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (Hibernate Search: identifierloader-1) SQL Error: 0, SQLState: null
2015-07-28 21:19:58,621 ERROR [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (Hibernate Search: identifierloader-1) Error
2015-07-28 21:19:58,622 ERROR [org.hibernate.search.exception.impl.LogErrorHandler] (Hibernate Search: identifierloader-1) HSEARCH000058: HSEARCH000116: Unexpected error during MassIndexer operation: org.hibernate.exception.GenericJDBCException: could not extract ResultSet
at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:54)
at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:126)
at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:112)
at org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.extract(ResultSetReturnImpl.java:91)
at org.hibernate.loader.Loader.getResultSet(Loader.java:2066)
at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:1863)
at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:1839)
at org.hibernate.loader.Loader.scroll(Loader.java:2627)
at org.hibernate.loader.criteria.CriteriaLoader.scroll(CriteriaLoader.java:121)
at org.hibernate.internal.StatelessSessionImpl.scroll(StatelessSessionImpl.java:682)
at org.hibernate.internal.CriteriaImpl.scroll(CriteriaImpl.java:394)
at org.hibernate.search.batchindexing.impl.IdentifierProducer.loadAllIdentifiers(IdentifierProducer.java:146)
at org.hibernate.search.batchindexing.impl.IdentifierProducer.inTransactionWrapper(IdentifierProducer.java:111)
at org.hibernate.search.batchindexing.impl.IdentifierProducer.run(IdentifierProducer.java:95)
at org.hibernate.search.batchindexing.impl.OptionallyWrapInJTATransaction.runWithErrorHandler(OptionallyWrapInJTATransaction.java:97)
at org.hibernate.search.batchindexing.impl.ErrorHandledRunnable.run(ErrorHandledRunnable.java:49)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.sql.SQLException: Error
at org.jboss.jca.adapters.jdbc.WrappedConnection.checkException(WrappedConnection.java:1677)
at org.jboss.jca.adapters.jdbc.WrappedStatement.checkException(WrappedStatement.java:1267)
at org.jboss.jca.adapters.jdbc.WrappedPreparedStatement.executeQuery(WrappedPreparedStatement.java:467)
at org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.extract(ResultSetReturnImpl.java:82)
... 15 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
2015-07-28 21:19:58,667 INFO [org.hibernate.search.impl.SimpleIndexingProgressMonitor] (default task-60) HSEARCH000028: Reindexed 22593085 entities
2015-07-28 21:19:58,673 WARN [com.arjuna.ats.jta] (Hibernate Search: identifierloader-1) ARJUNA016031: XAOnePhaseResource.rollback for < formatId=131077, gtrid_length=29, bqual_length=36, tx_uid=0:ffffc0a8010b:537a5b28:55b7cad0:167, node_name=1, branch_uid=0:ffffc0a8010b:537a5b28:55b7cad0:169, subordinatenodename=null, eis_name=java:/MyProjectApiDS > failed with exception: org.jboss.jca.core.spi.transaction.local.LocalXAException: IJ001160: Could not rollback local transaction
at org.jboss.jca.core.tx.jbossts.LocalXAResourceImpl.rollback(LocalXAResourceImpl.java:253)
at com.arjuna.ats.internal.jta.resources.arjunacore.XAOnePhaseResource.rollback(XAOnePhaseResource.java:205)
at com.arjuna.ats.internal.arjuna.abstractrecords.LastResourceRecord.topLevelAbort(LastResourceRecord.java:126)
at com.arjuna.ats.arjuna.coordinator.BasicAction.doAbort(BasicAction.java:2993)
at com.arjuna.ats.arjuna.coordinator.BasicAction.doAbort(BasicAction.java:2972)
at com.arjuna.ats.arjuna.coordinator.BasicAction.Abort(BasicAction.java:1675)
at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.cancel(TwoPhaseCoordinator.java:127)
at com.arjuna.ats.arjuna.AtomicAction.abort(AtomicAction.java:186)
at com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionImple.rollbackAndDisassociate(TransactionImple.java:1282)
at com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction.rollback(BaseTransaction.java:143)
at com.arjuna.ats.jbossatx.BaseTransactionManagerDelegate.rollback(BaseTransactionManagerDelegate.java:114)
at org.hibernate.search.batchindexing.impl.OptionallyWrapInJTATransaction.cleanUpOnError(OptionallyWrapInJTATransaction.java:123)
at org.hibernate.search.batchindexing.impl.ErrorHandledRunnable.run(ErrorHandledRunnable.java:54)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.jboss.jca.core.spi.transaction.local.LocalResourceException: No operations allowed after connection closed.
at org.jboss.jca.adapters.jdbc.local.LocalManagedConnection.rollback(LocalManagedConnection.java:139)
at org.jboss.jca.core.tx.jbossts.LocalXAResourceImpl.rollback(LocalXAResourceImpl.java:248)
... 15 more
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: No operations allowed after connection closed.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
at com.mysql.jdbc.Util.getInstance(Util.java:360)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:935)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:924)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:870)
at com.mysql.jdbc.ConnectionImpl.throwConnectionClosedException(ConnectionImpl.java:1232)
at com.mysql.jdbc.ConnectionImpl.checkClosed(ConnectionImpl.java:1225)
at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:4568)
at org.jboss.jca.adapters.jdbc.local.LocalManagedConnection.rollback(LocalManagedConnection.java:132)
... 16 more
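So the question is: how do I index a large table automatically?
Answer 0 (score: 2)
Which version of Hibernate Search are you using? If you are on the latest 5.4 release, you can configure a transaction timeout just for the indexing, like this: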
fullTextSession
    .createIndexer( User.class )
    .batchSizeToLoadObjects( 25 )
    .cacheMode( CacheMode.NORMAL )
    .threadsToLoadObjects( 12 )
    .idFetchSize( 150 )
    .transactionTimeout( 1800 )
    .startAndWait();
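If you can, I recommend using the latest version.
Answer 1 (score: 2)
The developers of the MySQL JDBC driver made some annoying design decisions: you have to force the driver not to load the whole table into memory and to actually stream rows lazily, as Hibernate asks it to, by setting the JDBC fetch size to Integer.MIN_VALUE.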
ftem
    .createIndexer(MyEntity.class)
    .batchSizeToLoadObjects(25)
    .cacheMode(CacheMode.NORMAL)
    .idFetchSize(Integer.MIN_VALUE) // Important on MySQL!
    .transactionTimeout(timeout) // also useful
    .threadsToLoadObjects(5)
    .startAndWait();
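For context on what that setting means at the JDBC level: MySQL Connector/J streams rows one at a time, instead of buffering the whole result set, when a forward-only, read-only statement has its fetch size set to Integer.MIN_VALUE. The following is only a sketch of that idea in plain JDBC. The class name StreamingIdReader, the method streamIds and the query in the usage note are hypothetical, and this is not a claim about how Hibernate Search applies idFetchSize internally.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.function.LongConsumer;

// Hypothetical helper, for illustration only.
class StreamingIdReader {

    /**
     * Runs the query and hands the first column of each row to the consumer.
     * Returns the number of rows read.
     */
    static long streamIds(Connection connection, String sql, LongConsumer consumer) {
        // Forward-only + read-only + fetch size Integer.MIN_VALUE is the
        // combination Connector/J treats as "stream row by row".
        try (Statement statement = connection.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            statement.setFetchSize(Integer.MIN_VALUE);
            long count = 0;
            try (ResultSet rows = statement.executeQuery(sql)) {
                while (rows.next()) {
                    consumer.accept(rows.getLong(1));
                    count++;
                }
            }
            return count;
        } catch (SQLException e) {
            // Wrapped so callers of this sketch need no checked-exception handling.
            throw new IllegalStateException("Streaming query failed: " + sql, e);
        }
    }
}
```

A call might look like StreamingIdReader.streamIds(connection, "SELECT id FROM my_entity", id -> queue.add(id)), where the table name and the queue are assumptions. While a streaming result set is open, the same connection cannot run other statements, so the rows should be consumed promptly.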
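Answer 2 (score: 1)
Based on the stack trace you provided, my guess is that the transaction you are using is timing out. Increase the timeout setting in your configuration and try again. This is not recommended as a permanent fix, though, because a higher default timeout applies to every transaction used across the application.
If increasing the timeout helps, you should then try other approaches, such as ScrollMode or batch processing.
Consider this post; I hope it helps.
Answer 3 (score: 0)
I have always had problems with startAndWait(). The latest version is better, but there are still some issues. For very large data sets, setting the transaction timeout high enough is sometimes not feasible.
One approach you can try is to fetch the results in batches. It does not give the same consistency, so you may miss records added while it runs, but it will not time out.
The code looks something like this: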
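int offset = 0;
int batchSize = 500;
boolean indexComplete = false;
// Assumes utx is a javax.transaction.UserTransaction and em an EntityManager, both provided by the container.
while (!indexComplete) {
    // One short transaction per batch, so no single transaction runs long enough to time out.
    utx.begin();
    FullTextEntityManager fullTextEntityManager = org.hibernate.search.jpa.Search.getFullTextEntityManager(em);
    // Without an ORDER BY the database does not guarantee stable pages; consider ordering by the entity's identifier.
    TypedQuery<User> query = fullTextEntityManager.createQuery("SELECT u FROM User u", User.class);
    query.setFirstResult(offset);
    query.setMaxResults(batchSize);
    LOGGER.info("Indexing up to {} users from offset {}", batchSize, offset);
    List<User> results = query.getResultList();
    if (results == null || results.isEmpty()) {
        indexComplete = true;
    } else {
        offset += results.size();
        for (User user : results) {
            fullTextEntityManager.index(user);
        }
        // Push pending index work and detach the loaded entities to keep memory use flat.
        fullTextEntityManager.flushToIndexes();
        fullTextEntityManager.clear();
    }
    utx.commit();
}
LOGGER.info("Indexed {} objects", offset);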
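The control flow of that loop can be tried out without a database. The sketch below isolates it: keep asking for a page at the current offset, handle every item, advance by the number of items actually returned, and stop on the first empty page. The class OffsetBatchLoop and the method processInBatches are hypothetical names; in the real code the page fetcher would be the JPA query with setFirstResult and setMaxResults, and the handler would be fullTextEntityManager.index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

// Hypothetical stand-in for the batch loop, using an in-memory list as the "table".
class OffsetBatchLoop {

    /**
     * Calls fetchPage(offset, batchSize) until it returns an empty (or null) page,
     * passing every item to the handler. Returns the number of items handled.
     */
    static <T> int processInBatches(int batchSize,
                                    BiFunction<Integer, Integer, List<T>> fetchPage,
                                    Consumer<T> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int offset = 0;
        while (true) {
            List<T> page = fetchPage.apply(offset, batchSize);
            if (page == null || page.isEmpty()) {
                return offset; // nothing left to read
            }
            page.forEach(handler);
            offset += page.size(); // advance by what was actually returned
        }
    }

    public static void main(String[] args) {
        // Fake table with 1234 rows.
        List<Integer> table = new ArrayList<>();
        for (int i = 0; i < 1234; i++) {
            table.add(i);
        }
        List<Integer> indexed = new ArrayList<>();
        int total = OffsetBatchLoop.<Integer>processInBatches(
                500,
                (offset, limit) -> table.subList(
                        Math.min(offset, table.size()),
                        Math.min(offset + limit, table.size())),
                indexed::add);
        System.out.println("Indexed " + total + " objects");
    }
}
```

Note that the loop only learns it is done from an empty page, so a table of 1234 rows with a batch size of 500 costs four fetches, not three. Against a real database each page is a separate query, so rows inserted or deleted between pages can shift the offsets; this is the consistency trade-off mentioned above.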