For the last few days I have been building an efficient Hibernate Search cluster (around 30 million records) on MongoDB, using the jgroups HS backend and the Infinispan directory provider (soft-index file store). On a standalone local WildFly, the OGM MassIndexer worked with almost no indexing configuration. Now, however, I am trying to move it to a remote Linux cluster, and it fails even though I applied the configuration I had seen in several questions (for example Indexing huge table with Hibernate Search).
But as the OGM MassIndexer itself reports, I cannot use a custom configuration:
2017-12-20 16:58:12,855 WARN [org.hibernate.ogm.massindex.impl.OgmMassIndexer] (default task-1) OGM000031: OgmMassIndexer doesn't support the configuration option 'threadsToLoadObjects'. Its setting will be ignored.
2017-12-20 16:58:12,854 WARN [org.hibernate.ogm.massindex.impl.OgmMassIndexer] (default task-1) OGM000031: OgmMassIndexer doesn't support the configuration option 'idFetchSize'. Its setting will be ignored.
2017-12-20 15:19:10,194 WARN [org.hibernate.ogm.massindex.impl.OgmMassIndexer] (default task-1) OGM000031: OgmMassIndexer doesn't support the configuration option 'threadsToLoadObjects'. Its setting will be ignored.
After some digging I found THIS and learned that those options only apply to the non-OGM MassIndexer, so I cannot configure these properties to tune the mass indexing job.
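Since those options are ignored, one fallback I am considering is driving the indexing manually instead of through the MassIndexer. A minimal sketch (assuming OGM's JP-QL subset on MongoDB supports pagination; transaction demarcation is omitted and the batch size of 100 is arbitrary):

import java.util.List;
import javax.persistence.EntityManager;
import org.hibernate.search.jpa.FullTextEntityManager;
import org.hibernate.search.jpa.Search;

public class ManualIndexer {

    // Index EventEntity in fixed-size chunks without the MassIndexer.
    public void indexManually(EntityManager em) {
        FullTextEntityManager ftem = Search.getFullTextEntityManager(em);
        final int batchSize = 100; // assumption: tune to the available heap
        int first = 0;
        List<EventEntity> page;
        do {
            page = ftem.createQuery("SELECT e FROM EventEntity e", EventEntity.class)
                    .setFirstResult(first)
                    .setMaxResults(batchSize)
                    .getResultList();
            for (EventEntity event : page) {
                ftem.index(event); // enqueue indexing work for this entity
            }
            ftem.flushToIndexes(); // push the chunk to the index backend
            ftem.clear();          // detach entities so the persistence context stays small
            first += batchSize;
        } while (!page.isEmpty());
    }
}

The flushToIndexes()/clear() pair after every chunk is meant to keep heap usage bounded, which is roughly what the ignored batchSizeToLoadObjects/threadsToLoadObjects options would otherwise control.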
On my last attempts I always exceed the GC overhead limit:
[Server:server-one] 17:18:26,987 ERROR [org.hibernate.search.exception.impl.LogErrorHandler] (Hibernate OGM: BatchIndexingWorkspace-1) HSEARCH000058: HSEARCH000116: Unexpected error during MassIndexer operation: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
[Server:server-one]     at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:720)
[Server:server-one]     at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:734)
[Server:server-one]     at org.apache.lucene.index.IndexWriter.getAnalyzer(IndexWriter.java:1163)
[Server:server-one]     at org.hibernate.search.backend.impl.lucene.IndexWriterDelegate.<init>(IndexWriterDelegate.java:39)
[Server:server-one]     at org.hibernate.search.backend.impl.lucene.AbstractWorkspaceImpl.getIndexWriterDelegate(AbstractWorkspaceImpl.java:217)
[Server:server-one]     at org.hibernate.search.backend.impl.lucene.LuceneBackendTaskStreamer.doWork(LuceneBackendTaskStreamer.java:44)
[Server:server-one]     at org.hibernate.search.backend.impl.lucene.WorkspaceHolder.applyStreamWork(WorkspaceHolder.java:74)
[Server:server-one]     at org.hibernate.search.indexes.spi.DirectoryBasedIndexManager.performStreamOperation(DirectoryBasedIndexManager.java:103)
[Server:server-one]     at org.hibernate.search.backend.impl.StreamingOperationExecutorSelector$AddSelectionExecutor.performStreamOperation(StreamingOperationExecutorSelector.java:106)
[Server:server-one]     at org.hibernate.search.backend.impl.batch.DefaultBatchBackend.sendWorkToShards(DefaultBatchBackend.java:73)
[Server:server-one]     at org.hibernate.search.backend.impl.batch.DefaultBatchBackend.enqueueAsyncWork(DefaultBatchBackend.java:49)
[Server:server-one]     at org.hibernate.ogm.massindex.impl.TupleIndexer.index(TupleIndexer.java:111)
[Server:server-one]     at org.hibernate.ogm.massindex.impl.TupleIndexer.index(TupleIndexer.java:89)
[Server:server-one]     at org.hibernate.ogm.massindex.impl.TupleIndexer.runIndexing(TupleIndexer.java:202)
[Server:server-one]     at org.hibernate.ogm.massindex.impl.TupleIndexer.run(TupleIndexer.java:192)
[Server:server-one]     at org.hibernate.ogm.massindex.impl.OptionallyWrapInJTATransaction.consumeInTransaction(OptionallyWrapInJTATransaction.java:128)
[Server:server-one]     at org.hibernate.ogm.massindex.impl.OptionallyWrapInJTATransaction.consume(OptionallyWrapInJTATransaction.java:97)
[Server:server-one]     at org.hibernate.ogm.datastore.mongodb.MongoDBDialect.forEachTuple(MongoDBDialect.java:762)
[Server:server-one]     at org.hibernate.ogm.dialect.impl.ForwardingGridDialect.forEachTuple(ForwardingGridDialect.java:168)
[Server:server-one]     at org.hibernate.ogm.massindex.impl.BatchIndexingWorkspace.run(BatchIndexingWorkspace.java:77)
[Server:server-one]     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[Server:server-one]     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[Server:server-one]     at java.lang.Thread.run(Thread.java:748)
[Server:server-one] Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
[Server:server-one]     at org.apache.lucene.codecs.lucene50.Lucene50PostingsWriter.newTermState(Lucene50PostingsWriter.java:174)
[Server:server-one]     at org.apache.lucene.codecs.lucene50.Lucene50PostingsWriter.newTermState(Lucene50PostingsWriter.java:57)
[Server:server-one]     at org.apache.lucene.codecs.PushPostingsWriterBase.writeTerm(PushPostingsWriterBase.java:166)
[Server:server-one]     at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:1041)
[Server:server-one]     at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:456)
[Server:server-one]     at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.write(PerFieldPostingsFormat.java:198)
[Server:server-one]     at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105)
[Server:server-one]     at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:193)
[Server:server-one]     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:95)
[Server:server-one]     at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4086)
[Server:server-one]     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3666)
[Server:server-one]     at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
[Server:server-one]     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)
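If I read the trace correctly, the primary failure is the OutOfMemoryError in Lucene's ConcurrentMergeScheduler merge thread, and the AlreadyClosedException at the top is only a consequence of the IndexWriter being shut down afterwards; so the heap seems to blow up while segments are merged, not while documents are loaded.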
This is how I invoke the MassIndexer:
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.hibernate.CacheMode;
import org.hibernate.search.jpa.FullTextEntityManager;
import org.hibernate.search.jpa.Search;

@PersistenceContext(name = "ogm-persistence")
EntityManager em;

public void createIndex() throws InterruptedException {
    FullTextEntityManager ftem = Search.getFullTextEntityManager(em);
    ftem.createIndexer(EventEntity.class)
            .batchSizeToLoadObjects(30)
            .threadsToLoadObjects(4)
            .cacheMode(CacheMode.NORMAL)
            .startAndWait();
}
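For completeness, startAndWait() blocks the calling thread for the whole run. A non-blocking sketch would look like this (start() returns a Future in Hibernate Search 5; switching to CacheMode.IGNORE is my assumption to reduce cache churn during the run, not something I have verified with OGM):

// Sketch: non-blocking variant; start() returns a java.util.concurrent.Future
// immediately instead of blocking like startAndWait() does.
Future<?> indexing = ftem.createIndexer(EventEntity.class)
        .batchSizeToLoadObjects(30)
        .cacheMode(CacheMode.IGNORE) // assumption: avoid second-level cache interaction while indexing
        .start();
// indexing.get() waits for completion only when the result is actually needed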
My persistence.xml:
<property name="hibernate.transaction.jta.platform" value="JBossAS" />
<property name="hibernate.ogm.datastore.provider" value="mongodb"/>
<property name="hibernate.ogm.datastore.database" value="*****"/>
<property name="hibernate.ogm.datastore.host" value="*******"/>
<property name="hibernate.ogm.datastore.port" value="27017"/>
<property name="hibernate.search.default.directory_provider" value="infinispan"/>
<property name="hibernate.search.default.worker.backend" value="jgroups"/>
<property name="hibernate.search.default.exclusive_index_use" value="false"/>
<property name="hibernate.search.lucene_version" value="LUCENE_CURRENT"/>
<property name="hibernate.search.default.optimizer.operation_limit.max" value="10000"/>
<property name="hibernate.search.default.optimizer.transaction_limit.max" value="1000"/>
<property name="hibernate.search.worker.execution" value="sync"/>
<property name="hibernate.search.reader.strategy" value="shared"/>
<property name="hibernate.search.infinispan.chunk_size" value="300000000"/>
<property name="wildfly.jpa.hibernate.search.module" value="none"/>
<property name="hibernate.search.infinispan.configuration_resourcename" value="infinispan-config.xml"/>
My infinispan-config.xml:
<cache-container name="hibernate-search" jndi-name="java:jboss/infinispan/container/hibernate-search">
<transport lock-timeout="330000"/>
<replicated-cache name="LuceneIndexesMetadata" mode="SYNC" remote-timeout="330000" >
<locking striping="false" acquire-timeout="330000" concurrency-level="500"/>
<transaction mode="NONE"/>
<expiration max-idle="-1"/>
<state-transfer timeout="480000"/>
<persistence passivation="true">
<soft-index-file-store xmlns="urn:infinispan:config:store:soft-index:8.0" preload="true" fetch-state="true" >
<index path="/var/LuceneIndexesMetadata/index" />
<data path="/var/LuceneIndexesMetadata/data" />
<write-behind/>
</soft-index-file-store>
</persistence>
</replicated-cache>
<replicated-cache name="LuceneIndexesData" mode="SYNC" remote-timeout="25000">
<locking striping="false" acquire-timeout="330000" concurrency-level="500"/>
<state-transfer timeout="480000"/>
<transaction mode="NONE"/>
<eviction strategy="LRU" max-entries="500"/>
<expiration max-idle="-1"/>
<persistence passivation="true">
<soft-index-file-store xmlns="urn:infinispan:config:store:soft-index:8.0" preload="true" fetch-state="true">
<index path="/var/LuceneIndexesData/index" />
<data sync-writes="true" path="/var/LuceneIndexesData/data" />
<write-behind/>
</soft-index-file-store>
</persistence>
</replicated-cache>
<replicated-cache name="LuceneIndexesLocking" mode="SYNC" remote-timeout="25000">
<locking striping="false" acquire-timeout="330000" concurrency-level="500"/>
<transaction mode="NONE"/>
<expiration max-idle="-1"/>
<state-transfer timeout="480000"/>
<persistence passivation="true">
<soft-index-file-store xmlns="urn:infinispan:config:store:soft-index:8.0" preload="true" fetch-state="true">
<index path="/var/LuceneIndexesLocking/index" />
<data path="/var/LuceneIndexesLocking/data" />
<write-behind/>
</soft-index-file-store>
</persistence>
</replicated-cache>
</cache-container>
I need to be sure that indexing works for a volume of around 30 million records, that the index is kept in sync whenever a new stateless node starts up, and that indexing can be resumed after a restart without rebuilding the whole index (persistent index). Any suggestion is welcome, including possible architectures and changes to my code.
Thanks a lot.
WildFly 10, Hibernate Search 5.6.1, Infinispan 8.2.5 (versions from the OGM 5.1 BOM)
UPDATE
This is a screenshot from VisualVM at the moment I get the error: Java Heap Space
This is the heap dump file generated by VisualVM: heapdump
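For the next run I also plan to start the JVM with -XX:+HeapDumpOnOutOfMemoryError so that a dump is captured automatically at the exact point of failure.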