Lucene runs out of memory when the index files are opened multiple times

Asked: 2018-06-28 18:13:13

Tags: java indexing lucene out-of-memory

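My application receives multiple requests per second, and our bot is crawling our site at the same time. I use Lucene for indexing and searching. On the first request after the site restarts, the application opens the Lucene index files and stores the opened object, so from the second request onward it uses the stored object. The problem is that until the files have been fully opened and stored, every concurrent request tries to open them again. This causes the site to run out of memory after 5-10 minutes.

This is the error: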

java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.TreeMap.put(Unknown Source)
    at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:61)
    at org.apache.lucene.codecs.lucene42.Lucene42FieldInfosReader.read(Lucene42FieldInfosReader.java:96)
    at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:121)
    at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:56)
    at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:62)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
    at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:66)
    at com.webjaguar.web.frontend.LuceneCategery.getLuceneProduct(LuceneCategery.java:166)
    at com.webjaguar.web.frontend.CategoryController.handleRequest(CategoryController.java:1034)
    at org.springframework.web.servlet.mvc.SimpleControllerHandlerAdapter.handle(SimpleControllerHandlerAdapter.java:48)
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923)
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
    at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:778)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:624)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:312)
    at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:116)
    at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:83)
    at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:324)
    at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:113)
    at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:324)
    at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:113)
    at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:324)
    at org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:54)

Second error:

   Exception in thread "Lucene Merge Thread #9" org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:541)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:514)
Caused by: java.lang.OutOfMemoryError: Java heap space
java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
    at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2661)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2827)
    at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:981)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:883)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:845)
    at com.webjaguar.thirdparty.lucene.LuceneProductIndexer.reIndex(LuceneProductIndexer.java:750)
    at com.webjaguar.web.quartz.LuceneProductJob.autoIndex(LuceneProductJob.java:90)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.springframework.util.MethodInvoker.invoke(MethodInvoker.java:273)
    at org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean$MethodInvokingJob.executeInternal(MethodInvokingJobDetailFactoryBean.java:311)
    at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:113)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:223)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)

This is the line that fails:

reader = DirectoryReader.open(NIOFSDirectory.open(indexFile));

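Is there a way to lock the files until they have been opened and stored? Any solution that improves the implementation is welcome.

1 Answer:

Answer 0: (score: 0)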

You should take a look at the LockFactory of NIOFSDirectory (inherited from its parent class). See the LockFactory Javadoc for a little more information.

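Apart from that, your requirement sounds like an NRT (near-real-time) use case to me. If you want to index and search within short intervals, and indexing happens continuously, you can use NRT. I am not sure whether this was already a feature in Lucene v4.2. For more information, see the Simple NRT tutorial.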
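
Separately from LockFactory and NRT, the "open once, then share" behaviour the question asks for can be sketched with plain Java synchronization. This is only a minimal sketch, not the asker's code and not a Lucene API: `SharedResource` and `SingleOpenDemo` are hypothetical names, and the supplier below simulates a slow open with a sleep. In the real application the supplier would presumably wrap the `DirectoryReader.open(NIOFSDirectory.open(indexFile))` call from the question.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical helper: opens an expensive resource at most once and shares it. */
final class SharedResource<T> {
    private final Supplier<T> opener;
    private volatile T instance; // volatile so other threads see the fully built object

    SharedResource(Supplier<T> opener) {
        this.opener = opener;
    }

    T get() {
        T local = instance;
        if (local == null) {                 // fast path once the resource is stored
            synchronized (this) {            // only one thread may run the opener
                local = instance;
                if (local == null) {         // re-check: another thread may have finished
                    local = opener.get();    // assumed to return a non-null resource
                    instance = local;
                }
            }
        }
        return local;
    }
}

/** Hypothetical demo: many concurrent "requests", but the opener runs only once. */
final class SingleOpenDemo {
    static int countOpens(int threads) throws InterruptedException {
        AtomicInteger opens = new AtomicInteger();
        SharedResource<Object> shared = new SharedResource<>(() -> {
            opens.incrementAndGet();
            try {
                Thread.sleep(100);           // simulate a slow index open
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return new Object();             // stand-in for the opened reader
        });

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await();           // release all threads at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                shared.get();
            });
        }
        start.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return opens.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countOpens(8));   // prints 1
    }
}
```

Requests that arrive while the first open is still in progress block on the monitor instead of starting their own open, which is what keeps the index from being loaded many times at once. The second error in the question comes from the indexing job, so this sketch may not address it.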