Out of memory on faceting (autocomplete)

Time: 2017-06-06 10:15:55

Tags: solr faceted-search

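Our Solr crashes from time to time with an OutOfMemory error. We are still on version 4.0.0, but we plan to migrate to the latest version once the problem below is solved.

When I look at the Tomcat logs, I see the following error: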

SEVERE: null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:469)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:297)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.search.FieldComparator$TermOrdValComparator.<init>(FieldComparator.java:1124)
    at org.apache.lucene.search.SortField.getComparator(SortField.java:425)
    at org.apache.lucene.search.FieldValueHitQueue$MultiComparatorsFieldValueHitQueue.<init>(FieldValueHitQueue.java:110)
    at org.apache.lucene.search.FieldValueHitQueue.create(FieldValueHitQueue.java:173)
    at org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1123)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:552)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:507)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:484)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
    at si.amebis.termania.solr.ExternalSearch.search(ExternalSearch.java:307)
    at si.amebis.termania.solr.ExternalSearch.handleRequestBody(ExternalSearch.java:235)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
    ... 12 more

This happens after a request against the autocomplete field (suggest as you type). The request details are:

q - *:*
start - 0
rows - 0
fq - (Type:1 OR Type:2)
facet - true
facet.limit - 20
facet.mincount - 1
facet.sort - true
facet.prefix - "mi"
facet.field - "Autocomplete"
-- 
which returns 8105170 hits

The Autocomplete field is defined as:

<field name="Autocomplete" type="grams" indexed="true" stored="false" omitNorms="true" required="False" multiValued="true" />
    <fieldtype name="grams" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.ShingleFilterFactory" maxShingleSize="10" outputUnigrams="true" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.TrimFilterFactory" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.TrimFilterFactory" />
      </analyzer>
    </fieldtype>

Index details:

Number of documents: 4338603
Index size: 10.1 GB
RAM: 64 GB (-Xmx45000M)
Term count in the Autocomplete field: 70,459,723

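I assume that faceting on a text field with this many terms requires a lot of memory.

How can I calculate the memory required, and is there a more efficient way to provide autocomplete (using phrases / n-grams)?

Thanks in advance!

1 answer:

Answer 0: (score: 0)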

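Can you connect to the Solr instance and check where the memory is going? My guess is the FieldCache, but it is always good to check to be sure. Solr's faceting works per field, so you should be able to check the memory consumption of a specific field. To estimate the memory usage of facet queries, you can check this thread (http://lucene.472066.n3.nabble.com/Solr-using-a-ridiculous-amount-of-memory-td4050840.html).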

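There is also something odd in your question: you say your query returns 8105170 hits, but your index only has 4338603 documents. In general, faceting on text fields is challenging because the number of terms can grow very quickly, especially if you use shingles or n-grams.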
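To make the term growth concrete, here is a minimal sketch that counts the word shingles a single field value produces with maxShingleSize="10" and outputUnigrams="true", as in the field type above. It only approximates Solr's ShingleFilterFactory (whitespace splitting stands in for StandardTokenizer), and the sample text is a made-up value, not data from the index in the question.

```python
def shingles(tokens, max_size=10, output_unigrams=True):
    """Return word n-grams of up to max_size tokens, starting at every position."""
    min_size = 1 if output_unigrams else 2
    out = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size > len(tokens):
                break  # shingle would run past the end of the value
            out.append(" ".join(tokens[start:start + size]))
    return out


# Hypothetical field value, used only for illustration.
tokens = "minimum inhibitory concentration of the tested compound".lower().split()
terms = shingles(tokens)
print(len(tokens), "tokens ->", len(terms), "terms")
print(sum(t.startswith("mi") for t in terms), "terms match the prefix 'mi'")
```

A value with n tokens (n no larger than the maximum shingle size) yields n(n+1)/2 terms under this model, so longer values contribute many more index terms than they have words.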

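Take a look at https://github.com/cominvent/autocomplete, which is a good starting point for Solr-backed autocomplete (I have used it as the starting point for several of my projects).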

Depending on how you implement autocomplete, you can also try changing the facet.method parameter (https://cwiki.apache.org/confluence/display/solr/Faceting) and check whether it helps.
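As a rough sketch of that experiment, the snippet below rebuilds the request from the question with facet.method=enum added. The host, port, core name and /select path are assumptions for illustration (the question routes requests through a custom handler whose path is not shown), and whether enum actually reduces heap usage here is something to measure, not a given.

```python
from urllib.parse import urlencode


def facet_autocomplete_url(base_url, prefix, method="enum"):
    """Build the facet request from the question, with facet.method set explicitly."""
    params = [
        ("q", "*:*"),
        ("start", 0),
        ("rows", 0),
        ("fq", "(Type:1 OR Type:2)"),
        ("facet", "true"),
        ("facet.field", "Autocomplete"),
        ("facet.prefix", prefix),
        ("facet.limit", 20),
        ("facet.mincount", 1),
        ("facet.sort", "true"),      # as given in the question
        ("facet.method", method),    # "enum" or "fc"
    ]
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint; replace with the real host, core and handler.
url = facet_autocomplete_url("http://localhost:8983/solr/collection1/select", "mi")
print(url)
```

Running the same prefix once with method="enum" and once with method="fc" while watching heap usage gives a direct comparison.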

Also take a look at https://cwiki.apache.org/confluence/display/solr/Suggester