Question

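I am running a Solr Cloud 6.5.0 installation. My goal is to retrieve all terms that co-occur with my search term, rank them by count, and take the top N. To do this, I defined a field of type text_en_facets, which is a TextField with a PatternTokenizer plus a few other things (full definition at the end of the post).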

Now that my instance contains some data, the field holds 1.3M unique terms, and as a result I get the following error:

o.a.s.s.FastLRUCache Error during auto-warming of key:payload_en_facets:org.apache.solr.common.SolrException: java.lang.IllegalStateException: Too many values for UnInvertedField faceting on field…

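I noticed that other people had the same issue, and I would like to know whether there is any news on best practices and/or a way around this limitation. It would be great if I did not have to reindex my data manually or analyze my documents by hand in order to use a StrField.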

I have already tried different configurations of facet.method, facet.limit and facet.mincount, but that did not solve the problem. Any other ideas?

    <fieldType name="text_en_facets" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
            <!-- recognises e-mail addresses, urls, #-tags and @-mentions, alphanumeric words (possibly containing inner periods) -->
            <tokenizer class="solr.PatternTokenizerFactory"
                       pattern="(?U)([\w-\.]+@[\w-\.]+)|(https?:\S+)|((\s|^)[@#]\w+)|(\w+(\.\w+)?)" group="0"/>
            <!-- there might be tokens containing trailing/leading white spaces -->
            <filter class="solr.TrimFilterFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.StopFilterFactory" format="snowball"
                    words="stopwords/stopwords_en.txt,stopwords/stopwords_en_nltk.txt,stopwords/stopwords_en_twitter.txt"
                    ignoreCase="true"/>
            <!-- kills urls -->
            <filter class="solr.PatternReplaceFilterFactory" pattern="(?U)https?:\S+" replacement=""/>
            <!-- kills numbers -->
            <filter class="solr.PatternReplaceFilterFactory" pattern="(?U)^[0-9.,']+$" replacement=""/>
            <!-- kills meaningless tokens  -->
            <filter class="solr.LengthFilterFactory" min="2" max="1024"/>
        </analyzer>
    </fieldType>

Answer 1

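This is a limitation of the internal structure used when faceting on text fields.

It should be possible to circumvent it with facet.method=enum, although that will be very slow in this case.
You could try splitting the index into many shards, but whether that works depends on the term distribution of your index. It may also reduce performance.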
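
For reference, a minimal sketch of what a request using the first workaround might look like. It only builds the URL for the standard /select handler with facet.method=enum. The host, the collection name `mycollection`, the search term and the helper `build_facet_query` are assumptions made up for illustration; the field name `payload_en_facets` is taken from the error message in the question.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, collection, search_term, field, top_n=10):
    """Build a Solr select URL that facets on `field` using facet.method=enum.

    Hypothetical helper: base_url and collection are placeholders.
    """
    params = [
        ("q", f"{field}:{search_term}"),  # restrict to docs containing the term
        ("rows", 0),                      # only the facet counts are needed
        ("facet", "true"),
        ("facet.field", field),
        ("facet.method", "enum"),         # walk the terms instead of uninverting the field
        ("facet.limit", top_n),           # keep the top N terms by count
        ("facet.mincount", 1),            # skip terms that never co-occur
        ("wt", "json"),
    ]
    return f"{base_url}/{collection}/select?{urlencode(params)}"


url = build_facet_query(
    "http://localhost:8983/solr",  # assumed local Solr instance
    "mycollection",                # assumed collection name
    "solr",
    "payload_en_facets",
    top_n=20,
)
print(url)
```

With 1.3M unique terms, enum has to walk the term list for every request, so expect this to be slow, as noted above.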

I found the problem and wrote a patch (the code is at https://github.com/tokee/lucene-solr/tree/uninvert-optimize), but that does not help you right now. I am working on getting it into Solr, so watch https://issues.apache.org/jira/browse/SOLR-11240 for updates.

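Update 20170824: @Alberto I have added the patch to Solr, but due to timing it will not be part of the upcoming 6.6.1 and 7.0 releases. If you need it now, I am fairly sure the patch from SOLR-11240 applies to the Solr 6.5+ source code.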

Update 20171017: @Alberto the fix is part of Solr 7.1, which was released earlier today. If you are willing to upgrade, this should solve your problem.
