Question

When trying to upgrade from Solr 4.3.0 to Solr 4.4.0, I ran into this exception:

 java.lang.IllegalArgumentException: enablePositionIncrements=false is not supported anymore as of Lucene 4.4 as it can create broken token streams

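Which led me to this issue. I need to be able to match queries regardless of intervening stop words (this used to work with enablePositionIncrements="true"). For example, the query "foo of the bar" should find documents matching "foo bar", "foo of bar", and "foo of the bar". With this option deprecated in 4.4.0, it is not clear to me how to keep the same functionality.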

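The package javadoc adds:

If the selected analyzer filters the stop words "is" and "the", then for a document containing the string "blue is the sky", only the tokens "blue", "sky" are indexed, with position("sky") = 3 + position("blue"). Now, a phrase query "blue is the sky" would find that document, because the same analyzer filters the same stop words from that query. But the phrase query "blue sky" would not find that document, because the position increment between "blue" and "sky" is only 1.

If this behavior does not fit the application needs, the query parser needs to be configured to not take position increments into account when generating phrase queries.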

But there is no mention of how to actually configure the query parser to do this. Does anyone know how to deal with this as Solr moves towards 5.0?
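
To make the javadoc passage quoted above concrete, here is a minimal sketch in plain Java. It is not Lucene code: the class `PositionGapDemo` and its methods are hypothetical names made up for this illustration, and it assumes whitespace tokenization with "is" and "the" as the only stop words. It shows how removed stop words still consume positions, and why an exact phrase query then needs the same gaps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of stop-word removal that preserves position gaps.
// Hypothetical illustration only; none of this is Lucene API.
final class PositionGapDemo {

    // Stop words assumed for this example (taken from the javadoc quote).
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("is", "the"));

    // A surviving token together with its absolute position.
    static final class Token {
        final String term;
        final int position;
        Token(String term, int position) { this.term = term; this.position = position; }
    }

    // Split on whitespace, drop stop words, keep the original positions.
    static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        for (String word : text.toLowerCase().trim().split("\\s+")) {
            if (!STOP_WORDS.contains(word)) {
                tokens.add(new Token(word, position));
            }
            position++; // a removed stop word still consumes a position
        }
        return tokens;
    }

    // Exact phrase match: every query term must sit at the same relative
    // position in the document as it does in the analyzed query.
    static boolean exactPhraseMatch(String document, String query) {
        List<Token> q = analyze(query);
        if (q.isEmpty()) return false;
        Map<Integer, String> docTermAt = new HashMap<>();
        for (Token t : analyze(document)) docTermAt.put(t.position, t.term);
        for (int base : docTermAt.keySet()) {
            boolean ok = true;
            for (Token t : q) {
                int expected = base + (t.position - q.get(0).position);
                ok &= t.term.equals(docTermAt.get(expected));
            }
            if (ok) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(exactPhraseMatch("blue is the sky", "blue is the sky")); // true
        System.out.println(exactPhraseMatch("blue is the sky", "blue sky"));        // false
    }
}
```

In this model "sky" ends up 3 positions after "blue" in the document but only 1 position after it in the query "blue sky", which is the mismatch the javadoc describes.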

Answer 1

You can use a proximity search:

"foo bar"~2

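Why a slop of 2 is enough for the question's example can be sketched with a simplified model of a two-term sloppy phrase. This is plain Java rather than Lucene code; `SlopDemo` and `matches` are hypothetical names, and the sketch assumes "of" and "the" are stop words that are removed while their position gaps are kept. The "gap" is the position of the second term minus the position of the first.

```java
// Simplified model of a two-term sloppy phrase query.
// Hypothetical illustration only; none of this is Lucene API.
final class SlopDemo {

    // A document matches when its gap differs from the query's gap
    // by no more than the allowed slop.
    static boolean matches(int docGap, int queryGap, int slop) {
        return Math.abs(docGap - queryGap) <= slop;
    }

    public static void main(String[] args) {
        int slop = 2; // the "~2" suffix

        // Query "foo bar": bar directly follows foo, so the query gap is 1.
        System.out.println(matches(1, 1, slop)); // document "foo bar"        -> true
        System.out.println(matches(2, 1, slop)); // document "foo of bar"     -> true
        System.out.println(matches(3, 1, slop)); // document "foo of the bar" -> true
        System.out.println(matches(4, 1, slop)); // three words in between    -> false

        // Query "foo of the bar" (gap 3) against document "foo bar" (gap 1).
        System.out.println(matches(1, 3, slop)); // true
    }
}
```

One trade-off visible in this model: a reversed document such as "bar foo" has a gap of -1, which is also within 2 of the query gap, so a slop of 2 is looser than "ignore stop words only".
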
Answer 2

I don't know whether using them is recommended, but Lucene 5 still has some legacy classes, such as Lucene43StopFilter.

Unfortunately, they seem to have disappeared in Lucene 6...

Answer 3

I found an implementation of RemoveTokenGapsFilterFactory somewhere on the web:

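import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

// Collapses the position gaps left behind by upstream filters such as a stop filter.
public final class RemoveTokenGapsFilter extends TokenFilter {

    private final PositionIncrementAttribute posIncrAttribute = addAttribute(PositionIncrementAttribute.class);

    public RemoveTokenGapsFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (input.incrementToken()) {
            // Force every token to sit directly after the previous one.
            posIncrAttribute.setPositionIncrement(1);
            return true;
        }
        return false;
    }
}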
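
The effect of forcing every increment to 1 can be checked without Lucene. The sketch below is a toy model in plain Java; `RemoveGapsDemo` and its methods are hypothetical names, and it assumes positions are accumulated from -1 so that a first increment of 1 yields position 0.

```java
import java.util.Arrays;

// Toy model of resetting every position increment to 1.
// Hypothetical illustration only; none of this is Lucene API.
final class RemoveGapsDemo {

    // Turn a sequence of position increments into absolute positions.
    static int[] positions(int[] increments) {
        int[] result = new int[increments.length];
        int position = -1;
        for (int i = 0; i < increments.length; i++) {
            position += increments[i];
            result[i] = position;
        }
        return result;
    }

    // Mirrors the filter's incrementToken(): every token gets increment 1.
    static int[] removeGaps(int[] increments) {
        int[] result = new int[increments.length];
        Arrays.fill(result, 1);
        return result;
    }

    public static void main(String[] args) {
        // "foo of the bar" after stop-word removal: foo (+1), bar (+3).
        int[] withGaps = {1, 3};
        System.out.println(Arrays.toString(positions(withGaps)));             // [0, 3]
        System.out.println(Arrays.toString(positions(removeGaps(withGaps)))); // [0, 1]

        // A token stacked on the previous one (increment 0), e.g. a synonym.
        int[] withSynonym = {1, 0, 1};
        System.out.println(Arrays.toString(positions(withSynonym)));             // [0, 0, 1]
        System.out.println(Arrays.toString(positions(removeGaps(withSynonym)))); // [0, 1, 2]
    }
}
```

The second case suggests a caveat: because the filter overwrites every increment, tokens that were deliberately stacked at the same position with an increment of 0 would be spread out as well, so it is probably safest in chains that do not rely on stacked tokens.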

Solr 4.4: StopFilterFactory and enablePositionIncrements