Question

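I have spent a week looking for a working solution that would allow the following:

Documents: [phrase: "cat"], [phrase: "cat cat"], [phrase: "cats"]

Search query: "cat" => results: "cat", "cats" (but not "cat cat")

Search query: "cats" => results: "cat", "cats" (but not "cat cat")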

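I have seen a few suggestions online on how to achieve this. In one place I saw a suggestion to insert marker tokens at the beginning and end of the field value at index time, and then run a phrase query that includes those marker tokens. In another place I saw a suggestion to count the number of unique terms in each document.

I find the second suggestion (counting words) rather complicated, and I could not work out how to apply the first one.

So the question is: can you give a hint on how to implement in Solr an exact match with respect to the number of words in the request, while still using stemming (word forms)?

Any ideas would be much appreciated.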

Answer 1

I solved the problem as follows (with a prefix and a suffix):

In solrconfig.xml:

<updateRequestProcessorChain name="exact"> 
    <processor class="solr.CloneFieldUpdateProcessorFactory">
        <str name="source">phrase</str>
        <str name="dest">phraseExact</str>
    </processor>
    <processor class="solr.RegexReplaceProcessorFactory">
        <str name="fieldName">phraseExact</str>
        <str name="pattern">^(.*)$</str>
        <str name="replacement">_prefix_ $1 _suffix_</str>
        <bool name="literalReplacement">false</bool>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
<!-- other contents of solrconfig.xml... -->
<requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
        <str name="update.chain">exact</str>
    </lst>
</requestHandler>
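
To make the effect of this chain easier to follow, here is a minimal Python sketch of what the two processors do to an incoming document. It is only an illustration, not Solr code: the helper names `clone_field` and `regex_replace` are hypothetical, and Python's `re` module stands in for the Java regex engine that Solr uses.

```python
import re

def clone_field(doc, source, dest):
    """Rough stand-in for CloneFieldUpdateProcessorFactory: copy source into dest."""
    out = dict(doc)
    if source in out:
        out[dest] = out[source]
    return out

def regex_replace(doc, field, pattern, replacement):
    """Rough stand-in for RegexReplaceProcessorFactory with literalReplacement=false."""
    out = dict(doc)
    if isinstance(out.get(field), str):
        out[field] = re.sub(pattern, replacement, out[field], count=1)
    return out

def exact_chain(doc):
    """Apply the two steps in the same order as the 'exact' chain above."""
    doc = clone_field(doc, "phrase", "phraseExact")
    # Python writes the group reference as \1 where the Solr config uses $1.
    return regex_replace(doc, "phraseExact", r"^(.*)$", r"_prefix_ \1 _suffix_")

print(exact_chain({"id": "9c95fac2ed78149c", "phrase": "test"}))
```

The original `phrase` field is left untouched, so ordinary searches keep working, while `phraseExact` carries the marker-wrapped copy used for exact matching.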

In schema.xml:

<field name="phrase" type="text_en" indexed="true" stored="true"/>
<field name="phraseExact" type="text_en" indexed="true" stored="true"/>

After these changes, you need to restart the Solr instance and then reindex (re-add) all documents.

Now we have documents like this:

{
    "phrase": "test",
    "id": "9c95fac2ed78149c",
    "phraseExact": "_prefix_ test _suffix_",
    "_version_": 1471599816879374300
},
{
    "phrase": "test phrase",
    "id": "9c95fac2ed78123c",
    "phraseExact": "_prefix_ test phrase _suffix_",
    "_version_": 1471599816123474300
}
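
Why the markers give a match that is exact in word count, while still allowing stemming, can be illustrated with a toy model outside Solr. The sketch below is an assumption-laden simplification: `toy_stem` is a crude hypothetical stand-in for the stemming done by `text_en`, and `phrase_matches` only imitates the idea of a phrase query (the query tokens must appear next to each other, in order).

```python
PREFIX, SUFFIX = "_prefix_", "_suffix_"

def toy_stem(token):
    """Crude stand-in for an English stemmer (NOT the algorithm Solr uses)."""
    for ending in ("ing", "s"):
        if token.endswith(ending) and len(token) - len(ending) >= 3:
            return token[: -len(ending)]
    return token

def analyze(text):
    """Lowercase, split on whitespace, stem; marker tokens are kept unchanged."""
    tokens = text.lower().split()
    return [t if t in (PREFIX, SUFFIX) else toy_stem(t) for t in tokens]

def wrap(text):
    """Surround a value with the marker tokens, as the update chain does."""
    return f"{PREFIX} {text} {SUFFIX}"

def phrase_matches(indexed_text, query_text):
    """True if the analyzed query tokens occur as one contiguous run in the document."""
    doc, query = analyze(indexed_text), analyze(query_text)
    n = len(query)
    return any(doc[i:i + n] == query for i in range(len(doc) - n + 1))

docs = ["test", "test phrase"]
for q in ("test", "testing", "tests"):
    hits = [d for d in docs if phrase_matches(wrap(d), wrap(q))]
    print(q, "->", hits)
```

Because the query phrase starts with the prefix marker and ends with the suffix marker, a document with any extra word between the markers cannot contain the query tokens as a contiguous run, so only documents with the same number of words can match.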

If we search for documents with queries such as

q=phraseExact:"_prefix_ test _suffix_"
q=phraseExact:"_prefix_ testing _suffix_"
q=phraseExact:"_prefix_ tests _suffix_"

we will receive only the {"phrase": "test"} document (and not {"phrase": "test phrase"}).
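
On the client side, the markers have to be added to whatever the user typed before the query is sent. A minimal sketch of that step in Python follows; the helper name `exact_phrase_params` is hypothetical, and the `/select` path is assumed to be the search handler of your core.

```python
from urllib.parse import urlencode

PREFIX, SUFFIX = "_prefix_", "_suffix_"

def exact_phrase_params(user_input, field="phraseExact"):
    """Build the q parameter for a marker-wrapped phrase query."""
    # Escape backslashes and double quotes so the input stays inside the quoted phrase.
    escaped = user_input.replace("\\", "\\\\").replace('"', '\\"')
    return {"q": f'{field}:"{PREFIX} {escaped} {SUFFIX}"'}

params = exact_phrase_params("tests")
print(params["q"])                    # the raw query string
print("/select?" + urlencode(params)) # URL-encoded form for an HTTP request
```

The marker strings must be identical to the ones used in the update chain, otherwise the phrase query will never match.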

Solr exact match with respect to the number of words
