Question

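I have been working with ES for a little over a month. I am looking for information on substring matching that preserves word position.

Suppose I have indexed documents into Elasticsearch: two documents with a "doc_field", with ids id1 and id2.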

id1: " Once when a big Lion was asleep, a little Mouse began running up and down upon him. "
id2: " The mouse is very little"

I don't know whether I should keep the field "not_analyzed" or "analyzed".

What I'm curious about is whether the following query will give me the correct matches.

query = { "query":
           { "match": { "document": { "query": "little mouse", "operator": "and" } } } }

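I want it to return only the documents that contain "little mouse". It should not return documents that have "little" or "mouse" in separate places. Simply put, the order of the words in the query should be preserved. Please help.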

Answer 1

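Take a look at the shingle token filter (documentation). It is very similar to ngram, but it works on tokens instead of characters.

With the default settings it produces two-word shingles alongside the original single-word tokens. You can check its behaviour with the _analyze API: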

POST _analyze?tokenizer=whitespace&filters=shingle&text=The mouse is very little

This will output:

{
   "tokens": [
      {
         "token": "The",
         "start_offset": 0,
         "end_offset": 3,
         "type": "word",
         "position": 1
      },
      {
         "token": "The mouse",
         "start_offset": 0,
         "end_offset": 9,
         "type": "shingle",
         "position": 1
      },
      {
         "token": "mouse",
         "start_offset": 4,
         "end_offset": 9,
         "type": "word",
         "position": 2
      },
      {
         "token": "mouse is",
         "start_offset": 4,
         "end_offset": 12,
         "type": "shingle",
         "position": 2
      },
      {
         "token": "is",
         "start_offset": 10,
         "end_offset": 12,
         "type": "word",
         "position": 3
      },
      {
         "token": "is very",
         "start_offset": 10,
         "end_offset": 17,
         "type": "shingle",
         "position": 3
      },
      {
         "token": "very",
         "start_offset": 13,
         "end_offset": 17,
         "type": "word",
         "position": 4
      },
      {
         "token": "very little",
         "start_offset": 13,
         "end_offset": 24,
         "type": "shingle",
         "position": 4
      },
      {
         "token": "little",
         "start_offset": 18,
         "end_offset": 24,
         "type": "word",
         "position": 5
      }
   ]
}

Then, by querying this field, you will see the difference between the two sample documents.
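To make that difference concrete, here is a minimal Python sketch that imitates two-word shingling outside of Elasticsearch. It is only an illustration: the `shingles` helper is hypothetical, and lowercasing plus a plain whitespace split is an assumption standing in for whatever analyzer you configure. Note that id1 contains "little Mouse" with a capital M, so without some lowercasing step the shingle would not equal "little mouse".

```python
def shingles(text, size=2):
    """Return word-level n-grams (shingles) of `size` words from `text`."""
    # Assumption: lowercase + whitespace split stands in for the real analyzer.
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


docs = {
    "id1": "Once when a big Lion was asleep, a little Mouse began running up and down upon him.",
    "id2": "The mouse is very little",
}

# A document matches only if the query appears as one adjacent, ordered shingle.
query_shingle = "little mouse"
matches = [doc_id for doc_id, text in docs.items() if query_shingle in shingles(text)]
print(matches)
```

Only id1 has "little" immediately followed by "mouse"; id2 contains both words, but never as one shingle.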

You can find a detailed explanation of proximity search in this section of the Definitive Guide.
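As a rough sketch of what a proximity (phrase) query body could look like, the snippet below builds a `match_phrase` request in Python. The field name "document" is taken from the query in the question, the `phrase_query` helper is hypothetical, and this assumes the field is analyzed so that token positions are available. The resulting body would be sent to the index's `_search` endpoint.

```python
import json


def phrase_query(field, phrase, slop=0):
    """Build a match_phrase request body; slop=0 requires adjacent terms in order."""
    return {"query": {"match_phrase": {field: {"query": phrase, "slop": slop}}}}


# Assumption: "document" is the analyzed field holding the text.
body = phrase_query("document", "little mouse")
print(json.dumps(body, indent=2))
```

Raising `slop` loosens the match by allowing the terms to sit further apart, which is the proximity behaviour the guide describes.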

Elasticsearch substring matching with word position preserved
