Question

I am using Elasticsearch to index entities with two fields: agencyName and agencyAddress.

Suppose I have indexed one entity:

{
    "agencyName": "Turismo Viajes",
    "agencyAddress": "Av. Maipú 500"
}

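I would like to be able to search for this entity and retrieve it by agencyName. Different searches could be:

1) urismo 2) Viaje 3) Viajes 4) Turismo 5) uris

The idea is that if I query with any of these strings, I should always get that entity back (possibly with a different score depending on how exact the match is).

For this I thought nGram would work, so I defined a global analyzer named phrase in my elasticsearch.yml file.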

index:
  analysis:
    analyzer:
      phrase:
        type: custom
        tokenizer: nGram
        filter: [nGram, lowercase, asciifolding]

I created the agencies index like this:

{
  "possible_clients" : {
    "possible_client" : {
      "properties" : {
        "agencyName" : {
          "type" : "string",
          "analyzer" : "phrase"
        },
        "agencyAddress" : {
          "type": "string"
        }
      }
    }
  }
}

The problem is that when I make a call like this:

curl -XPOST 'http://localhost:9200/possible_clients/possible_client/_search' -d '{
    "query": { "term": { "agencyName": "uris" }}
}'

I get no hits. Any ideas what I am doing wrong?

Thanks in advance.

Answer 1

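You are searching with a term query. Term queries are never analyzed, so the search text is compared against the indexed tokens exactly as typed, and changing the analyzer has no effect on it. You should use, for example, a match query instead.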
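
As a rough sketch of that suggestion, the request body from the question can be rewritten with a match query in place of term. The snippet below only builds and prints the JSON body; it does not contact a server. The field name and search text come from the question, while the helper name build_match_body is made up for illustration.

```python
import json


def build_match_body(field, text):
    """Return a search body that uses a match query, which analyzes `text`."""
    return {"query": {"match": {field: text}}}


# Hypothetical helper; field name and search text are taken from the question.
body = build_match_body("agencyName", "uris")
print(json.dumps(body, indent=4))
```

The printed JSON can be passed as the -d payload of the same curl call used in the question.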

Answer 2

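According to the documentation, the default max_gram for your tokenizer is 2, so you index tu, ur, ri, is, sm, mo and so on.
The term query does not analyze your input, so you are searching for uris, and uris was never indexed.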
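
To make this concrete, here is a simplified Python model of n-gram generation. It is a sketch under stated assumptions, not Elasticsearch's actual implementation: it slides a window over the lowercased text and ignores the extra nGram filter in the question's analyzer. The function name char_ngrams is made up for illustration.

```python
def char_ngrams(text, min_gram, max_gram):
    """Return the set of substrings of `text` with length in [min_gram, max_gram]."""
    grams = set()
    for size in range(min_gram, max_gram + 1):
        for start in range(len(text) - size + 1):
            grams.add(text[start:start + size])
    return grams


indexed_text = "turismo viajes"  # lowercased agencyName from the question

short_grams = char_ngrams(indexed_text, 1, 2)   # grams of length 1 and 2 only
long_grams = char_ngrams(indexed_text, 1, 20)   # assumed larger max_gram

print("uris" in short_grams)  # False: no gram is longer than 2 characters
print("uris" in long_grams)   # True: 4-character grams are now produced
```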

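Try setting a higher max_gram. See the documentation for the ngram tokenizer and the ngram token filter.

Maybe you should not use both the ngram tokenizer and the ngram filter. I have always used only the filter (with whitespace as the tokenizer).

Here is an edgengram filter we had to define. Ngrams should work the same way.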

"filter" : {
    "my_filter" : {
        "type" : "edgeNGram",
        "min_gram" : "1",
        "max_gram" : "20"
    }
}
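
One caveat worth sketching: edge n-grams only cover prefixes of each token, whereas plain n-grams cover every substring. The Python model below is a simplification under stated assumptions (whitespace tokenization, lowercasing, min_gram 1 and max_gram 20 as in the filter above). The function names are made up for illustration and this is not Elasticsearch code.

```python
def edge_ngrams(token, min_gram, max_gram):
    """Prefixes of `token` with length between min_gram and max_gram."""
    longest = min(max_gram, len(token))
    return {token[:size] for size in range(min_gram, longest + 1)}


def ngrams(token, min_gram, max_gram):
    """All substrings of `token` with length between min_gram and max_gram."""
    longest = min(max_gram, len(token))
    return {
        token[start:start + size]
        for size in range(min_gram, longest + 1)
        for start in range(len(token) - size + 1)
    }


tokens = "Turismo Viajes".lower().split()  # whitespace tokenizer + lowercase

edge = set().union(*(edge_ngrams(t, 1, 20) for t in tokens))
full = set().union(*(ngrams(t, 1, 20) for t in tokens))

# Search strings from the question, lowercased.
for term in ["urismo", "viaje", "viajes", "turismo", "uris"]:
    print(term, term in edge, term in full)
```

Under this model, urismo and uris are found only among the plain n-grams, since neither is a prefix of a token. That suggests the nGram filter type, rather than edgeNGram, is the closer fit for the searches listed in the question.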

Hope it helps.

ES search for partial words - ngram?
