ElasticSearch一个edgeNGram用于自动完成\ typeahead,是我的search_analyzer被忽略

时间:2013-05-07 05:43:26

标签: java elasticsearch

我有三个带有“userName”字段的文档:

  • 'briandilley'
  • 'briangumble'
  • 'briangriffen'

当我搜索'brian'时,我按预期得到了所有三个,但是当我搜索'briandilley'时,我仍然得到了所有三个回来。 analyze API告诉我它在我的搜索字符串上使用了ngram过滤器,但我不确定原因。这是我的设置:

索引设置:

{
    "analysis": {
        "analyzer": {
            "username_index": {
                "tokenizer": "keyword",
                "filter": ["lowercase", "username_ngram"]
            },
            "username_search": {
                "tokenizer": "keyword",
                "filter": ["lowercase"]
            }
        },
        "filter": {
            "username_ngram": {
                "type": "edgeNGram",
                "side" : "front",
                "min_gram": 1,
                "max_gram": 15
            }
        }
    }
}

映射:

{
    "user_follow": {

        "properties": {
            "targetId": { "type": "string", "store": true },
            "followerId": { "type": "string", "store": true },
            "dateUpdated": { "type": "date", "store": true },

            "userName": {
                "type": "multi_field",
                "fields": {
                    "userName": {
                        "type": "string",
                        "index": "not_analyzed"
                    },
                    "autocomplete": {
                        "type": "string",
                        "index_analyzer": "username_index",
                        "search_analyzer": "username_search"
                    }
                }
            }
        }
    }
}

搜索:

{
    "from" : 0,
    "size" : 50,
    "query" : {
        "bool" : {
            "must" : [ {
                "field" : {
                    "targetId" : "51888c1b04a6a214e26a4009"
                }
            }, {
                "match" : {
                    "userName.autocomplete" : {
                        "query" : "brian",
                        "type" : "boolean"
                    }
                }
            } ]
        }
    },
    "fields" : "followerId"
}

我尝试过matchQuery,matchPhraseQuery,textQuery和termQuery(java DSL api),每次都得到相同的结果。

1 个答案:

答案 0 :(得分:9)

我认为你并没有完全按照自己的想法行事。这就是为什么提供一个包含完整curl语句的实际测试用例,而不是缩写它是有用的。

上面的例子对我有用(稍加修改):

使用设置和映射创建索引:

curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1'  -d '
{
  "mappings" : {
     "test" : {
        "properties" : {
           "userName" : {
              "fields" : {
                 "autocomplete" : {
                    "search_analyzer" : "username_search",
                    "index_analyzer" : "username_index",
                    "type" : "string"
                 },
                 "userName" : {
                    "index" : "not_analyzed",
                    "type" : "string"
                 }
              },
              "type" : "multi_field"
           }
        }
     }
  },
  "settings" : {
     "analysis" : {
        "filter" : {
           "username_ngram" : {
              "max_gram" : 15,
              "min_gram" : 1,
              "type" : "edge_ngram"
           }
        },
        "analyzer" : {
           "username_index" : {
              "filter" : [
                 "lowercase",
                 "username_ngram"
              ],
              "tokenizer" : "keyword"
           },
           "username_search" : {
              "filter" : [
                 "lowercase"
              ],
              "tokenizer" : "keyword"
           }
        }
     }
  }
}
'

索引一些数据:

curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1'  -d '{
  "userName" : "briangriffen"
}
'

curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1'  -d '
{
  "userName" : "brianlilley"
}
'

curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1'  -d '
{
  "userName" : "briangumble"
}
'

搜索brian查找所有文档:

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1'  -d '{
  "query" : {
     "match" : {
        "userName.autocomplete" : "brian"
     }
  }
}
'

# {
#    "hits" : {
#       "hits" : [
#          {
#             "_source" : {
#                "userName" : "briangriffen"
#             },
#             "_score" : 0.1486337,
#             "_index" : "test",
#             "_id" : "AWzezvEFRIykOAr75QbtcQ",
#             "_type" : "test"
#          },
#          {
#             "_source" : {
#                "userName" : "briangumble"
#             },
#             "_score" : 0.1486337,
#             "_index" : "test",
#             "_id" : "qIABuMOiTyuxLOiFOzcURg",
#             "_type" : "test"
#          },
#          {
#             "_source" : {
#                "userName" : "brianlilley"
#             },
#             "_score" : 0.076713204,
#             "_index" : "test",
#             "_id" : "fGgTITKvR6GJXI_cqA4Vzg",
#             "_type" : "test"
#          }
#       ],
#       "max_score" : 0.1486337,
#       "total" : 3
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 8
# }

搜索brianlilley只找到该文档:

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1'  -d '
{
  "query" : {
     "match" : {
        "userName.autocomplete" : "brianlilley"
     }
  }
}
'

# {
#    "hits" : {
#       "hits" : [
#          {
#             "_source" : {
#                "userName" : "brianlilley"
#             },
#             "_score" : 0.076713204,
#             "_index" : "test",
#             "_id" : "fGgTITKvR6GJXI_cqA4Vzg",
#             "_type" : "test"
#          }
#       ],
#       "max_score" : 0.076713204,
#       "total" : 1
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 4
# }