edge_ngram filter and not_analyzed to match search

Date: 2015-04-03 17:21:10

Tags: elasticsearch

I have the following Elasticsearch configuration:

PUT /my_index
{
    "settings": {
        "number_of_shards": 1, 
        "analysis": {
            "filter": {
                "autocomplete_filter": { 
                    "type":     "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20
                },
                "snow_filter" : {
                    "type" : "snowball",
                    "language" : "English"
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type":      "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "snow_filter",
                        "autocomplete_filter" 
                    ]
                }
            }
        }
    }
}

PUT /my_index/_mapping/my_type
{
    "my_type": {
        "properties": {
            "name": {
                "type": "multi_field",
                "fields": {
                    "name": {
                        "type":            "string",
                        "index_analyzer":  "autocomplete", 
                        "search_analyzer": "snowball"
                    },
                    "not": {
                        "type": "string",
                        "index": "not_analyzed"
                    }
                }
            }
        }
    }
}


POST /my_index/my_type/_bulk
{ "index": { "_id": 1            }}
{ "name": "Brown foxes"    }
{ "index": { "_id": 2            }}
{ "name": "Yellow furballs" }
{ "index": { "_id": 3            }}
{ "name": "my discovery" }
{ "index": { "_id": 4            }}
{ "name": "myself is fun" }
{ "index": { "_id": 5            }}
{ "name": ["foxy", "foo"]    }
{ "index": { "_id": 6            }}
{ "name": ["foo bar", "baz"] }

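I'm trying to write a search that returns only item 6, the one whose name contains "foo bar", and I'm not quite sure how. This is what I'm doing right now: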

GET /my_index/my_type/_search
{
    "query": {
        "match": {
            "name": {
                "query":    "foo b"
            }
        }
    }
}

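I know this comes down to how the tokenizer splits the words, but I'm lost on how to make the search both flexible and strict enough to match only this item. I'm guessing I need to use a multi-field in my name mapping, but I'm not sure. How can I fix the query and/or the mapping to meet my needs?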

1 Answer:

Answer 0 (score: 1)

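You're already very close. Your edge_ngram analyzer generates tokens with a minimum length of 1, your query is tokenized into "foo" and "b", and the default match query operator is "or". As a result, your query matches every document that has a term starting with "b" (or "foo"), which is three documents.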
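To make that concrete, here is a rough Python sketch of the index-time tokens and of "or" matching. It is only a simplified model, not how Elasticsearch is implemented: the helper names (edge_ngrams, index_terms, match_or) are made up for illustration, the standard tokenizer is approximated with a regex, and the snowball stemmer is left out on the assumption that it does not change the outcome for these particular terms.

```python
import re

# Simplified stand-in for the "autocomplete" analyzer: lowercase, split on
# non-word characters, then emit edge n-grams (min_gram=1, max_gram=20).
# Stemming is deliberately omitted (assumption: irrelevant for "foo" and "b").
def edge_ngrams(token, min_gram=1, max_gram=20):
    return {token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)}

def index_terms(values):
    terms = set()
    for value in values:
        for token in re.findall(r"\w+", value.lower()):
            terms |= edge_ngrams(token)
    return terms

# The six documents from the bulk request, keyed by _id.
DOCS = {
    1: ["Brown foxes"],
    2: ["Yellow furballs"],
    3: ["my discovery"],
    4: ["myself is fun"],
    5: ["foxy", "foo"],
    6: ["foo bar", "baz"],
}

def match_or(query):
    # The query is only tokenized, never n-grammed; with "or", a single
    # matching query term is enough for a document to be a hit.
    query_terms = re.findall(r"\w+", query.lower())
    return sorted(
        doc_id
        for doc_id, values in DOCS.items()
        if any(term in index_terms(values) for term in query_terms)
    )

or_hits = match_or("foo b")
print(or_hits)
```

In this model the hits are documents 1, 5 and 6: "Brown" produces the token "b", and documents 5 and 6 both contain "foo".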

Using the "and" operator seems to do what you want:

POST /my_index/my_type/_search
{
    "query": {
        "match": {
            "name": {
                "query":    "foo b",
                "operator": "and"
            }
        }
    }
}
...
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1.4451914,
      "hits": [
         {
            "_index": "test_index",
            "_type": "my_type",
            "_id": "6",
            "_score": 1.4451914,
            "_source": {
               "name": [
                  "foo bar",
                  "baz"
               ]
            }
         }
      ]
   }
}

Here is the code I used to test it:

http://sense.qbox.io/gist/4f6fb7c1fdc6942023091ee1433d7490e04e7dea
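
As a rough illustration of why "and" narrows the result to document 6, here is a small Python model of the two operators. It is a sketch under the same kind of assumptions as any toy model: prefixes, field_terms and match are hypothetical helpers, tokenization is approximated with a regex, and stemming is ignored. It is not the code behind the link above.

```python
import re

def prefixes(token):
    # Edge n-grams with min_gram=1: every leading prefix of the token.
    return {token[:n] for n in range(1, len(token) + 1)}

def field_terms(values):
    # All values of the "name" field contribute terms to the same field.
    return {
        prefix
        for value in values
        for token in re.findall(r"\w+", value.lower())
        for prefix in prefixes(token)
    }

DOCS = {
    1: ["Brown foxes"],
    2: ["Yellow furballs"],
    3: ["my discovery"],
    4: ["myself is fun"],
    5: ["foxy", "foo"],
    6: ["foo bar", "baz"],
}

def match(query, operator="or"):
    query_terms = re.findall(r"\w+", query.lower())
    # "and" requires every query term to be present; "or" requires any one.
    combine = all if operator == "and" else any
    return sorted(
        doc_id
        for doc_id, values in DOCS.items()
        if combine(term in field_terms(values) for term in query_terms)
    )

and_hits = match("foo b", operator="and")
print(and_hits)
```

With "and", document 1 drops out because it has no "foo" term, and document 5 drops out because nothing in it starts with "b", leaving only document 6. Note that in this model the terms are checked against the field as a whole, so "b" could be satisfied by either "bar" or "baz".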