Question

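I'm trying to implement an auto-suggest control powered by an ES index. The index has multiple fields, and I want to be able to query across several of them using the AND operator while allowing partial matches (prefix only).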

As an example, say I have two fields I want to query on: "color" and "animal". I'd like to be able to fulfil queries like "duc", "duck", "purpl", "purple", and "purple duck". I managed to get all of these working using multi_match() with the AND operator.

What I can't seem to do is match on queries like "purple duc", because multi_match doesn't allow wildcards.

I've looked into match_phrase_prefix(), but as I understand it, it doesn't span multiple fields.

I've now turned to implementing a tokenizer, since it feels like the solution may lie there. So, ultimately, my questions are:

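1) Can someone confirm that there is no out-of-the-box function to do what I want? It feels like a common enough pattern that something ready-made could exist.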

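2) Can someone suggest a solution? Are tokenizers part of it? I'm more than happy to be pointed in the right direction and do more research myself. Obviously, if someone has a working solution to share, that would be awesome.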

Thanks in advance - F

Answer 1

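I actually wrote a blog post about this for Qbox, which you can find here: http://blog.qbox.io/multi-field-partial-word-autocomplete-in-elasticsearch-using-ngrams (Unfortunately, some of the links in the post are broken and can't easily be fixed at this point, but hopefully you'll get the idea.)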

I'll refer you to the post for the details, but here is some code you can use to test it out quickly. Note that I'm using edge ngrams instead of full ngrams.

Also take particular note of the use of the _all field and of the operator in the match query.

Okay, so here is the mapping:

PUT /test_index
{
   "settings": {
      "analysis": {
         "filter": {
            "edgeNGram_filter": {
               "type": "edgeNGram",
               "min_gram": 2,
               "max_gram": 20
            }
         },
         "analyzer": {
            "edgeNGram_analyzer": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filter": [
                  "lowercase",
                  "asciifolding",
                  "edgeNGram_filter"
               ]
            }
         }
      }
   },
   "mappings": {
      "doc": {
         "_all": {
            "enabled": true,
            "index_analyzer": "edgeNGram_analyzer",
            "search_analyzer": "standard"
         },
         "properties": {
            "field1": {
               "type": "string",
               "include_in_all": true
            },
            "field2": {
               "type": "string",
               "include_in_all": true
            }
         }
      }
   }
}
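
To make the mapping a bit more concrete, here is a rough Python sketch of the terms an analyzer configured like edgeNGram_analyzer would produce at index time. It is only an approximation and not Elasticsearch code: the function names (edge_ngrams, fold_ascii, index_terms) are made up for illustration, and fold_ascii is a crude stand-in for the asciifolding filter.

```python
import unicodedata


def edge_ngrams(token, min_gram=2, max_gram=20):
    """Return the prefixes of token with lengths min_gram..max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def fold_ascii(text):
    # Assumption: approximate asciifolding by dropping combining marks
    # after NFKD decomposition. The real filter covers more characters.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_terms(text):
    """Whitespace tokenizer -> lowercase -> ASCII folding -> edge ngrams."""
    terms = []
    for token in text.split():
        terms.extend(edge_ngrams(fold_ascii(token.lower())))
    return terms


print(index_terms("Purple duck"))
# ['pu', 'pur', 'purp', 'purpl', 'purple', 'du', 'duc', 'duck']
```

Every prefix of at least two characters becomes its own indexed term, which is what lets a partial word such as "duc" match as an ordinary term.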

Now add a few documents:

POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"field1":"purple duck","field2":"brown fox"}
{"index":{"_id":2}}
{"field1":"slow purple duck","field2":"quick brown fox"}
{"index":{"_id":3}}
{"field1":"red turtle","field2":"quick rabbit"}

This query seems to illustrate what you're after:

POST /test_index/_search
{
   "query": {
      "match": {
         "_all": {
             "query": "purp fo slo",
             "operator": "and"
         }
      }
   }
}

It returns:

{
   "took": 5,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.19930676,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": 0.19930676,
            "_source": {
               "field1": "slow purple duck",
               "field2": "quick brown fox"
            }
         }
      ]
   }
}
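
If it helps to see why only document 2 comes back, the matching logic can be approximated in a few lines of Python. This is a simplified model under stated assumptions, not how Elasticsearch is implemented: it treats _all as the combined edge ngrams of every field value, splits the query on whitespace as a stand-in for the standard analyzer, and ignores scoring entirely. The helper names (edge_ngrams, all_field_terms, search_and) are hypothetical.

```python
def edge_ngrams(token, min_gram=2, max_gram=20):
    """Set of prefixes of token with lengths min_gram..max_gram."""
    upper = min(len(token), max_gram)
    return {token[:n] for n in range(min_gram, upper + 1)}


def all_field_terms(doc):
    # Assumption: model _all as the union of the edge ngrams of every
    # lowercased, whitespace-separated token in every field value.
    terms = set()
    for value in doc.values():
        for token in value.lower().split():
            terms |= edge_ngrams(token)
    return terms


def search_and(docs, query):
    """Return ids of docs whose _all terms contain every query term."""
    # The query is NOT ngrammed, mirroring search_analyzer: standard.
    query_terms = query.lower().split()
    return [
        doc_id
        for doc_id, doc in docs.items()
        if all(term in all_field_terms(doc) for term in query_terms)
    ]


DOCS = {
    "1": {"field1": "purple duck", "field2": "brown fox"},
    "2": {"field1": "slow purple duck", "field2": "quick brown fox"},
    "3": {"field1": "red turtle", "field2": "quick rabbit"},
}

print(search_and(DOCS, "purp fo slo"))  # ['2']
print(search_and(DOCS, "purple duc"))   # ['1', '2']
```

Documents 1 and 2 both contain terms for "purp" and "fo", but only document 2 has a word starting with "slo", so the AND operator narrows the result to that one hit.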

Here is the code I used to test it:

http://sense.qbox.io/gist/b87e426062f453d946d643c7fa3d5480cd8e26ec

Elasticsearch: multi-word query across multiple fields (with prefix)