elasticsearch edgengram copy_to field partial search not working

Time: 2018-05-29 10:06:14

Tags: elasticsearch full-text-search n-gram

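Below is my Elasticsearch mapping. One field is called hostname and another is called catch_all, which is basically a copy_to target field (more fields also copy their values into it).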

{
  "settings": {
    "analysis": {
      "filter": {
        "myNGramFilter": {
          "type": "edgeNGram",
          "min_gram": 1,
          "max_gram": 40
        }
      },
      "analyzer": {
        "myNGramAnalyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "myNGramFilter"]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "catch_all": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "store": true,
              "ignore_above": 256
            },
            "grams": {
              "type": "text",
              "store": true,
              "analyzer": "myNGramAnalyzer"
            }
          }
        },
        "hostname": {
          "type": "text",
          "copy_to": "catch_all"
        }
      }
    }
  }
}

When I run

GET index/_analyze
{
  "analyzer": "myNGramAnalyzer",
  "text": "Dell PowerEdge R630"
}

I get:

{
  "tokens": [
    {
      "token": "d",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "de",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "del",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "dell",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "p",
      "start_offset": 5,
      "end_offset": 14,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "po",
      "start_offset": 5,
      "end_offset": 14,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "pow",
      "start_offset": 5,
      "end_offset": 14,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "powe",
      "start_offset": 5,
      "end_offset": 14,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "power",
      "start_offset": 5,
      "end_offset": 14,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "powere",
      "start_offset": 5,
      "end_offset": 14,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "powered",
      "start_offset": 5,
      "end_offset": 14,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "poweredg",
      "start_offset": 5,
      "end_offset": 14,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "poweredge",
      "start_offset": 5,
      "end_offset": 14,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "r",
      "start_offset": 15,
      "end_offset": 19,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "r6",
      "start_offset": 15,
      "end_offset": 19,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "r63",
      "start_offset": 15,
      "end_offset": 19,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "r630",
      "start_offset": 15,
      "end_offset": 19,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}
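To make the token list above easier to follow, here is a minimal Python sketch that approximates what myNGramAnalyzer does: split the text into words, lowercase them, then emit every prefix of each word with a length between min_gram and max_gram, keeping the word's offsets and position. It is an approximation, not Elasticsearch code: the regex `\w+` split is an assumption standing in for the standard tokenizer (good enough for ASCII input like this, not in general), and the function name analyze_edge_ngrams is hypothetical.

```python
import re

def analyze_edge_ngrams(text, min_gram=1, max_gram=40):
    """Rough model of myNGramAnalyzer (tokenizer -> lowercase -> edge n-grams).

    Assumption: re.finditer(r"\\w+") stands in for the standard tokenizer.
    """
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        word = match.group().lower()  # lowercase filter
        # Edge n-gram filter: only prefixes, i.e. grams anchored at the word start.
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append({
                "token": word[:n],
                "start_offset": match.start(),  # every gram keeps the whole word's offsets
                "end_offset": match.end(),
                "position": position,           # all grams of a word share its position
            })
    return tokens

for t in analyze_edge_ngrams("Dell PowerEdge R630"):
    print(t["token"], t["start_offset"], t["end_offset"], t["position"])
```

Running it lists the same 17 tokens as the _analyze response: "poweredge" is there, but no gram starting in the middle of the word, such as "edge", is ever produced.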

There is a token named "poweredge". Now we search with the following query:
{ 
  "query": {
    "multi_match": {
      "fields": ["catch_all.grams"],
      "query": "poweredge",
      "operator": "and"
    }
  }
}
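A rough way to reason about the query above: because the mapping sets no search_analyzer on catch_all.grams, the query string is analyzed with myNGramAnalyzer too, so "edge" becomes the grams "e", "ed", "edg", "edge". The Python sketch below models that under stated assumptions. Grams that share a position are treated as alternatives (Lucene builds a synonym-style query for same-position tokens), and "operator": "and" requires every query position to be satisfied. Scoring is ignored, and the regex split again only approximates the standard tokenizer. The names analyze and matches_and are hypothetical.

```python
import re
from collections import defaultdict

def analyze(text, min_gram=1, max_gram=40):
    """Approximate myNGramAnalyzer; returns {position: [edge n-grams of that word]}.

    Assumption: a regex split on word characters stands in for the standard tokenizer.
    """
    grams_by_position = defaultdict(list)
    for position, word in enumerate(re.findall(r"\w+", text.lower())):
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            grams_by_position[position].append(word[:n])
    return dict(grams_by_position)

def matches_and(query, indexed_text):
    """Rough model of a match query with "operator": "and" on catch_all.grams."""
    indexed_terms = {g for grams in analyze(indexed_text).values() for g in grams}
    # Same-position grams are alternatives; every query position must match one of them.
    return all(
        any(g in indexed_terms for g in grams)
        for grams in analyze(query).values()
    )

doc = "Dell PowerEdge R630"
print(matches_and("poweredge", doc))  # True
print(matches_and("edge", doc))       # False: none of e/ed/edg/edge was indexed
```

Under this model, a query such as "powx" would also match through its "p", "po" and "pow" grams. That is why edge n-gram fields usually get a separate search_analyzer (for example the standard analyzer), so the query text is not n-grammed.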

When we search for "poweredge" we get 1 result, but when we search for just "edge" there are no results.

Even a match query returns no results for the search term "edge".

Can anybody help?

1 answer:

Answer 0 (score: 0)

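I would suggest using a match query for your use case instead of the multi_match API. An edge n-gram works like this: it creates n-grams from the tokens that the tokenizer (the standard tokenizer in your mapping) produces from your text. As the documentation puts it:

"The edge_ngram tokenizer first breaks text down into words whenever it encounters one of a list of specified characters, then it emits N-grams of each word where the start of the N-gram is anchored to the beginning of the word."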

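As you can see from your own analyze API call, "edge" (from "poweredge") is never produced as an n-gram, because edge n-grams always start at the beginning of the word. Look at the output of that call. See also: https://www.elastic.co/guide/en/elasticsearch/guide/master/ngrams-compound-words.html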
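A quick way to see the difference the answer describes: the Python sketch below compares the prefixes an edge n-gram filter emits with the full set of substrings a plain n-gram filter emits. In Elasticsearch the non-anchored variant is the ngram token filter ("type": "ngram", spelled "nGram" in older versions), which is what the compound-words guide linked above relies on. These are plain string functions, not Elasticsearch APIs, and the names ngrams and edge_ngrams are hypothetical. As an assumption worth checking against your version, recent Elasticsearch releases cap max_gram minus min_gram via the index.max_ngram_diff setting, and a plain n-gram field grows the index considerably.

```python
def edge_ngrams(word, min_gram, max_gram):
    """Prefixes only, like the edgeNGram filter in the question's mapping."""
    return [word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)]

def ngrams(word, min_gram, max_gram):
    """Every substring with length in [min_gram, max_gram], like a plain n-gram
    filter: grams may start anywhere in the word (duplicates are kept)."""
    return [
        word[start:start + n]
        for n in range(min_gram, max_gram + 1)
        for start in range(len(word) - n + 1)
    ]

word = "poweredge"
print("edge" in edge_ngrams(word, 1, 40))  # False
print("edge" in ngrams(word, 1, 40))       # True
print(len(ngrams(word, 1, 40)))            # 45 grams for a 9-letter word
```

The 45 grams for a single 9-letter word, versus 9 prefixes, show the index-size cost of matching in the middle of words.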