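Below is an Elasticsearch mapping with a field called hostname and another field called catch_all, which is basically a copy_to target (more fields copy their values into it as well).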
{
  "settings": {
    "analysis": {
      "filter": {
        "myNGramFilter": {
          "type": "edgeNGram",
          "min_gram": 1,
          "max_gram": 40
        }
      },
      "analyzer": {
        "myNGramAnalyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "myNGramFilter"]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "catch_all": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "store": true,
              "ignore_above": 256
            },
            "grams": {
              "type": "text",
              "store": true,
              "analyzer": "myNGramAnalyzer"
            }
          }
        },
        "hostname": {
          "type": "text",
          "copy_to": "catch_all"
        }
      }
    }
  }
}
When I run

GET index/_analyze
{
  "analyzer": "myNGramAnalyzer",
  "text": "Dell PowerEdge R630"
}

I get this response:
{
"tokens": [
{
"token": "d",
"start_offset": 0,
"end_offset": 4,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "de",
"start_offset": 0,
"end_offset": 4,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "del",
"start_offset": 0,
"end_offset": 4,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "dell",
"start_offset": 0,
"end_offset": 4,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "p",
"start_offset": 5,
"end_offset": 14,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "po",
"start_offset": 5,
"end_offset": 14,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "pow",
"start_offset": 5,
"end_offset": 14,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "powe",
"start_offset": 5,
"end_offset": 14,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "power",
"start_offset": 5,
"end_offset": 14,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "powere",
"start_offset": 5,
"end_offset": 14,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "powered",
"start_offset": 5,
"end_offset": 14,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "poweredg",
"start_offset": 5,
"end_offset": 14,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "poweredge",
"start_offset": 5,
"end_offset": 14,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "r",
"start_offset": 15,
"end_offset": 19,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "r6",
"start_offset": 15,
"end_offset": 19,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "r63",
"start_offset": 15,
"end_offset": 19,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "r630",
"start_offset": 15,
"end_offset": 19,
"type": "<ALPHANUM>",
"position": 2
}
]
}
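To make the output above easier to read, here is a rough Python model of what myNGramAnalyzer does. It is a sketch under stated assumptions: the standard tokenizer is approximated by splitting on non-alphanumeric ASCII characters (it is not a full Unicode word segmenter), and the function names are hypothetical. It reproduces the tokens, offsets, and positions above and shows that every gram is a prefix of its word.

```python
import re


def standard_like_tokenize(text):
    # Rough stand-in for the standard tokenizer (assumption: plain ASCII
    # input like "Dell PowerEdge R630"). Returns (word, start, end).
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"[A-Za-z0-9]+", text)]


def edge_ngrams(word, min_gram=1, max_gram=40):
    # Edge n-grams: prefixes of length min_gram..max_gram, always
    # anchored at the first character of the word.
    return [word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)]


def analyze(text, min_gram=1, max_gram=40):
    tokens = []
    for position, (word, start, end) in enumerate(standard_like_tokenize(text)):
        # "lowercase" runs before the n-gram filter, as in the analyzer.
        for gram in edge_ngrams(word.lower(), min_gram, max_gram):
            # Each gram keeps the offsets and position of its source word,
            # matching the _analyze output above.
            tokens.append({"token": gram, "start_offset": start,
                           "end_offset": end, "position": position})
    return tokens


if __name__ == "__main__":
    print([t["token"] for t in analyze("Dell PowerEdge R630")])
```

Only prefixes such as "power" and "poweredge" come out; a substring like "edge" never does.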
The output contains the token "poweredge". We then search with the following query:

{
  "query": {
    "multi_match": {
      "fields": ["catch_all.grams"],
      "query": "poweredge",
      "operator": "and"
    }
  }
}
When we search for "poweredge" we get 1 result, but when we search for just "edge" we get no results.
A plain match query also returns nothing for the search term "edge".
Can anyone help?
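A simplified Python model of the search side may help explain this. Because the mapping sets no search_analyzer on catch_all.grams, the query string is analyzed with myNGramAnalyzer too, so "edge" becomes e, ed, edg, edge, and none of those terms exists in the index. The sketch treats grams that share a position as alternatives (roughly how Lucene builds a query from same-position tokens) and uses operator to combine positions; that is a modelling assumption, and the helper names are hypothetical.

```python
import re


def analyze_by_position(text, min_gram=1, max_gram=40):
    # Simplified stand-in for myNGramAnalyzer (assumption: ASCII input):
    # split on non-alphanumerics, lowercase, emit edge n-grams per word.
    # Grams of one word share a position, so return one list per word.
    positions = []
    for word in re.findall(r"[A-Za-z0-9]+", text):
        word = word.lower()
        positions.append(
            [word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)])
    return positions


def matches(indexed_text, query_text, operator="and"):
    # Terms stored in the index for catch_all.grams.
    index_terms = {g for grams in analyze_by_position(indexed_text) for g in grams}
    # Modelling assumption: any gram at a position may satisfy it;
    # "operator" then combines the positions (and = all, or = any).
    per_position = [any(g in index_terms for g in grams)
                    for grams in analyze_by_position(query_text)]
    return all(per_position) if operator == "and" else any(per_position)


if __name__ == "__main__":
    print(matches("Dell PowerEdge R630", "poweredge"))  # query grams p..poweredge
    print(matches("Dell PowerEdge R630", "edge"))       # query grams e, ed, edg, edge
```

Switching operator to "or" does not help here, because "edge" produces only one position and none of its grams are indexed.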
Answer (score: 0)
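I would suggest using a match query instead of the multi_match API for your use case, since you are only querying one field. Edge n-grams work like this: they are built from the tokens that the tokenizer produces from your text (in your analyzer that is the standard tokenizer, which splits on word boundaries such as whitespace), and every n-gram starts at the beginning of a token. As the documentation says (read here):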
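"The edge_ngram tokenizer first breaks text down into words whenever it encounters one of a list of specified characters, then it emits N-grams of each word where the start of the N-gram is anchored to the beginning of the word."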
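As you saw when you tested your analyzer with the analyze API, it does not produce "edge" (from "poweredge") as an n-gram, because it only produces n-grams anchored at the beginning of the word; check the output of your analyze API call. To match a fragment in the middle of a word, such as "edge" inside "poweredge", you need plain n-grams instead. Have a look here: https://www.elastic.co/guide/en/elasticsearch/guide/master/ngrams-compound-words.html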
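The difference between the two approaches can be sketched in Python. edge_ngrams mirrors the edge n-gram filter from the question, while ngrams emits substrings that start at any offset in the word, which is what would let "edge" match. The min_gram/max_gram values 3 and 5 are illustrative assumptions rather than a recommendation, and both function names are hypothetical.

```python
def edge_ngrams(word, min_gram=1, max_gram=40):
    # Edge n-grams: only prefixes, anchored at the start of the word.
    return [word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)]


def ngrams(word, min_gram=3, max_gram=5):
    # Plain n-grams: every substring of length min_gram..max_gram,
    # starting at any offset inside the word.
    return [word[i:i + n]
            for n in range(min_gram, max_gram + 1)
            for i in range(len(word) - n + 1)]


if __name__ == "__main__":
    print("edge" in edge_ngrams("poweredge"))  # prefixes only
    print("edge" in ngrams("poweredge"))       # substrings anywhere
```

The trade-off is index size: plain n-grams emit many more terms per word than edge n-grams, so the gram lengths are worth choosing carefully.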