My goal is to run a phrase search against multiple fields.
{
"multi_match" : {
"query" : "king of baro",
"fields" : [ "filed1", "filed2", "filed3", "filed5^9", "filed6", "filed7^9" ],
"type" : "phrase_prefix",
"boost" : 10.0,
"tie_breaker" : 0.0
}
}
The query above returns "king of baroda", so it works as expected.
However, when I search for "king of bar", it returns nothing.
{
"multi_match" : {
"query" : "king of bar",
"fields" : [ "filed1", "filed2", "filed3", "filed5^9", "filed6", "filed7^9" ],
"type" : "phrase_prefix",
"boost" : 10.0,
"tie_breaker" : 0.0
}
}
In summary:
Search for "king of bar" - No result
Search for "king of baro" - returns "king of baroda"
Search for "king of baroda" - returns "king of baroda"
Am I missing some configuration?
Mapping:
http://localhost:9200/sec/_mapping/
{
"sec":{
"mappings":{
"sec":{
"properties":{
"filed1":{
"type":"string"
},
"filed2":{
"type":"string"
},
"filed3":{
"type":"string"
},
"filed4":{
"type":"string"
},
"filed5":{
"type":"string"
},
"filed6":{
"type":"string"
},
"filed7":{
"type":"string"
}
}
}
}
}
}
Analyzer, from elasticsearch.yml:
index:
  analysis:
    analyzer:
      security_edge_ngram_analyzer:
        alias: [security_edge_ngram_analyzer]
        tokenizer: security_edge_ngram_tokenizer
    tokenizer:
      security_edge_ngram_tokenizer:
        type: edgeNGram
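Answer 0 (score: 2)
My guess is that your edge ngram tokenizer is configured with min_gram set to 4, but it is hard to be sure without seeing the configuration.
Here is an example of setting up an edge ngram analyzer on a per-field basis, taken from this blog post I wrote for Qbox: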
PUT /test_index
{
"settings": {
"analysis": {
"filter": {
"edge_ngram_filter": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 20
}
},
"analyzer": {
"edge_ngram_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"edge_ngram_filter"
]
}
}
}
},
"mappings": {
"doc": {
"properties": {
"text_field": {
"type": "string",
"index_analyzer": "edge_ngram_analyzer",
"search_analyzer": "standard"
}
}
}
}
}
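To make the min_gram guess above concrete, here is a minimal Python sketch of what an edge ngram token filter does to each token. It is a simplification, not Elasticsearch's implementation: the function names `edge_ngrams` and `index_terms` are hypothetical, and whitespace splitting stands in for the standard tokenizer. It only illustrates that with min_gram set to 4 the term "bar" would never be indexed, while "baro" would.

```python
def edge_ngrams(token, min_gram, max_gram):
    """Return the prefixes of `token` whose length lies in [min_gram, max_gram]."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(text, min_gram, max_gram):
    """Lowercase, split on whitespace, and collect the edge n-grams of every token."""
    terms = set()
    for token in text.lower().split():
        terms.update(edge_ngrams(token, min_gram, max_gram))
    return terms


# Same document text, indexed under two different min_gram settings.
terms_min4 = index_terms("King of Baroda", min_gram=4, max_gram=20)
terms_min2 = index_terms("King of Baroda", min_gram=2, max_gram=20)

print("bar" in terms_min4, "baro" in terms_min4)  # False True
print("bar" in terms_min2, "baro" in terms_min2)  # True True
```

If this is what is happening in your index, lowering min_gram (as in the settings above, which use 2) and reindexing would be the thing to try.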
Answer 1 (score: 1)
First, I would double-check that my custom analyzer is working as expected. I would do this using fielddata_fields:
GET sec/sec/_search
{
"fielddata_fields": ["filed1","filed2","filed3","filed4","filed5","filed6","filed7"]
}
A correct edgeNGram setup would produce output like this:
"fields": {
"filed1": [
"ki",
"kin",
"king",
"king ",
"king o",
"king of",
"king of ",
"king of b",
"king of ba",
"king of bar",
"king of baro",
"king of barod",
"king of baroda"
]
}
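The term list above can be reproduced with a short Python sketch, which may help when comparing against what your own index returns. It assumes the tokenizer treats the whole field value as one token (no character-class splitting), the text is already lowercase, and min_gram and max_gram are 2 and 20; the function name `edge_ngram_tokenizer` is hypothetical and this is not Elasticsearch's own code.

```python
def edge_ngram_tokenizer(text, min_gram=2, max_gram=20):
    """Emit every prefix of `text` with length between min_gram and max_gram."""
    upper = min(len(text), max_gram)
    return [text[:n] for n in range(min_gram, upper + 1)]


grams = edge_ngram_tokenizer("king of baroda")

# repr() makes the trailing spaces in grams such as 'king ' visible.
for gram in grams:
    print(repr(gram))
```

Note that "king of bar" is one of the emitted terms, which is why a correctly analyzed field should be able to match that search.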
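If you don't see something similar, I would look at how the analyzer is set up and whether it is configured properly. As a second way to check this, I created a simple test index with the custom analyzer set directly on one field, and tested it the same way as above: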
PUT /sec
{
"mappings": {
"sec": {
"properties": {
"filed1": {
"type": "string",
"analyzer": "security_edge_ngram_analyzer"
}
}
}
},
"settings": {
"analysis": {
"analyzer": {
"security_edge_ngram_analyzer": {
"tokenizer": "security_edge_ngram_tokenizer"
}
},
"tokenizer": {
"security_edge_ngram_tokenizer": {
"type": "edgeNGram",
"min_gram": 2,
"max_gram": 20
}
}
}
}
}