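I have 999 documents that I am using to experiment with Elasticsearch.
My type mapping has a field f4 that is analyzed with the following analyzer settings: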
"myNGramAnalyzer" => [
"type" => "custom",
"char_filter" => ["html_strip"],
"tokenizer" => "standard",
"filter" => ["lowercase","standard","asciifolding","stop","snowball","ngram_filter"]
]
My filter is defined as follows:
"filter" => [
"ngram_filter" => [
"type" => "edgeNGram",
"min_gram" => "2",
"max_gram" => "20"
]
]
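To make the effect of this filter concrete, here is a minimal Python sketch of what an edge n-gram filter with min_gram 2 and max_gram 20 emits for a single token. It is only an approximation of myNGramAnalyzer: the helper names edge_ngrams and analyze are made up for illustration, and the tokenizer, stop, snowball and asciifolding steps are ignored on the assumption that they leave values such as "Proj1" unchanged apart from lowercasing.

```python
def edge_ngrams(token, min_gram=2, max_gram=20):
    """Return the leading-edge n-grams of a single token, shortest first."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def analyze(text):
    # Rough stand-in for myNGramAnalyzer: lowercase, then edge n-grams.
    # Tokenizing, stop words, stemming and ASCII folding are not modelled.
    return edge_ngrams(text.strip().lower())


print(analyze("Proj1"))   # ['pr', 'pro', 'proj', 'proj1']
print(analyze("Proj42"))  # ['pr', 'pro', 'proj', 'proj4', 'proj42']
```

Every value of the form "ProjN" therefore shares the grams pr, pro and proj, which matters as soon as the same analyzer is also applied to the query string.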
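Field f4 holds values such as "Proj1", "Proj2", "Proj3", and so on.
Now, when I run a cross_fields search for the string "proj1", I expect the document with "Proj1" to come back at the top of the response with the highest score. It does not. Apart from f4, the content of all documents is almost identical.
I also do not understand why the query matches all 999 documents.
Here is my search: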
{
"index": "myindex",
"type": "mytype",
"body": {
"query": {
"multi_match": {
"query": "proj1",
"type": "cross_fields",
"operator": "and",
"fields": "f*"
}
},
"filter": {
"term": {
"deleted": "0"
}
}
}
}
My search response is:
{
"took": 12,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 999,
"max_score": 1,
"hits": [{
"_index": "myindex",
"_type": "mytype",
"_id": "42",
"_score": 1,
"_source": {
"f1": "396","f2": "125650","f3": "BH.1511AI.001",
"f4": "Proj42",
"f5": "BH.1511AI.001","f6": "","f7": "","f8": "","f9": "","f10": "","f11": "","f12": "","f13": "","f14": "","f15": "","f16": "09/05/16 | 01:02PM | User","deleted": "0"
}
}, {
"_index": "myindex",
"_type": "mytype",
"_id": "47",
"_score": 1,
"_source": {
"f1": "396","f2": "137946","f3": "BH.152096.001",
"f4": "Proj47",
"f5": "BH.1511AI.001","f6": "","f7": "","f8": "","f9": "","f10": "","f11": "","f12": "","f13": "","f14": "","f15": "","f16": "09/05/16 | 01:02PM | User","deleted": "0"
}
},
//.......
//.......
//MANY RECORDS IN BETWEEN HERE
//.......
//.......
{
"_index": "myindex",
"_type": "mytype",
"_id": "1",
"_score": 1,
"_source": {
"f1": "396","f2": "142095","f3": "BH.705215.001",
"f4": "Proj1",
"f5": "BH.1511AI.001","f6": "","f7": "","f8": "","f9": "","f10": "","f11": "","f12": "","f13": "","f14": "","f15": "","f16": "09/05/16 | 01:02PM | User","deleted": "0"
}
//.......
//.......
//MANY RECORDS IN BETWEEN HERE
//.......
//.......
}]
}
}
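What am I doing wrong, or what am I missing? (Apologies for the lengthy question; I wanted to give as much information as possible while leaving out unrelated code.)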
Edit:
Term vector response
{
"_index": "myindex",
"_type": "mytype",
"_id": "10",
"_version": 1,
"found": true,
"took": 9,
"term_vectors": {
"f4": {
"field_statistics": {
"sum_doc_freq": 5886,
"doc_count": 999,
"sum_ttf": 5886
},
"terms": {
"pr": {
"doc_freq": 999,
"ttf": 999,
"term_freq": 1,
"tokens": [{
"position": 0,
"start_offset": 0,
"end_offset": 6
}]
},
"pro": {
"doc_freq": 999,
"ttf": 999,
"term_freq": 1,
"tokens": [{
"position": 0,
"start_offset": 0,
"end_offset": 6
}]
},
"proj": {
"doc_freq": 999,
"ttf": 999,
"term_freq": 1,
"tokens": [{
"position": 0,
"start_offset": 0,
"end_offset": 6
}]
},
"proj1": {
"doc_freq": 111,
"ttf": 111,
"term_freq": 1,
"tokens": [{
"position": 0,
"start_offset": 0,
"end_offset": 6
}]
},
"proj10": {
"doc_freq": 11,
"ttf": 11,
"term_freq": 1,
"tokens": [{
"position": 0,
"start_offset": 0,
"end_offset": 6
}]
}
}
}
}
}
Edit 2:
Mapping for field f4
"f4" : {
"type" : "string",
"index_analyzer" : "myNGramAnalyzer",
"search_analyzer" : "standard"
}
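I have updated the mapping to use the standard analyzer at query time. That improved the results, but they are still not what I expect.
Instead of 999 (all documents), the query now returns 111 documents, such as "Proj1", "Proj11", "Proj111", ... "Proj1", "Proj181", ... and so on.
"Proj1" still appears somewhere in the middle of the results rather than at the top.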
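The two hit counts can be reproduced with a small Python simulation. This is a sketch under stated assumptions, not a description of Elasticsearch internals: it assumes f4 holds exactly "Proj1" through "Proj999" (which agrees with the doc_freq values in the term vector response), and it assumes that query grams sharing one position behave as alternatives, so a document matches if it contains any of them. The names edge_ngrams, index and matching_docs are hypothetical.

```python
def edge_ngrams(token, min_gram=2, max_gram=20):
    """Return the set of leading-edge n-grams of a single token."""
    upper = min(len(token), max_gram)
    return {token[:n] for n in range(min_gram, upper + 1)}


# Assumed corpus: f4 is "Proj1" .. "Proj999", lowercased at index time.
index = {i: edge_ngrams(f"proj{i}") for i in range(1, 1000)}


def matching_docs(query_terms):
    # A document matches if it contains at least one of the query terms.
    return [doc_id for doc_id, grams in index.items() if grams & query_terms]


# Query analyzed with the n-gram analyzer: pr, pro, proj, proj1.
ngram_query = edge_ngrams("proj1")
# Query analyzed with the standard analyzer: the single term proj1.
standard_query = {"proj1"}

print(len(matching_docs(ngram_query)))       # 999
print(len(matching_docs(standard_query)))    # 111
print(sum(len(g) for g in index.values()))   # 5886, same as sum_doc_freq
```

Under these assumptions, each of the 111 remaining documents contains the term proj1 exactly once, so term frequency alone cannot tell "Proj1" apart from "Proj181". That is consistent with "Proj1" not being ranked first.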
Answer 0 (score: 1)
There is no index_analyzer mapping parameter in Elasticsearch 2.x (it was removed in 2.0; 1.x releases still accepted it). For mapping parameters, use analyzer and search_analyzer instead.
Try the following steps to get it working.
Create myindex with the analyzer settings:
PUT /myindex
{
"settings": {
"analysis": {
"filter": {
"ngram_filter": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 20
}
},
"analyzer": {
"myNGramAnalyzer": {
"type": "custom",
"tokenizer": "standard",
"char_filter": "html_strip",
"filter": [
"lowercase",
"standard",
"asciifolding",
"stop",
"snowball",
"ngram_filter"
]
}
}
}
}
}
Add a mapping for mytype (to keep it simple, I have mapped only the relevant fields):
PUT /myindex/_mapping/mytype
{
"properties": {
"f1": {
"type": "string"
},
"f4": {
"type": "string",
"analyzer": "myNGramAnalyzer",
"search_analyzer": "standard"
},
"deleted": {
"type": "string"
}
}
}
Index some data:
PUT myindex/mytype/1
{
"f1":"396",
"f4":"Proj12" ,
"deleted": "0"
}
PUT myindex/mytype/2
{
"f1":"42",
"f4":"Proj22" ,
"deleted": "1"
}
Now try the query:
GET myindex/mytype/_search
{
"query": {
"multi_match": {
"query": "proj1",
"type": "cross_fields",
"operator": "and",
"fields": "f*"
}
},
"filter": {
"term": {
"deleted": "0"
}
}
}
It should return document #1. This works for me in Sense; I am using Elasticsearch 2.x.
Hope this helps :)
Answer 1 (score: 0)
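After spending several hours looking for a solution, I finally got it working.
I kept everything as described in my question, including the n-gram analyzer used when indexing the data. The only thing I had to change was to use the all field in my search query, in a bool query together with the existing multi_match query.
Now a search for Proj1 returns results in an order such as Proj1, Proj121, Proj11, and so on.
This is not the exact order Proj1, Proj11, Proj121, etc., but it is very close to the result I wanted.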
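The answer does not show the final query, so the following Python sketch is only one possible reading of it. It assumes that "the all field" means Elasticsearch's built-in _all field, and that the original multi_match stays as a required clause while a match on _all is added as an optional should clause to lift exact matches. The function name build_query and the must/should split are assumptions, not something taken from the answer.

```python
import json


def build_query(text):
    """Build a search body: the original multi_match plus a match on _all."""
    multi_match = {
        "multi_match": {
            "query": text,
            "type": "cross_fields",
            "operator": "and",
            "fields": "f*",
        }
    }
    return {
        "query": {
            "bool": {
                # Required: the multi_match query from the question.
                "must": [multi_match],
                # Optional: assumed extra clause on the _all field.
                "should": [{"match": {"_all": text}}],
            }
        },
        # Same top-level filter as in the question.
        "filter": {"term": {"deleted": "0"}},
    }


print(json.dumps(build_query("proj1"), indent=2))
```

If _all is analyzed with the default standard analyzer, only a document whose value is exactly "Proj1" would satisfy the should clause, which would fit the observation that Proj1 now comes first while the remaining order stays loosely defined.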