Using Elasticsearch (with Rails), I'm failing to index and run search queries on a field containing MAC addresses that use hyphens as separators:
24-A4-3C-02-37-26
Searching for the whole MAC address (not_analyzed) works fine, but I can't get partial matching to work with a custom analyzer.
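I've tested many options, including tweaking the min/max gram values, without success.
With the mapping, settings and query below, I get the following results: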
Box.search(q: "24-A4-3C-02-37-26").results.map(&:macaddress)
This yields a strange result:
["24-A4-3C-02-37-xx", "DC-9F-DB-F6-B2-xx", "C4-10-8A-13-53-xx", "C4-10-8A-13-54-xx", "C4-10-8A-13-52-xx"]
If I remove the last octet ("24-A4-3C-02-37"), I get:
["DC-9F-DB-F6-B2-xx", "C4-10-8A-13-53-xx", "C4-10-8A-13-52-xx"]
That is wrong.
I've checked the analyzer using the API and it looks just fine:
curl "localhost:9205/boxes/_analyze?analyzer=ngram_analyzer&pretty=true" -d "24-A4-3C-02-37-26"
which yields:
{
"tokens" : [ {
"token" : "24",
"start_offset" : 0,
"end_offset" : 2,
"type" : "word",
"position" : 1
}, {
"token" : "24-",
"start_offset" : 0,
"end_offset" : 3,
"type" : "word",
"position" : 2
}, {
"token" : "24-A",
"start_offset" : 0,
"end_offset" : 4,
"type" : "word",
"position" : 3
}, {
.........
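To see where these tokens come from, here is a rough pure-Ruby approximation of the edgeNGram tokenizer as configured in the settings (min_gram 2, max_gram 17, token_chars commented out). It assumes that with no token_chars the whole MAC address is treated as one word, so every token is a prefix of the full string; `edge_ngrams` is a hypothetical helper, not part of Elasticsearch or the Rails integration.

```ruby
# Hypothetical approximation of the edgeNGram tokenizer from the settings:
# the whole input is one "word" (no token_chars), and every prefix of
# length min_gram..max_gram becomes a token. Case is left untouched.
def edge_ngrams(text, min_gram, max_gram)
  upper = [max_gram, text.length].min
  return [] if upper < min_gram
  (min_gram..upper).map { |len| text[0, len] }
end

tokens = edge_ngrams("24-A4-3C-02-37-26", 2, 17)
p tokens.first(3)   # => ["24", "24-", "24-A"]
puts tokens.length  # => 16
puts tokens.last    # => 24-A4-3C-02-37-26
```

A full MAC address is 17 characters, so max_gram 17 produces exactly one token equal to the whole address.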
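So I can only guess that something is wrong with my actual query. I've even tried replacing the hyphen with its ASCII equivalent, and escaping it.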
@search_definition[:query] = {
multi_match: {
query: options[:q],
fields: [
"macaddress.ngram",
"macaddress.sortable^5",
...
My settings are as follows:
settings analysis: {
analyzer: {
ngram_analyzer: {
type: 'custom',
tokenizer: 'my_tokenizer',
}
},
tokenizer: {
my_tokenizer: {
type: "edgeNGram",
min_gram: 2,
max_gram: 17,
# token_chars: [ "letter", "digit" ]
}
}
} do
mapping do
indexes :macaddress, type: 'multi_field', fields: {
raw: { type: "string" },
sortable: { type: "string", index: "not_analyzed" },
ngram: { type: "string", index_analyzer: :ngram_analyzer } #, search_analyzer: 'keyword' }
}
end
end
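The mapping above leaves `search_analyzer: 'keyword'` commented out. As a rough, hypothetical model of what that option would change, the sketch below assumes the query string is kept as a single unanalyzed term and compared against each document's indexed edge n-grams, so a document matches only if one of its n-grams equals the whole query. It is a plain-Ruby approximation with no Elasticsearch scoring involved; `edge_ngrams` and `keyword_matches` are hypothetical helpers, and the MAC values are sample data.

```ruby
# Hypothetical edge n-gram helper mirroring the tokenizer settings:
# min_gram 2, max_gram 17, whole value treated as one word, case preserved.
def edge_ngrams(text, min_gram = 2, max_gram = 17)
  upper = [max_gram, text.length].min
  return [] if upper < min_gram
  (min_gram..upper).map { |len| text[0, len] }
end

# Assumption: with a keyword search analyzer the query stays one term,
# so a document matches only if that exact term is among its n-grams.
def keyword_matches(macs, query)
  macs.select { |mac| edge_ngrams(mac).include?(query) }
end

macs = ["24-A4-3C-02-37-26", "24-A4-3C-02-38-23", "34-A4-3C-02-38-23"] # sample data

p keyword_matches(macs, "24-A4-3C-02-37-26") # => ["24-A4-3C-02-37-26"]
p keyword_matches(macs, "24-A4-3C-02")       # => ["24-A4-3C-02-37-26", "24-A4-3C-02-38-23"]
p keyword_matches(macs, "A4-3C")             # => []
```

Under this model only prefixes of an address can match, because edge n-grams are always anchored at the start of the value.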
Can anyone suggest how to get this working?
Answer (score: 1)
I've verified this with the following settings:
PUT test
{
"settings" : {
"analysis" : {
"analyzer" : {
"ngram_analyzer" : {
"type": "custom",
"tokenizer" : "my_tokenizer"
}
},
"tokenizer" : {
"my_tokenizer" : {
"type" : "edgeNGram",
"min_gram" : "2",
"max_gram" : "17"
}
}
}
},
"mappings": {
"boxes":{
"properties": {
"macaddress":{
"type": "multi_field",
"fields": {
"raw":{
"type": "string"
},
"sortable":{
"type": "string",
"index": "not_analyzed"
},
"ngram":{
"type": "string",
"index_analyzer": "ngram_analyzer"
}
}
}
}
}
}
}
and some sample data:
PUT test/boxes/1
{
"macaddress":"24-A4-3C-02-37-26"
}
PUT test/boxes/2
{
"macaddress":"24-A4-3C-02-37-54"
}
PUT test/boxes/3
{
"macaddress":"24-A4-3C-02-38-23"
}
PUT test/boxes/4
{
"macaddress":"34-A4-3C-02-38-23"
}
Search query:
GET test/boxes/_search
{
"query": {
"multi_match": {
"query": "24-A4-3C-02",
"fields": ["macaddress.ngram",
"macaddress.sortable^5"]
}
}
}
The result is:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0.047079325,
"hits": [
{
"_index": "test",
"_type": "boxes",
"_id": "1",
"_score": 0.047079325,
"_source": {
"macaddress": "24-A4-3C-02-37-26"
}
},
{
"_index": "test",
"_type": "boxes",
"_id": "2",
"_score": 0.047079325,
"_source": {
"macaddress": "24-A4-3C-02-37-54"
}
},
{
"_index": "test",
"_type": "boxes",
"_id": "3",
"_score": 0.047079325,
"_source": {
"macaddress": "24-A4-3C-02-38-23"
}
}
]
}
}
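One possible reading of these results, offered as an assumption rather than something stated above: the ngram subfield sets only index_analyzer, so the query text may be analyzed at search time by a different, default analyzer. Assuming that analyzer splits on hyphens and lowercases (roughly what Elasticsearch's standard analyzer does here) while ngram_analyzer keeps the original case, the only query term that can hit an indexed n-gram is "24", which would be consistent with all three hits sharing the same score. The sketch models this in plain Ruby; `search_terms` is a simplified, hypothetical stand-in for the real analyzer.

```ruby
# Hypothetical model. Index side: case-preserving edge n-grams (2..17).
# Search side: split on hyphens and lowercase (simplified stand-in for a
# standard-style analyzer). A document matches if any term equals any n-gram.
def edge_ngrams(text, min_gram = 2, max_gram = 17)
  upper = [max_gram, text.length].min
  return [] if upper < min_gram
  (min_gram..upper).map { |len| text[0, len] }
end

def search_terms(query)
  query.split("-").map(&:downcase).reject(&:empty?)
end

# Returns { doc_id => [query terms that matched] } for documents with a hit.
def matching_terms(docs, query)
  terms = search_terms(query)
  docs.each_with_object({}) do |(id, mac), hits|
    shared = terms & edge_ngrams(mac)
    hits[id] = shared unless shared.empty?
  end
end

docs = {
  1 => "24-A4-3C-02-37-26",
  2 => "24-A4-3C-02-37-54",
  3 => "24-A4-3C-02-38-23",
  4 => "34-A4-3C-02-38-23"
}

matching_terms(docs, "24-A4-3C-02").each do |id, terms|
  puts "#{id}: #{terms.join(', ')}"
end
# prints:
# 1: 24
# 2: 24
# 3: 24
```

In this model "a4" and "3c" never match because the indexed n-grams are uppercase, and document 4 drops out because its n-grams all start with "34".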