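We run Elasticsearch on FreeBSD 11. The index holds a lot of data in IPv4 and IPv6 format.
The customer wants to search with wildcards. For example:
*192* -> no problem
*192.168.* -> no problem
*2001:db8* -> error
*2001\:db8 -> error
I am not getting the correct data back from Elasticsearch. The ":" character in particular is a big problem.
My system information and query results follow.
Elasticsearch info: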
{
"name": "WxaxEg6",
"cluster_name": "elasticsearch",
"cluster_uuid": "o-7IPcD3RjODelTyPYUBJw",
"version": {
"number": "5.6.8",
"build_hash": "688ecce",
"build_date": "2018-02-16T16:46:30.010Z",
"build_snapshot": false,
"lucene_version": "6.6.1"
},
"tagline": "You Know, for Search"
}
My test index is:
{
"ip_test2": {
"aliases": {},
"mappings": {
"doc": {
"properties": {
"ip_addr": {
"type": "text"
}
}
}
},
"settings": {
"index": {
"creation_date": "1549119687946",
"number_of_shards": "5",
"number_of_replicas": "1",
"uuid": "Aljv_81nQDKx3B3Fs2nVOA",
"version": {
"created": "5060899"
},
"provided_name": "ip_test2"
}
}
}
}
Query 1:
{
"query": {
"query_string" : {
"fields" : ["ip_addr"],
"query": "*192.*",
"analyze_wildcard": true
}
}
}
The result is:
{"took":3,"timed_out":false,"_shards":
{"total":5,"successful":5,"skipped":0,"failed":0},"hits":
{"total":255,"max_score":1.0,"hits":
...:{"ip_addr": "192.168.1.4"}},
No problem.
Query 2:
"query": "*2001*",
The result is:
{"took":5,"timed_out":false,"_shards":
{"total":5,"successful":5,"skipped":0,"failed":0},"hits":
{"total":100,"max_score":1.0,"hits":
...:{"ip_addr": "2001:db8:100:0:2359:8a17:17c6:e316"}},
No problem. Now the trouble starts. Query:
"query": "*2001:*",
Result:
"error":{"root_cause":
[{"type":"query_shard_exception","reason":"Failed to parse query
[*2001:*]","index_uuid":"Aljv_81nQDKx3B3Fs2nVOA","index":"ip_test2"}]
Query:
"query": "\"*2001:db*\"",
The result is wrong: there is plenty of data starting with 2001:db8, yet there are no hits.
"took":1,"timed_out":false,"_shards":
{"total":5,"successful":5,"skipped":0,"failed":0},"hits":
{"total":0,"max_score":null,"hits":[]}}
Query:
"query": "\"*2001:db8*\"",
The result is correct, which is amazing... WHYYYY
{"took":2,"timed_out":false,"_shards":
{"total":5,"successful":5,"skipped":0,"failed":0},"hits":
{"total":100,"max_score":1.8449252,
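The field type is not ip (it is text), and I do not understand why the results differ.
Can someone explain this to me?
My final solution is: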
{"from":0,"size":100,"sort":[{"start_time":
{"order":"desc","unmapped_type":"boolean"}}],
"query":{"bool":{"must":[{"range":{"start_time":
{"gte":1546678703407,"lte":1549270703407,"format":"epoch_millis"}}},
{"bool":{"should":[{"multi_match":
{"query":"2001:db","fields":["ip_dst_saddr"],"type":"phrase_prefix"}},
{"query_string":{"query":"*2001\\:db*","fields":
["ip_dst_saddr"],"analyze_wildcard":true}}]}}]}}}
Answer 0 (score: 0)
For the query
"query": "*2001:*"
you need to escape the colon (see here for more examples), so try "query": "*2001\\:*"
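If the search pattern comes from user input, the escaping can be done in code before the request is built. Below is a minimal Python sketch of that idea. The helper names (escape_query_string, build_query) are made up for this example, and the character set is my reading of the reserved characters listed for query_string, so check it against the documentation for your version. "*" and "?" are deliberately left unescaped so they still act as wildcards.

```python
import json

# Characters that query_string treats as syntax (assumed from the documented
# reserved-character list). "*" and "?" are left out on purpose so they keep
# working as wildcards. "<" and ">" cannot be escaped at all, so they are
# not handled here.
RESERVED = set('+-=&|!(){}[]^"~:\\/')


def escape_query_string(term):
    """Backslash-escape reserved characters, keeping * and ? as wildcards."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in term)


def build_query(field, pattern):
    """Build a query_string request body like the ones in the question."""
    return {
        "query": {
            "query_string": {
                "fields": [field],
                "query": escape_query_string(pattern),
                "analyze_wildcard": True,
            }
        }
    }


if __name__ == "__main__":
    # json.dumps adds the second backslash that JSON requires.
    print(json.dumps(build_query("ip_addr", "*2001:*")))
```

Note that the backslash appears once in the Python string and twice in the JSON body, which is why the hand-written query above uses "\\:".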
As for the other queries: you cannot use wildcards inside a phrase match (see here for more details).
If you run the query through the validate API:
POST <your_index>/_validate/query?explain=true
{
"query": {
"query_string" : {
"fields" : ["ip_addr"],
"query": "\"*2001:db*\"",
"analyze_wildcard": true
}
}
}
you will see that the query is interpreted as
"explanations": [
{
"index": "<your_index>",
"valid": true,
"explanation": """ip_addr:"2001 db""""
}
]
and "query": "\"*2001:db8*\""
is interpreted as
"explanations": [
{
"index": "test_so",
"valid": true,
"explanation": """ip_addr:"2001 db8""""
}
]
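So the query "query": "\"*2001:db*\""
only matches documents that contain the tokens "2001" and "db" next to each other and in that order (the complete token "db", not "db8" or anything else),
while the query "query": "\"*2001:db8*\""
matches any document that contains the tokens "2001" and "db8" next to each other and in that order.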
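To make the difference concrete, here is a rough Python simulation of what the two phrase queries end up comparing. It is only an approximation: approx_tokens is a hypothetical stand-in that splits on ":", "*", double quotes and whitespace, which is enough to reproduce the explanations shown above, whereas the real analyzer follows Unicode word-break rules and can behave differently on other input.

```python
import re


def approx_tokens(text):
    """Rough stand-in for the analyzer: lowercase, then split on : * " and whitespace."""
    return [t for t in re.split(r'[:*"\s]+', text.lower()) if t]


def phrase_matches(phrase, document):
    """True if the phrase tokens occur as consecutive whole tokens in the document."""
    want = approx_tokens(phrase)
    have = approx_tokens(document)
    n = len(want)
    return any(have[i:i + n] == want for i in range(len(have) - n + 1))


if __name__ == "__main__":
    doc = "2001:db8:100:0:2359:8a17:17c6:e316"
    # The wildcards and the colon disappear; only whole tokens remain.
    print(approx_tokens('"*2001:db*"'))
    # "db" is not a token of the document ("db8" is), so this phrase fails.
    print(phrase_matches('"*2001:db*"', doc))
    # "2001" followed by "db8" are both whole tokens, so this phrase matches.
    print(phrase_matches('"*2001:db8*"', doc))
```

The point is that inside quotes the "*" characters are not wildcards any more, so "db" has to be a complete token on its own.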
You really should use Elasticsearch's ip data type.
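A sketch of what that could look like, written as Python dicts so the request bodies can be generated and checked locally. It assumes a 5.x index with a single "doc" type as in the question, and relies on the ip type accepting CIDR notation in a term query. As far as I know the ip type does not do wildcard matching, so a search such as "everything starting with 2001:db8" has to be expressed as a CIDR block instead. prefix_query is a hypothetical helper name, and an existing text field cannot be switched to ip in place, so the data would need to be reindexed.

```python
import ipaddress
import json

# Mapping sketch: same index layout as in the question, but with type "ip".
mapping = {"mappings": {"doc": {"properties": {"ip_addr": {"type": "ip"}}}}}


def prefix_query(field, cidr):
    """Build a term query for a CIDR block; raises ValueError on an invalid CIDR."""
    network = ipaddress.ip_network(cidr)
    return {"query": {"term": {field: str(network)}}}


if __name__ == "__main__":
    print(json.dumps(mapping))
    print(json.dumps(prefix_query("ip_addr", "2001:db8::/32")))
    print(json.dumps(prefix_query("ip_addr", "192.168.0.0/16")))

    # Local sanity check that the sample addresses fall inside those blocks.
    v6 = ipaddress.ip_address("2001:db8:100:0:2359:8a17:17c6:e316")
    v4 = ipaddress.ip_address("192.168.1.4")
    print(v6 in ipaddress.ip_network("2001:db8::/32"))
    print(v4 in ipaddress.ip_network("192.168.0.0/16"))
```

Validating the CIDR with the ipaddress module first means malformed input is rejected before it ever reaches Elasticsearch.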