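I need some expert guidance on getting a bool match to work. I want the query to return a hit only if both 'message' matches 'Failed password for' and 'path' matches '/var/log/secure'.
Here is my query: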
curl -s -XGET 'http://localhost:9200/logstash-2015.05.07/syslog/_search?pretty=true' -d '{
"filter" : { "range" : { "@timestamp" : { "gte" : "now-1h" } } },
"query" : {
"bool" : {
"must" : [
{ "match_phrase" : { "message" : "Failed password for" } },
{ "match_phrase" : { "path" : "/var/log/secure" } }
]
}
}
} '
Here is the start of the search output:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 46,
"max_score" : 13.308596,
"hits" : [ {
"_index" : "logstash-2015.05.07",
"_type" : "syslog",
"_id" : "AU0wzLEqqCKq_IPSp_8k",
"_score" : 13.308596,
"_source":{"message":"May 7 16:53:50 s_local@logstash-02 sshd[17970]: Failed password for fred from 172.28.111.200 port 43487 ssh2","@version":"1","@timestamp":"2015-05-07T16:53:50.554-07:00","type":"syslog","host":"logstash-02","path":"/var/log/secure"}
}, ...
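The problem is that if I change '/var/log/secure' to, say, just 'var' and run the query, I still get a result, only with a lower score. My understanding of the bool ... must construct is that every match clause in it has to succeed. What I am after is no results at all when 'path' does not exactly match '/var/log/secure'...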
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 46,
"max_score" : 10.354593,
"hits" : [ {
"_index" : "logstash-2015.05.07",
"_type" : "syslog",
"_id" : "AU0wzLEqqCKq_IPSp_8k",
"_score" : 10.354593,
"_source":{"message":"May 7 16:53:50 s_local@logstash-02 sshd[17970]: Failed password for fred from 172.28.111.200 port 43487 ssh2","@version":"1","@timestamp":"2015-05-07T16:53:50.554-07:00","type":"syslog","host":"logstash-02","path":"/var/log/secure"}
},...
I looked at the mapping for these fields to check whether they are not analyzed:
curl -X GET 'http://localhost:9200/logstash-2015.05.07/_mapping?pretty=true'
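I believe these fields are not analyzed, so I assumed the search text would not be analyzed either (based on some Elasticsearch training material I read recently). Here is a snippet of the _mapping output for this index.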
....
"message" : {
"type" : "string",
"norms" : {
"enabled" : false
},
"fields" : {
"raw" : {
"type" : "string",
"index" : "not_analyzed",
"ignore_above" : 256
}
}
},
"path" : {
"type" : "string",
"norms" : {
"enabled" : false
},
"fields" : {
"raw" : {
"type" : "string",
"index" : "not_analyzed",
"ignore_above" : 256
}
}
},
....
Where am I going wrong, or what am I misunderstanding here?
Answer 0: (score: 0)
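As the OP says, exact matching needs the "not_analyzed" view of a field, but according to the OP's mapping the not-analyzed versions are the sub-fields message.raw and path.raw, not message and path themselves. A not_analyzed field is indexed as one single term, so only path needs its .raw sub-field here; the phrase on message is a partial match and has to stay on the analyzed field. For example:
{
"filter" : { "range" : { "@timestamp" : { "gte" : "now-1h" } } },
"query" : {
"bool" : {
"must" : [
{ "match_phrase" : { "message" : "Failed password for" } },
{ "match_phrase" : { "path.raw" : "/var/log/secure" } }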
]
}
}
}
The Elasticsearch documentation on multi-fields covers this in more depth.
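Since the exact match on path.raw is a yes/no test rather than something to score, it could also be written as a term filter. Below is a minimal sketch that only builds the request body. It assumes Elasticsearch 1.x (the version suggested by the 2015 index), where the filtered query exists; the function name build_query and its parameters are made up for illustration and are not part of any client library.

```python
import json


def build_query(path, phrase, since="now-1h"):
    """Build a search body (hypothetical helper, assumes the ES 1.x query DSL).

    - match_phrase runs against the analyzed "message" field.
    - term compares the whole value of the not_analyzed "path.raw" sub-field.
    - range restricts hits to the recent time window.
    """
    return {
        "query": {
            "filtered": {
                "query": {"match_phrase": {"message": phrase}},
                "filter": {
                    "bool": {
                        "must": [
                            {"term": {"path.raw": path}},
                            {"range": {"@timestamp": {"gte": since}}},
                        ]
                    }
                },
            }
        }
    }


body = build_query("/var/log/secure", "Failed password for")
# The printed JSON can be passed to curl with -d, as in the question.
print(json.dumps(body, indent=2))
```

With this body, changing the path argument to "var" should give zero hits, because the term filter does no tokenizing on either side.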
To expand further:
The mapping for path in the OP is as follows:
"path" : {
"type" : "string",
"norms" : {
"enabled" : false
},
"fields" : {
"raw" : {
"type" : "string",
"index" : "not_analyzed",
"ignore_above" : 256
}
}
}
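This says that the path field uses the default analyzer and that the path.raw sub-field is not analyzed. That is why a query for 'var' against path still matches: the analyzed field is indexed as the separate tokens var, log and secure.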
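To make the difference concrete, here is a rough sketch in plain Python. The function approx_standard_analyze is only a stand-in I am assuming for illustration: it splits on non-alphanumeric characters and lowercases, which is close to what the default analyzer does for a value like this one but is not the real Lucene tokenizer. phrase_matches is likewise a simplified model of a phrase match.

```python
import re


def approx_standard_analyze(text):
    """Approximation only: split on non-alphanumerics and lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def phrase_matches(indexed_tokens, query_tokens):
    """True if query_tokens appear as a contiguous run in indexed_tokens."""
    n = len(query_tokens)
    return n > 0 and any(
        indexed_tokens[i:i + n] == query_tokens
        for i in range(len(indexed_tokens) - n + 1)
    )


stored = "/var/log/secure"

# Analyzed field ("path"): the stored value and the query text are both tokenized.
indexed = approx_standard_analyze(stored)
print(indexed)                                                  # ['var', 'log', 'secure']
print(phrase_matches(indexed, approx_standard_analyze("var")))  # True

# not_analyzed field ("path.raw"): the whole value is one term, compared as-is.
print(stored == "var")              # False
print(stored == "/var/log/secure")  # True
```

The first comparison mirrors the behaviour in the question (a hit for 'var', with a lower score in the real engine); the second mirrors what a query against path.raw would do.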
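If you want the path field itself, rather than path.raw, to be not analyzed, the mapping would look something along these lines (the analyzer for the sub-field is chosen with "analyzer", since "index" only accepts analyzed, not_analyzed or no):
"path" : {
"type" : "string",
"index" : "not_analyzed",
"norms" : {
"enabled" : false
},
"fields" : {
"raw" : {
"type" : "string",
"analyzer" : <whatever analyzer you want>,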
"ignore_above" : 256
}
}
}