My Elasticsearch documents have a field, path, which contains entries like these:
/logs/hadoop-yarn/container/application_1451299305289_0120/container_e18_1451299305289_0120_01_011007/stderr
/logs/hadoop-yarn/container/application_1451299305289_0120/container_e18_1451299305289_0120_01_008874/stderr
Note: I want to select all the documents that have the line below in the path field.
/logs/hadoop-yarn/container/application_1451299305289_0120/container_e18_1451299305289_0120_01_009257/stderr
I want to run a LIKE-style query on this path field, given the following values (essentially all three combined with AND):
1451299305289_0120
009257
stderr
Given the criteria above, the document whose path field is the third line should be selected.
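To make the expected outcome concrete, here is a minimal Python sketch of the three-way AND, using plain substring checks on the three sample paths. It only illustrates which entry should match; it is not how Elasticsearch evaluates a query, and the helper name `matches_all` is made up for this example.

```python
# Sample values of the path field, copied from the question.
PATHS = [
    "/logs/hadoop-yarn/container/application_1451299305289_0120/container_e18_1451299305289_0120_01_011007/stderr",
    "/logs/hadoop-yarn/container/application_1451299305289_0120/container_e18_1451299305289_0120_01_008874/stderr",
    "/logs/hadoop-yarn/container/application_1451299305289_0120/container_e18_1451299305289_0120_01_009257/stderr",
]

# The three values that must all be present (AND semantics).
CRITERIA = ["1451299305289_0120", "009257", "stderr"]


def matches_all(path, terms):
    """Return True only if every term occurs somewhere in the path."""
    return all(term in path for term in terms)


selected = [p for p in PATHS if matches_all(p, CRITERIA)]
print(selected)
```

Only the third path contains all three values, so it is the only one that should come back.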
This is what I have tried so far:
http://localhost:9200/logstash-*/_search?q=application_1451299305289_0120 AND path:stderr&size=50
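This query satisfies the third criterion and only partially satisfies the first: if I search for 1451299305289_0120 instead of application_1451299305289_0120, I get 0 results. (All I really need is to search for 1451299305289_0120.)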
When I try this:
http://10.30.145.160:9200/logstash-*/_search?q=path:*_1451299305289_0120*008779 AND path:stderr&size=50
I get results, but using a leading * is an expensive operation. Is there another way to achieve this efficiently (for example, using nGram together with fuzzy-search)?
Answer 0 (score: 1)
This can be achieved with the Pattern Replace Char Filter. You simply use a regex to extract the important pieces of information. Here is my setup:
POST log_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "app_analyzer": {
          "char_filter": [
            "app_extractor"
          ],
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        },
        "path_analyzer": {
          "char_filter": [
            "path_extractor"
          ],
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        },
        "task_analyzer": {
          "char_filter": [
            "task_extractor"
          ],
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      },
      "char_filter": {
        "app_extractor": {
          "type": "pattern_replace",
          "pattern": ".*application_(.*)/container.*",
          "replacement": "$1"
        },
        "path_extractor": {
          "type": "pattern_replace",
          "pattern": ".*/(.*)",
          "replacement": "$1"
        },
        "task_extractor": {
          "type": "pattern_replace",
          "pattern": ".*container.{27}(.*)/.*",
          "replacement": "$1"
        }
      }
    }
  },
  "mappings": {
    "your_type": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "keyword",
          "fields": {
            "application_number": {
              "type": "string",
              "analyzer": "app_analyzer"
            },
            "path": {
              "type": "string",
              "analyzer": "path_analyzer"
            },
            "task": {
              "type": "string",
              "analyzer": "task_analyzer"
            }
          }
        }
      }
    }
  }
}
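I am using regexes to extract the application number, the task number, and the path. If you have other log patterns, you may want to refine the task regex. We can then search using Filters. A big advantage of filters is that they are cached, which makes subsequent calls faster.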
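To check what each extractor produces before creating the index, the three patterns can be tried outside Elasticsearch. The sketch below is only an approximation: it uses Python's `re.sub` in place of the `pattern_replace` char filter (which uses Java regexes), on the assumption that these simple patterns behave the same in both engines. The helper name `apply_char_filter` is made up for this example.

```python
import re

# Patterns copied from the char_filter section of the index settings.
CHAR_FILTERS = {
    "app_extractor": r".*application_(.*)/container.*",
    "path_extractor": r".*/(.*)",
    "task_extractor": r".*container.{27}(.*)/.*",
}

SAMPLE = (
    "/logs/hadoop-yarn/container/application_1451299305289_0120/"
    "container_e18_1451299305289_0120_01_009257/stderr"
)


def apply_char_filter(name, text):
    """Emulate pattern_replace with replacement "$1": keep only capture group 1."""
    return re.sub(CHAR_FILTERS[name], r"\1", text)


for name in CHAR_FILTERS:
    print(name, "->", apply_char_filter(name, SAMPLE))
```

Note that `task_extractor` skips exactly 27 characters after the last `container`, which matches the prefix `_e18_1451299305289_0120_01_` in the sample. A container name with a different epoch or ID length would need a different pattern, which is why the task regex may need refining for other logs.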
I indexed a sample log like this:
PUT log_index/your_type/1
{
  "name": "/logs/hadoop-yarn/container/application_1451299305289_0120/container_e18_1451299305289_0120_01_009257/stderr"
}
This query will give you the desired results:
GET log_index/_search
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "name.application_number": "1451299305289_0120"
              }
            },
            {
              "term": {
                "name.task": "009257"
              }
            },
            {
              "term": {
                "name.path": "stderr"
              }
            }
          ]
        }
      }
    }
  }
}
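Side note: the filtered query is deprecated in ES 2.x; just use the filter directly (inside a bool query's filter clause). Also, path hierarchy may be useful for some other purposes.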
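For ES 2.x and later, the same three term filters can go in the `filter` clause of a `bool` query instead of the `filtered` wrapper. The sketch below only builds the request body as a Python dict and prints it as JSON; it does not talk to a cluster. The function name `build_query` is made up for this example, and the field names assume the mapping shown earlier in this answer.

```python
import json


def build_query(application_number, task, leaf):
    """Build a bool query whose filter clause ANDs three term filters."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"name.application_number": application_number}},
                    {"term": {"name.task": task}},
                    {"term": {"name.path": leaf}},
                ]
            }
        }
    }


body = build_query("1451299305289_0120", "009257", "stderr")
print(json.dumps(body, indent=2))
```

The printed JSON would be sent as the body of GET log_index/_search. Clauses under `filter` are all required and do not contribute to scoring, so this keeps the AND semantics of the original query.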
Hope this helps :)