Regex to capture a single entry from a log file when entries contain multiple newlines

Date: 2014-09-04 14:18:56

Tags: regex pcre

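I'm trying to parse an Elasticsearch log file, but I've gotten stuck: I can't get a match to stop before the next log entry begins. Here's a log snippet (trimmed, obviously):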

[2014-09-03 07:47:40,088][DEBUG][action.search.type       ] [Server1] [my_index_name123][1], node[JLNzpIU9QRikVkfQyIFjCA], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@769a0c28] lastShard [true]
org.elasticsearch.search.SearchParseException: [my_index_name123][1]: from[0],size[10],sort[<custom:"time_generated": org.elasticsearch.index.fielddata.fieldcomparator.LongValuesComparatorSource@7d4bfe4>!]: Parse Failure [Failed to parse source [{"from":0,"size":10,"sort":[{"time_generated":{"order":"desc"}}],"query":{"filtered":{"query":{"bool":{"must":[{"query_string":{"fields":["event_message"],"default_operator":"AND","query":null}}]}}}},"aggs":{"events_over_time":{"filter":{"range":{"time_generated":{"from":"2014-08-05T12:47:41.000Z","to":"2014-09-03T12:47:41.000Z"}}},"aggs":{"timeline":{"date_histogram":{"field":"time_generated","interval":"day"}}}}}}]]
    at org.elasticsearch.search.SearchService.parseSource(SearchService.java:634)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.query.QueryParsingException: [my_index_name123] query_string must be provided with a [query]
    at org.elasticsearch.index.query.QueryStringQueryParser.parse(QueryStringQueryParser.java:203)
    at org.elasticsearch.search.SearchService.parseSource(SearchService.java:622)
    ... 11 more
[2014-09-03 07:47:40,088][DEBUG][action.search.type       ] [Server1] [my_index_name123][0], node[JLNzpIU9QRikVkfQyIFjCA], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@769a0c28]
org.elasticsearch.search.SearchParseException: [my_index_name123][0]: from[0],size[10],sort[<custom:"time_generated": org.elasticsearch.index.fielddata.fieldcomparator.LongValuesComparatorSource@6392f9f7>!]: Parse Failure [Failed to parse source [{"from":0,"size":10,"sort":[{"time_generated":{"order":"desc"}}],"query":{"filtered":{"query":{"bool":{"must":[{"query_string":{"fields":["event_message"],"default_operator":"AND","query":null}}]}}}},"aggs":{"events_over_time":{"filter":{"range":{"time_generated":{"from":"2014-08-05T12:47:41.000Z","to":"2014-09-03T12:47:41.000Z"}}},"aggs":{"timeline":{"date_histogram":{"field":"time_generated","interval":"day"}}}}}}]]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.query.QueryParsingException: [my_index_name123] query_string must be provided with a [query]
    at org.elasticsearch.index.query.QueryStringQueryParser.parse(QueryStringQueryParser.java:203)
    at org.elasticsearch.search.SearchService.parseSource(SearchService.java:622)
    ... 11 more
[2014-09-03 07:47:40,088][DEBUG][action.search.type       ] [Server1] All shards failed for phase: [query]
[2014-09-03 07:47:40,088][DEBUG][action.search.type       ] [Server1] [my_index_name123][4], node[JLNzpIU9QRikVkfQyIFjCA], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@769a0c28] lastShard [true]
org.elasticsearch.search.SearchParseException: [my_index_name123][4]: from[0],size[10],sort[<custom:"time_generated": org.elasticsearch.index.fielddata.fieldcomparator.LongValuesComparatorSource@6cb38cdb>!]: Parse Failure [Failed to parse source [{"from":0,"size":10,"sort":[{"time_generated":{"order":"desc"}}],"query":{"filtered":{"query":{"bool":{"must":[{"query_string":{"fields":["event_message"],"default_operator":"AND","query":null}}]}}}},"aggs":{"events_over_time":{"filter":{"range":{"time_generated":{"from":"2014-08-05T12:47:41.000Z","to":"2014-09-03T12:47:41.000Z"}}},"aggs":{"timeline":{"date_histogram":{"field":"time_generated","interval":"day"}}}}}}]]
    at org.elasticsearch.search.SearchService.parseSource(SearchService.java:634)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.query.QueryParsingException: [my_index_name123] query_string must be provided with a [query]
    at org.elasticsearch.search.query.QueryParseElement.parse(QueryParseElement.java:33)
    at org.elasticsearch.search.SearchService.parseSource(SearchService.java:622)
    ... 11 more

Here's the regex I'm testing:

(?<=(\[\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}\]))([\d\s\p{L}\p{P}\p{S}]+)(?=\n\[\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}\])

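The pattern matches every log entry except the last one. I think that's because of the lookahead at the end, which requires another timestamp to follow. I've been staring at this for a while, but I can't come up with a way to make it match one entry at a time. I'm using PCRE. Hopefully someone out there is stronger at regex than I am.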

1 Answer:

Answer 0 (score: 2)

I see two changes that need to be made.

1) Make the capture group's quantifier lazy, so that it only runs up to the next timestamp

([\d\s\p{L}\p{P}\p{S}]+?)

2) Add an alternation to the lookahead so that it can also match the end of the string

(?=(\n\[\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}\]|$))
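To see why both changes matter, here is a rough Python sketch that counts matches for three variants of the pattern against a tiny made-up log. Two assumptions to flag: Python's built-in `re` module has no `\p{...}` classes, so `[\s\S]` stands in for `[\d\s\p{L}\p{P}\p{S}]`, and the sample log and the `count_matches` helper are hypothetical, not part of the question.

```python
import re

# Timestamp prefix, e.g. [2014-09-03 07:47:40,088]
TS = r"\[\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}\]"
# Assumption: stand-in for [\d\s\p{L}\p{P}\p{S}], which Python's re cannot express.
BODY = r"[\s\S]"

# Hypothetical three-entry log; the first and last entries span two lines.
log = (
    "[2014-09-03 07:47:40,088][DEBUG] first entry\n"
    "    at java.lang.Thread.run(Thread.java:745)\n"
    "[2014-09-03 07:47:40,088][DEBUG] second entry\n"
    "[2014-09-03 07:47:40,088][DEBUG] third entry\n"
    "    ... 11 more"
)

def count_matches(quantifier, lookahead):
    """Build one variant of the pattern and count its matches in the log."""
    pattern = "(?<=" + TS + ")(" + BODY + quantifier + ")(?=" + lookahead + ")"
    return len(re.findall(pattern, log))

# Original: the greedy body swallows everything up to the LAST timestamp.
greedy = count_matches("+", r"\n" + TS)
# Change 1 only: entries are split, but the final entry has no timestamp after it.
lazy = count_matches("+?", r"\n" + TS)
# Changes 1 and 2: the final entry is now closed by the end of the string.
lazy_with_end = count_matches("+?", r"\n" + TS + "|$")

print(greedy, lazy, lazy_with_end)  # prints: 1 2 3
```

The greedy variant returns one oversized match, the lazy variant finds every entry but the last, and only the combination finds all three.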

Here's the full regex:

(?<=(\[\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}\]))([\d\s\p{L}\p{P}\p{S}]+?)(?=(\n\[\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}\]|$))
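For anyone who wants to try this outside PCRE, below is a minimal Python sketch of the same pattern. It is an approximation, not the exact regex above: `[\s\S]` replaces `[\d\s\p{L}\p{P}\p{S}]` because Python's built-in `re` lacks `\p{...}`, the extra capture groups are dropped, and the sample text is a shortened, made-up version of the log in the question.

```python
import re

# Timestamp prefix, e.g. [2014-09-03 07:47:40,088]
TS = r"\[\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}\]"
# Assumption: [\s\S] replaces [\d\s\p{L}\p{P}\p{S}] (no \p{...} in Python's re).
PATTERN = "(?<=" + TS + r")([\s\S]+?)(?=\n" + TS + "|$)"

# Hypothetical, shortened log in the same shape as the one in the question.
sample = (
    "[2014-09-03 07:47:40,088][DEBUG][action.search.type] [Server1] Failed to execute\n"
    "org.elasticsearch.search.SearchParseException: Parse Failure\n"
    "    at java.lang.Thread.run(Thread.java:745)\n"
    "[2014-09-03 07:47:40,088][DEBUG][action.search.type] [Server1] All shards failed for phase: [query]\n"
    "[2014-09-03 07:47:40,088][DEBUG][action.search.type] [Server1] Failed to execute\n"
    "    ... 11 more\n"
)

# Default mode: $ only matches at the very end (or just before a final newline),
# so each match runs until the next timestamp or the end of the input.
entries = re.findall(PATTERN, sample)

# Multiline mode: $ matches at the end of EVERY line, so the lazy body stops
# after the first line of each entry and the stack traces are lost.
first_lines_only = re.findall(PATTERN, sample, flags=re.MULTILINE)

for number, entry in enumerate(entries, start=1):
    print(number, len(entry.splitlines()), "line(s)")
```

One thing to watch for: the `$` alternative only behaves as intended when multiline mode is off. With the `m` flag enabled, `$` also matches at each line break, and every entry is cut down to its first line.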