Recursively search a directory for text that spans a line break?

Time: 2015-03-03 19:14:23

Tags: linux grep

I have many large log files that look like this:

DATETIME ["2015-03-03 21:52"]
SERVER [{json_with_$_SERVER-Output}]
GET ["GET_JSON","AAA"]
POST ["POST_JSON","BBB","TEST1"]

DATETIME ["2015-03-03 21:53"]
SERVER [{json_with_$_SERVER-Output}]
GET ["GET_JSON","CCC"]
POST ["POST_JSON","DDD","TEST2"]

DATETIME ["2015-03-03 21:54"]
SERVER [{json_with_$_SERVER-Output}]
GET ["GET_JSON","AAA"]
POST ["POST_JSON","BBB","TEST3"]

DATETIME ["2015-03-03 21:55"]
SERVER [{json_with_$_SERVER-Output}]
GET ["GET_JSON","AAA"]
POST ["POST_JSON","EEE","TEST4"]

I want to search for two keywords that are separated by a line break: a specific word in the GET line and a specific word in the POST line that follows it.

I need something like:

grep "GET(.*)AAA(.*)POST(.*)BBB"

Search for: AAA (in the GET line) && BBB (in the POST line)

Expected result:

POST ["POST_JSON","BBB","TEST1"]

POST ["POST_JSON","BBB","TEST3"]

What is a simple way to do this?

3 answers:

Answer 0: (score: 1)

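Using GNU awk for the 3rd arg to match(), with RS set to the empty string so that each blank-line-separated block is read as one record: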

$ find . -type f |
xargs gawk -v RS= 'match($0,/\nGET.*AAA.*\n(POST.*BBB.*)/,a){print a[1]}'
POST ["POST_JSON","BBB","TEST1"]
POST ["POST_JSON","BBB","TEST3"]

If you really do want blank lines between the output lines, add -v ORS='\n\n'
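
For systems without gawk, the same paragraph-mode idea can be sketched with plain POSIX awk. This is only an illustration, not the answer's own command: it swaps the three-argument match() for a loop over the lines of each record, and it assumes that every log entry is separated by a blank line. The temporary directory and the file name sample.log are made up for the demo.

```shell
# Hypothetical demo data: a temp directory with one log file in the question's format.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.log" <<'EOF'
DATETIME ["2015-03-03 21:52"]
SERVER [{json_with_$_SERVER-Output}]
GET ["GET_JSON","AAA"]
POST ["POST_JSON","BBB","TEST1"]

DATETIME ["2015-03-03 21:53"]
SERVER [{json_with_$_SERVER-Output}]
GET ["GET_JSON","CCC"]
POST ["POST_JSON","DDD","TEST2"]

DATETIME ["2015-03-03 21:54"]
SERVER [{json_with_$_SERVER-Output}]
GET ["GET_JSON","AAA"]
POST ["POST_JSON","BBB","TEST3"]

DATETIME ["2015-03-03 21:55"]
SERVER [{json_with_$_SERVER-Output}]
GET ["GET_JSON","AAA"]
POST ["POST_JSON","EEE","TEST4"]
EOF

# RS= switches awk to paragraph mode (one record per blank-line-separated block);
# -F'\n' makes every line of the block a separate field.
result=$(find "$tmpdir" -type f -exec awk -v RS= -F'\n' '
{
    get = ""; post = ""
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^GET /)  get  = $i   # remember the GET line of this block
        if ($i ~ /^POST /) post = $i   # remember the POST line of this block
    }
    # Print the POST line only when both keywords are found in the same block.
    if (get ~ /AAA/ && post ~ /BBB/) print post
}' {} +)

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Using find ... -exec awk ... {} + instead of a pipe into xargs also keeps file names with spaces intact.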

Answer 1: (score: 0)

grep is the command you are looking for:

grep -rHn "GET.*KEYWORD_A" -A1 /path/to/files | grep "POST.*KEYWORD_B" 

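I first grep for the lines that contain KEYWORD_A and use -A1 to append the one line after each match, because in the log file the POST line comes right after the GET line. The second grep then searches that output for KEYWORD_B.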

-r greps recursively in a directory
-H prints the file name
-n prints the line number
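
To see what this pipeline prints, here is a small self-contained run against made-up sample data (the temporary directory and the name app.log are hypothetical, and KEYWORD_A / KEYWORD_B are replaced by AAA / BBB from the question). Note that grep marks context lines added by -A1 with "-" after the file name and line number, where matched lines get ":", so the surviving POST lines carry a "file-line-" prefix.

```shell
# Hypothetical demo data in the question's log format.
logdir=$(mktemp -d)
cat > "$logdir/app.log" <<'EOF'
DATETIME ["2015-03-03 21:52"]
SERVER [{json_with_$_SERVER-Output}]
GET ["GET_JSON","AAA"]
POST ["POST_JSON","BBB","TEST1"]

DATETIME ["2015-03-03 21:53"]
SERVER [{json_with_$_SERVER-Output}]
GET ["GET_JSON","CCC"]
POST ["POST_JSON","DDD","TEST2"]

DATETIME ["2015-03-03 21:54"]
SERVER [{json_with_$_SERVER-Output}]
GET ["GET_JSON","AAA"]
POST ["POST_JSON","BBB","TEST3"]

DATETIME ["2015-03-03 21:55"]
SERVER [{json_with_$_SERVER-Output}]
GET ["GET_JSON","AAA"]
POST ["POST_JSON","EEE","TEST4"]
EOF

# Stage 1: every GET line containing AAA plus the one line after it.
# Stage 2: keep only the trailing POST lines that contain BBB.
matches=$(grep -rHn -A1 'GET.*AAA' "$logdir" | grep 'POST.*BBB')

# Each surviving line looks like: <path>/app.log-<line number>-POST [...]
printf '%s\n' "$matches"
rm -rf "$logdir"
```

This approach relies on the POST line always being the line immediately after the GET line; if other lines can appear in between, -A1 would need a larger value and could then pick up a POST line from a neighbouring entry.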

Answer 2: (score: 0)

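I solved this with grep -P for the regular expressions, because I know that syntax from PHP, and in particular with -A to get the next n lines. I then piped ("|") the result into a second grep -P to filter it.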
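
The answer gives no command, so the following is only a guess at what such a pipeline could look like, not the poster's actual one. It assumes GNU grep built with PCRE support (-P), uses -h to drop file-name prefixes so the ^ anchors still work on the context lines, and runs against a made-up directory and file name.

```shell
# Hypothetical demo data in the question's log format.
demo=$(mktemp -d)
cat > "$demo/requests.log" <<'EOF'
DATETIME ["2015-03-03 21:52"]
SERVER [{json_with_$_SERVER-Output}]
GET ["GET_JSON","AAA"]
POST ["POST_JSON","BBB","TEST1"]

DATETIME ["2015-03-03 21:53"]
SERVER [{json_with_$_SERVER-Output}]
GET ["GET_JSON","CCC"]
POST ["POST_JSON","DDD","TEST2"]

DATETIME ["2015-03-03 21:54"]
SERVER [{json_with_$_SERVER-Output}]
GET ["GET_JSON","AAA"]
POST ["POST_JSON","BBB","TEST3"]

DATETIME ["2015-03-03 21:55"]
SERVER [{json_with_$_SERVER-Output}]
GET ["GET_JSON","AAA"]
POST ["POST_JSON","EEE","TEST4"]
EOF

# -r recurse, -h suppress file names, -P Perl-compatible regex, -A1 one line of trailing context.
# The second grep keeps only POST lines that contain the quoted keyword "BBB".
posts=$(grep -rhP -A1 '^GET .*"AAA"' "$demo" | grep -P '^POST .*"BBB"')

printf '%s\n' "$posts"
rm -rf "$demo"
```

Matching the keywords with their surrounding quotes ("AAA", "BBB") avoids hits on longer values that merely contain the keyword.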