Print sections of a file when the section's content matches a pattern

Time: 2018-02-09 13:22:10

Tags: awk sed text-parsing

I have been using GoReplay to capture HTTP traffic. I am now left with a text file in which each request is separated by '':

1 10ef8cc77b962b383557265f5eb1922e5affa88e 1518086364760738000
HEAD /xyz/
Host: d.e.f
User-Agent: ...
...
Connection: Keep-Alive



1 3534a2e1d670c596a673a706c3031a6bec9d6b06 1518086364994132000
HEAD /abc/
Host: a.b.c
User-Agent: ...
...
Connection: Keep-Alive



1 06891fdbebd48cb23ffe6ed5964c3fadcceb9199 1518086366027862000
HEAD /abc/
Host: a.b.c
User-Agent: ...
...
Connection: Keep-Alive

I want to extract (print) only the requests in this file that match a given header, Host: a.b.c:

1 3534a2e1d670c596a673a706c3031a6bec9d6b06 1518086364994132000
HEAD /abc/
Host: a.b.c
User-Agent: ...
...
Connection: Keep-Alive



1 06891fdbebd48cb23ffe6ed5964c3fadcceb9199 1518086366027862000
HEAD /abc/
Host: a.b.c
User-Agent: ...
...
Connection: Keep-Alive

Note: the input file may also contain binary data from POST requests (e.g. Content-Type: image/png):

POST /...
Content-Length: 26892

-----------------------------19579713013480936471158807818
Content-Disposition: form-data; name="upload"; filename="__fileCreatedFromDataURI__.png"
Content-Type: image/png

<89>PNG
^Z
^@^@^@^MIHDR^@
...

which might break the processing...

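Can this be done in a single pass with a tool like awk or sed? Or does it require writing a proper script (in Python, say)? I suppose I could split the input into multiple files, but that would produce far too many files.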

1 Answer:

Answer 0: (score: 2)

GNU awk approach:

awk 'BEGIN{ RS=ORS="" }/Host: a.b.c/; END{ ORS=""; print }' file

Output:

1 3534a2e1d670c596a673a706c3031a6bec9d6b06 1518086364994132000
HEAD /abc/
Host: a.b.c
User-Agent: ...
...
Connection: Keep-Alive



1 06891fdbebd48cb23ffe6ed5964c3fadcceb9199 1518086366027862000
HEAD /abc/
Host: a.b.c
User-Agent: ...
...
Connection: Keep-Alive
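
On the binary-data worry from the question: a byte-oriented script avoids decoding the file at all. Below is a minimal Python sketch, not part of the original answer. It assumes the whole capture fits in memory and that the record delimiter is known. `SEP` is only a placeholder, because the actual delimiter is not visible in the post, and `filter_requests` and the sample records are hypothetical names and data made up for illustration.

```python
def filter_requests(data: bytes, separator: bytes, header: bytes) -> bytes:
    """Keep only the records whose header section contains `header` as a whole line."""
    kept = []
    for record in data.split(separator):
        # Inspect only the lines before the first empty line (the header
        # section), so the bytes of a binary request body are never examined.
        for line in record.strip(b"\r\n").splitlines():
            if not line:
                break  # end of headers; everything after this is the body
            if line == header:
                kept.append(record)  # keep the original, unmodified record
                break
    return separator.join(kept)


# ASSUMPTION: placeholder delimiter, not taken from the post.
# Replace it with the delimiter line your capture file actually uses.
SEP = b"\n@@@\n"

# Hypothetical sample data, shaped like the records in the question.
sample = SEP.join([
    b"1 aaa 1\nHEAD /xyz/\nHost: d.e.f\nConnection: Keep-Alive\n",
    b"1 bbb 2\nPOST /abc/\nHost: a.b.c\nContent-Length: 4\n\n\x89PNG",
    b"1 ccc 3\nHEAD /abc/\nHost: aXb.c\n",               # must not match
    b"1 ddd 4\nPOST /up\nHost: d.e.f\n\nHost: a.b.c",    # text only in the body
])

matches = filter_requests(sample, SEP, b"Host: a.b.c")
```

Unlike the regex `/Host: a.b.c/`, where each `.` matches any character, this compares the whole header line literally, and it stops at the first empty line of each record, so a body that happens to contain the same text is ignored. To run it on a real capture, read the file with `open(path, "rb")` and write the result to `sys.stdout.buffer` so nothing is decoded or re-encoded along the way.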