Filtering large log files on Linux for reporting

Time: 2015-07-29 09:14:42

Tags: regex linux logging analysis

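I have large logs spread across more than 26,000 files, and each file has content like the sample below. I need to exclude every line that is a 404 for a JSON file. In the example below, I need to get the last line, because it is the one with a 404 that is not JSON. Any help writing a regex to filter this? Help from Linux gurus is appreciated.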


Version: 1.0

Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes time-taken x-forwarded-for ssl-protocol ssl-cipher x-edge-response-result-type

2015-07-28 11:34:57 MAD50 658 124.13.170.152 GET d2v2sjgehuhalt.cloudfront.net /thumbnail/mediaInfo_211.json 404 - NDS%2520VM%2520Engine/002%2520Apr%252004%25202014%2520(OSD:%252032%2520;SD) - - Error tdlmnsfrOCxOelbe82y3kIp_QfbBF7S3dDCn4rHR65JOMkOtZu4dzA== pdl.astro.com.my http 151 0.004 - - - Error
2015-07-28 11:34:53 SIN3 659 14.192.214.93 GET d2v2sjgehuhalt.cloudfront.net /thumbnail/mediaInfo_211.json 404 - NDS%2520VM%2520Engine/002%2520Apr%252004%25202014%2520(OSD:%252032%2520;SD) - - Error 5r0xsHnxLY5TePeJ6ZfKvuHrhQnbd2lbWtDQosEXLj4Z7TZ5N68ZhA== pdl.astro.com.my http 151 0.002 - - - Error
2015-07-28 11:34:53 SIN3 659 14.192.213.198 GET d2v2sjgehuhalt.cloudfront.net /thumbnail/mediaInfo_211.json 404 - NDS%2520VM%2520Engine/002%2520Apr%252004%25202014%2520(OSD:%252032%2520;SD) - - Error koGGTK2mc2dDS3XvABS0zAeqheH52toNmJgIqAh5A0TYKIZL6qsgRw== pdl.astro.com.my http 151 0.001 - - - Error
2015-07-28 11:34:54 SIN3 659 14.192.208.27 GET d2v2sjgehuhalt.cloudfront.net /thumbnail/mediaInfo_211.json 404 - NDS%2520VM%2520Engine/002%2520Apr%252004%25202014%2520(OSD:%252032%2520;SD) - - Error bvLIe540oNMCeZ0QpOmX1OKoClgNgvSWppGuOmgVS85WnAXKJ1ryDg== pdl.astro.com.my http 151 0.002 - - - Error
2015-07-28 11:34:54 SIN3 659 210.19.26.33 GET d2v2sjgehuhalt.cloudfront.net /thumbnail/mediaInfo_211.json 404 - NDS%2520VM%2520Engine/002%2520Apr%252004%25202014%2520(OSD:%252032%2520;SD) - - Error 6Wl5xeCZArNN3WGaIGOA6XjUqZHEiENbWOmChiMZPayefDuLtC8WrA== pdl.astro.com.my http 151 0.001 - - - Error
2015-07-28 11:34:54 SIN3 659 121.121.62.92 GET d2v2sjgehuhalt.cloudfront.net /thumbnail/mediaInfo_211.json 404 - NDS%2520VM%2520Engine/002%2520Apr%252004%25202014%2520(OSD:%252032%2520;SD) - - Error WLn7heBO3PvvVW1vt365EVXqoD440Byy6Qh6RYYazSyPBZUxwsS0Jg== pdl.astro.com.my http 151 0.001 - - - Error
2015-07-28 11:34:54 SIN3 659 14.192.213.9 GET d2v2sjgehuhalt.cloudfront.net /thumbnail/mediaInfo_211.json 404 - NDS%2520VM%2520Engine/002%2520Apr%252004%25202014%2520(OSD:%252032%2520;SD) - - Error hTbk9HE5nyFSla1DmeC1D1jhuMtoUY6E7QQvyf0v1YyJ1GBp-I40bw== pdl.astro.com.my http 151 0.001 - - - Error
2015-07-28 11:34:55 SIN3 659 14.192.213.250 GET d2v2sjgehuhalt.cloudfront.net /thumbnail/mediaInfo_211.json 404 - NDS%2520VM%2520Engine/002%2520Apr%252004%25202014%2520(OSD:%252032%2520;HD) - - Error avWgysZyGeGXdVxZHLfP5uLJ4ie5Hx8pa6ZJC5GHXfvOkyEXXp8o0g== pdl.astro.com.my http 151 0.001 - - - Error
2015-07-28 11:34:55 SIN3 659 14.192.211.78 GET d2v2sjgehuhalt.cloudfront.net /thumbnail/mediaInfo_211.json 404 - NDS%2520VM%2520Engine/002%2520Apr%252004%25202014%2520(OSD:%252032%2520;SD) - - Error wBepjCn58o9AiTifvtrCprkjdAdg-zsLTsjDpUBkxnEU5tahmJxxQ== pdl.astro.com.my http 151 0.004 - - - Error
2015-07-28 11:34:55 SIN3 659 121.121.101.4 GET d2v2sjgehuhalt.cloudfront.net /thumbnail/mediaInfo_211.json 404 - NDS%2520VM%2520Engine/002%2520Apr%252004%25202014%2520(OSD:%252032%2520;SD) - - Error YZ07B5vu7L4I3aoTcBXF5rcH8Dwrv5a77xRqqelkQqvQhYLDnkrKWg== pdl.astro.com.my http 151 0.001 - - - Error
2015-07-28 11:34:55 SIN3 659 14.192.208.156 GET d2v2sjgehuhalt.cloudfront.net /thumbnail/mediaInfo_211.json 404 - NDS%2520VM%2520Engine/002%2520Apr%252004%25202014%2520(OSD:%252032%2520;SD) - - Error pbmzjYvLFIlLeth6mN2Yox9DH4vap1hcFHuJgNosd0XHVSxGdRcrWw== pdl.astro.com.my http 151 0.004 - - - Error
2015-07-28 11:34:55 SIN3 659 14.192.213.22 GET d2v2sjgehuhalt.cloudfront.net /thumbnail/mediaInfo_211.json 404

  • Error pbmzjYvLFIlLeth6mN2Yox9DH4vap1hcFHuJgNosd0XHVSxGdRcrWw== pdl.astro.com.my http 151 0.004 - - - Error 2015-07-28 11:34:55 SIN3 659 14.192.213.22 GET d2v2sjgehuhalt.cloudfront.net /thumbnail/mediaInfo_211.jpg 404

1 answer:

Answer 0: (score: 0)

Please read how to ask. As it stands, your question is off topic and you have not provided any code; it is not about coding, so serverfault may be a better fit.

If you want to parse large HTTP logs, you should use visitors. If you want JSON output, then, since this community is about coding, you could extend it to do that.

Otherwise, for your original question, here is one way with awk:

awk '$NF == 404 && $(NF -1) ~ /\.json$/ { next; } {print}' /path/to/yourfile.log

$NF == 404  # the last field is 404
$(NF -1)    # the field before the last
~ /\.json$/ # ends with .json
{ next; }   # skip this line
{ print }   # print anything else
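
The one-liner above keys on the last field, which fits a line that ends at the status code. In the complete lines of the sample, the status is followed by more fields, so a variant that tests by position may be useful. This is only a sketch: it assumes whitespace-separated fields in the order given by the Fields header (field 8 is cs-uri-stem, field 9 is sc-status), and filter_json_404 is a hypothetical helper name, not part of any tool.

```shell
# Sketch: drop 404 responses for .json URIs and print every other line.
# Assumption: fields are whitespace-separated and follow the Fields header,
# so $8 is cs-uri-stem and $9 is sc-status.
# filter_json_404 is a hypothetical name chosen for this example.
filter_json_404() {
    awk '
        $9 == 404 && $8 ~ /\.json$/ { next }   # skip 404s whose URI ends with .json
        { print }                              # print anything else
    ' "$@"
}
```

It reads the files named as arguments, or standard input when none are given. With tens of thousands of files, something like `find /path/to/logs -type f -exec cat {} + | filter_json_404 > report.txt` should avoid an overly long argument list (both paths are placeholders).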