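I have a huge set of logs, more than 26,000 files, and each file has content like the sample below. I need to exclude every line that is a 404 for a JSON file. In the example below, I need to keep only the line for the .jpg request, because it is a 404 that is not for a JSON file. Any help writing a regular expression for this filter? Help from Linux gurus is appreciated.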
- Error pbmzjYvLFIlLeth6mN2Yox9DH4vap1hcFHuJgNosd0XHVSxGdRcrWw == pdl.astro.com.my http 151 0.004 - - - Error 2015-07-28 11:34:55 SIN3 659 14.192.213.22 GET d2v2sjgehuhalt.cloudfront.net /thumbnail/mediaInfo_211.jpg 404
2015-07-28 11:34:57 MAD50 658 124.13.170.152 GET d2v2sjgehuhalt.cloudfront.net /thumbnail/mediaInfo_211.json 404 - NDS%2520VM%2520Engine / 002%2520Apr%252004%25202014%2520(OSD: %252032%2520; SD) - - Error tdlmnsfrOCxOelbe82y3kIp_QfbBF7S3dDCn4rHR65JOMkOtZu4dzA == pdl.astro.com.my http 151 0.004 - - - Error 2015-07-28 11:34:53 SIN3 659 14.192.214.93 GET d2v2sjgehuhalt.cloudfront.net /thumbnail/mediaInfo_211.json 404 - NDS%2520VM%2520Engine / 002%2520Apr%252004%25202014%2520(OSD:%252032% 2520; SD) - - Error 5r0xsHnxLY5TePeJ6ZfKvuHrhQnbd2lbWtDQosEXLj4Z7TZ5N68ZhA == pdl.astro.com.my http 151 0.002 - - - Error 2015-07-28 11:34:53 SIN3 659 14.192.213.198 GET d2v2sjgehuhalt.cloudfront.net /thumbnail/mediaInfo_211.json 404 - NDS%2520VM%2520Engine / 002%2520Apr%252004%25202014%2520(OSD:%252032% 2520; SD) - - Error koGGTK2mc2dDS3XvABS0zAeqheH52toNmJgIqAh5A0TYKIZL6qsgRw == pdl.astro.com.my http 151 0.001 - - - Error 2015-07-28 11:34:54 SIN3 659 14.192.208.27 GET d2v2sjgehuhalt.cloudfront.net /thumbnail/mediaInfo_211.json 404 - NDS%2520VM%2520Engine / 002%2520Apr%252004%25202014%2520(OSD:%252032% 2520; SD) - - Error bvLIe540oNMCeZ0QpOmX1OKoClgNgvSWppGuOmgVS85WnAXKJ1ryDg == pdl.astro.com.my http 151 0.002 - - - Error 2015-07-28 11:34:54 SIN3 659 210.19.26.33 GET d2v2sjgehuhalt.cloudfront.net /thumbnail/mediaInfo_211.json 404 - NDS%2520VM%2520Engine / 002%2520Apr%252004%25202014%2520(OSD:%252032% 2520; SD) - - Error 6Wl5xeCZArNN3WGaIGOA6XjUqZHEiENbWOmChiMZPayefDuLtC8WrA == pdl.astro.com.my http 151 0.001 - - - Error 2015-07-28 11:34:54 SIN3 659 121.121.62.92 GET d2v2sjgehuhalt.cloudfront.net /thumbnail/mediaInfo_211.json 404 - NDS%2520VM%2520Engine / 002%2520Apr%252004%25202014%2520(OSD:%252032% 2520; SD) - - Error WLn7heBO3PvvVW1vt365EVXqoD440Byy6Qh6RYYazSyPBZUxwsS0Jg == pdl.astro.com.my http 151 0.001 - - - Error 2015-07-28 11:34:54 SIN3 659 14.192.213.9 GET d2v2sjgehuhalt.cloudfront.net /thumbnail/mediaInfo_211.json 404 - NDS%2520VM%2520Engine / 002%2520Apr%252004%25202014%2520(OSD:%252032% 2520; SD) - - 
Error hTbk9HE5nyFSla1DmeC1D1jhuMtoUY6E7QQvyf0v1YyJ1GBp-I40bw == pdl.astro.com.my http 151 0.001 - - - Error 2015-07-28 11:34:55 SIN3 659 14.192.213.250 GET d2v2sjgehuhalt.cloudfront.net /thumbnail/mediaInfo_211.json 404 - NDS%2520VM%2520Engine / 002%2520Apr%252004%25202014%2520(OSD:%252032% 2520; HD) - - Error avWgysZyGeGXdVxZHLfP5uLJ4ie5Hx8pa6ZJC5GHXfvOkyEXXp8o0g == pdl.astro.com.my http 151 0.001 - - - Error 2015-07-28 11:34:55 SIN3 659 14.192.211.78 GET d2v2sjgehuhalt.cloudfront.net /thumbnail/mediaInfo_211.json 404 - NDS%2520VM%2520Engine / 002%2520Apr%252004%25202014%2520(OSD:%252032% 2520; SD) - - Error wBepjCn58o9AiTifvtrCprkjdAdg - zsLTsjDpUBkxnEU5tahmJxxQ == pdl.astro.com.my http 151 0.004 - - - Error 2015-07-28 11:34:55 SIN3 659 121.121.101.4 GET d2v2sjgehuhalt.cloudfront.net /thumbnail/mediaInfo_211.json 404 - NDS%2520VM%2520Engine / 002%2520Apr%252004%25202014%2520(OSD:%252032% 2520; SD) - - Error YZ07B5vu7L4I3aoTcBXF5rcH8Dwrv5a77xRqqelkQqvQhYLDnkrKWg == pdl.astro.com.my http 151 0.001 - - - Error 2015-07-28 11:34:55 SIN3 659 14.192.208.156 GET d2v2sjgehuhalt.cloudfront.net /thumbnail/mediaInfo_211.json 404 - NDS%2520VM%2520Engine / 002%2520Apr%252004%25202014%2520(OSD:%252032% 2520; SD) - - Error pbmzjYvLFIlLeth6mN2Yox9DH4vap1hcFHuJgNosd0XHVSxGdRcrWw == pdl.astro.com.my http 151 0.004 - - - Error 2015-07-28 11:34:55 SIN3 659 14.192.213.22 GET d2v2sjgehuhalt.cloudfront.net /thumbnail/mediaInfo_211.json 404
Answer 0 (score: 0)
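Please read how to ask: your question is off topic as it stands, and you did not provide any code. It is not really about coding, so serverfault may be a better place for it.
If you want to parse large HTTP logs, you should use visitors. If you want JSON output from it, you can extend it yourself, since this community is about coding.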
Otherwise, for your original question, here is one way to do it with awk:
awk '$NF == 404 && $(NF -1) ~ /\.json$/ { next; } {print}' /path/to/yourfile.log
$NF == 404 # the last field is 404
$(NF -1) # the field before the last
~ /\.json$/ # ends with .json
{ next; } # skip this line
{ print } # print anything else
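The one-liner above assumes that 404 is the last field of the line. As a rough sketch of a variant that does not depend on the position, the function below drops a line whenever a field ending in .json is immediately followed by a field equal to 404, wherever that pair sits in the line. The function name filter_json_404 is hypothetical and only used for illustration; nothing here comes from the question beyond the ".json followed by 404" pattern.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not from the question):
# print every line except those where a field ending in ".json" is
# immediately followed by a field equal to 404.
# Reads the files given as arguments, or standard input if there are none.
filter_json_404() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i ~ /\.json$/ && $(i + 1) == 404)
                next    # JSON 404 found: skip this line
        print           # no such pair: keep the line
    }' "$@"
}
```

To run it over a whole tree of files, one option is to let find concatenate them and pipe the result through the function, for example find /path/to/logs -type f -exec cat {} + | filter_json_404 > filtered.log, where /path/to/logs and filtered.log are placeholder names. This merges all kept lines into one output file; if per-file output matters, a loop over the files would be needed instead.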