How can I use awk to parse an Apache access log file and display the information in the following format?
Date Time Count IP Address
2016-05-26 00:00 200 192.168.1.x
2016-05-26 00:00 152 172.17.100.x
2016-05-26 00:01 43 192.168.1.x
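Let me be clear. I do not want the total number of requests per hour, and I do not want the total number of requests per minute. I already know how to write basic awk scripts for both of those tasks.
I want to see the number of requests sent per minute by each unique IP address. I am not savvy enough with awk to do that.
Apache log format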
LogFormat "%h %l %u %{%F %T %z}t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\""
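Example
I have included the tail of the log file below. This is a small sample of what it contains. (We have more than 100K entries today, so sharing all of them here is not feasible. If you need more lines, please ask.)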
54.213.236.39 - - 2016-05-26 14:38:51 -0400 "GET /p1077921.html HTTP/1.0" 403 400 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_77)"
54.213.236.39 - - 2016-05-26 14:38:51 -0400 "GET /p1060432.html HTTP/1.0" 403 398 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_77)"
54.213.254.166 - - 2016-05-26 14:38:51 -0400 "GET /p819757.html HTTP/1.0" 403 400 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_77)"
54.213.236.39 - - 2016-05-26 14:38:51 -0400 "GET /p1084269.html HTTP/1.0" 403 400 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_77)"
107.23.252.229 - - 2016-05-26 14:38:51 -0400 "GET /p305987.html HTTP/1.0" 403 399 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_77)"
Example 1:
grep '2016-05-26' access.log | awk '{print $1}' | sort | uniq -c | sort -n | tail -40 | awk '{print $2,$2,$1}' | logresolve | awk '{printf "%6d %s (%s)\n",$3,$1,$2}'
This produces the following output:
307 135-23-174-138.cpe.pppoe.ca (135.23.174.138)
313 5265DCE5.cm-8.dynamic.ziggo.nl (82.101.220.229)
378 92-108-204-76.dynamic.upc.nl (92.108.204.76)
405 0191301456.0.fullrate.ninja (90.185.180.167)
632 ec2-52-58-151-132.eu-central-1.compute.amazonaws.com (52.58.151.132)
798 187.228.212.148 (187.228.212.148)
877 207.246.75.253 (207.246.75.253)
966 ec2-54-213-177-120.us-west-2.compute.amazonaws.com (54.213.177.120)
1116 ec2-54-186-148-0.us-west-2.compute.amazonaws.com (54.186.148.0)
1224 ppp121-44-247-209.bras2.syd2.internode.on.net (121.44.247.209)
1369 ec2-54-187-239-46.us-west-2.compute.amazonaws.com (54.187.239.46)
1584 45.55.189.64 (45.55.189.64)
2658 50-77-47-70-static.hfc.comcastbusiness.net (50.77.47.70)
Example 2:
grep "2016-05-26" access.log | awk '{ print $4, $5, $1}' | cut -f2 | awk -F: '{ print $1":"$2 }' | sort -nk1 -nk2 | uniq -c | awk '{ if ($1 > 10) print $0 }'
This gives the following output:
560 2016-05-26 00:00
534 2016-05-26 00:01
538 2016-05-26 00:02
554 2016-05-26 00:03
566 2016-05-26 00:04
534 2016-05-26 00:05
559 2016-05-26 00:06
531 2016-05-26 00:07
540 2016-05-26 00:08
435 2016-05-26 00:09
312 2016-05-26 00:10
Any help is greatly appreciated.
Answer 0 (score: 0)
Here is one way:
First, convert this:
54.213.236.39 - - 2016-05-26 14:38:51 -0400 "GET /p1077921.html HTTP/1.0" 403 400 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_77)"
into this:
54.213.236.39 2016-05-26 14 # <- 14th hour
Then run sort | uniq -c on the result.
grep '2016-05-26' access.log |
tr ':' ' ' |
awk '{print $1,$4,$5}' |
sort |
uniq -c |
sort -n
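The pipeline above keeps only the hour field, so it counts requests per IP per hour. A minimal sketch of a per-minute variant, printed in the Date Time Count IP layout the question asks for, could look like the following. It assumes the LogFormat shown in the question, so that $1 is the client IP, $4 the date and $5 the HH:MM:SS time. The function name per_ip_per_minute is hypothetical and used only for illustration.

```shell
# Hypothetical helper: count requests per IP per minute for one date.
# Usage: per_ip_per_minute DATE [FILE...]   (reads stdin if no file is given)
# Assumes the question's LogFormat: $1 = IP, $4 = date, $5 = HH:MM:SS.
per_ip_per_minute() {
    day=$1
    shift
    awk -v day="$day" '
        $4 == day {
            minute = substr($5, 1, 5)        # "14:38:51" -> "14:38"
            count[$4 " " minute " " $1]++    # key: date, minute, IP
        }
        END {
            for (key in count) {
                split(key, part, " ")
                # Date Time Count IP
                print part[1], part[2], count[key], part[3]
            }
        }
    ' "$@" |
    # Order by date, then minute, then highest count first, then IP.
    sort -k1,1 -k2,2 -k3,3nr -k4,4
}
```

Called as per_ip_per_minute 2016-05-26 access.log, this should print one line per IP per minute. For the five sample lines in the question, all within 14:38, it would report a count of 3 for 54.213.236.39 and 1 each for 54.213.254.166 and 107.23.252.229. Counting inside awk avoids the separate uniq -c pass, and the final sort is only there for readable ordering.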