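Display IP addresses and per-IP request counts per hour from Apache logs

Asked: 2016-05-26 16:07:45

Tags: bash apache logging awk gawk

How can I use awk to parse an Apache access log file and display the information in the following format?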

   Date     Time  Count   IP Address
2016-05-26  00:00  200    192.168.1.x
2016-05-26  00:00  152    172.17.100.x
2016-05-26  00:01   43    192.168.1.x

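Let me be clear. I want to show the total number of requests per hour, and I want to show the total number of requests per minute. I know how to write basic awk scripts for both of those tasks.

What I also want to see is the number of requests each unique IP address sends per minute. I am not savvy enough with awk to do that.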

Apache log format

LogFormat "%h %l %u %{%F %T %z}t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\""

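Sample

I have included the tail of the log file. This is a small sample of what it contains. (We have over 100K entries today, so sharing them all here is not feasible. If you need more lines, please ask.)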

54.213.236.39 - - 2016-05-26 14:38:51 -0400 "GET /p1077921.html HTTP/1.0" 403 400 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_77)"
54.213.236.39 - - 2016-05-26 14:38:51 -0400 "GET /p1060432.html HTTP/1.0" 403 398 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_77)"
54.213.254.166 - - 2016-05-26 14:38:51 -0400 "GET /p819757.html HTTP/1.0" 403 400 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_77)"
54.213.236.39 - - 2016-05-26 14:38:51 -0400 "GET /p1084269.html HTTP/1.0" 403 400 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_77)"
107.23.252.229 - - 2016-05-26 14:38:51 -0400 "GET /p305987.html HTTP/1.0" 403 399 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_77)"
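As a quick sanity check on field positions, here is a small sketch (illustrative only, not part of the pipelines below) of how awk's default whitespace splitting lines up with this LogFormat: `$1` is the client IP, `$4` the date, `$5` the time and `$6` the UTC offset. The helper name `show_fields` is hypothetical.

```shell
# Print the fields that matter for per-IP, per-time counting.
# Assumes the LogFormat above, where %l and %u contain no spaces.
# show_fields is a hypothetical helper name used only for illustration.
show_fields() {
  awk '{ printf "ip=%s date=%s time=%s zone=%s\n", $1, $4, $5, $6 }' "$@"
}
```

It can be fed from a pipe, for example `tail -n 5 access.log | show_fields`.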

Example 1:

grep '2016-05-26' access.log | awk '{print $1}' | sort | uniq -c | sort -n | tail -40 | awk '{print $2,$2,$1}' | logresolve | awk '{printf "%6d %s (%s)\n",$3,$1,$2}'

This produces the following output:

307 135-23-174-138.cpe.pppoe.ca (135.23.174.138)
313 5265DCE5.cm-8.dynamic.ziggo.nl (82.101.220.229)
378 92-108-204-76.dynamic.upc.nl (92.108.204.76)
405 0191301456.0.fullrate.ninja (90.185.180.167)
632 ec2-52-58-151-132.eu-central-1.compute.amazonaws.com (52.58.151.132)
798 187.228.212.148 (187.228.212.148)
877 207.246.75.253 (207.246.75.253)
966 ec2-54-213-177-120.us-west-2.compute.amazonaws.com (54.213.177.120)
1116 ec2-54-186-148-0.us-west-2.compute.amazonaws.com (54.186.148.0)
1224 ppp121-44-247-209.bras2.syd2.internode.on.net (121.44.247.209)
1369 ec2-54-187-239-46.us-west-2.compute.amazonaws.com (54.187.239.46)
1584 45.55.189.64 (45.55.189.64)
2658 50-77-47-70-static.hfc.comcastbusiness.net (50.77.47.70)

Example 2:

grep "2016-05-26" access.log | awk '{ print $4, $5, $1}' | cut -f2 | awk -F: '{ print $1":"$2 }' | sort -nk1 -nk2 | uniq -c | awk '{ if ($1 > 10) print $0 }'

This gives the following output:

560 2016-05-26 00:00
534 2016-05-26 00:01
538 2016-05-26 00:02
554 2016-05-26 00:03
566 2016-05-26 00:04
534 2016-05-26 00:05
559 2016-05-26 00:06
531 2016-05-26 00:07
540 2016-05-26 00:08
435 2016-05-26 00:09
312 2016-05-26 00:10

Any help is greatly appreciated.

1 Answer:

Answer 0 (score: 0)

Here is one way:

First, convert this:

54.213.236.39 - - 2016-05-26 14:38:51 -0400 "GET /p1077921.html HTTP/1.0" 403 400 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_77)"

into this:

54.213.236.39 2016-05-26 14  # <- 14th hour

Then run sort | uniq -c on that.

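grep '2016-05-26' access.log |   # keep only that day's entries
  tr ':' ' ' |                   # split HH:MM:SS into separate fields
  awk '{print $1,$4,$5}' |       # IP, date, hour
  sort |
  uniq -c |                      # count identical "IP date hour" lines
  sort -n                        # busiest IP/hour pairs last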
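
The pipeline above groups by hour. The question also asks for per-minute counts for each IP in a Date / Time / Count / IP Address layout. One possible way to get that is a single awk pass that truncates the time to HH:MM and counts in an associative array. This is only a sketch: it assumes the LogFormat from the question (`$1` is the client IP, `$4` the date, `$5` the time), and `per_minute_ip_counts` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Count requests per IP address per minute for a single day.
# Assumption: fields follow the question's LogFormat ($1 = IP, $4 = date, $5 = time).
# per_minute_ip_counts is a hypothetical helper name, not part of any existing tool.
per_minute_ip_counts() {
  local day=$1
  shift
  printf '%-10s  %-5s  %5s  %s\n' 'Date' 'Time' 'Count' 'IP Address'
  awk -v day="$day" '
    $4 == day {
      minute = substr($5, 1, 5)             # "14:38:51" -> "14:38"
      count[$4 "  " minute SUBSEP $1]++     # key: "date  HH:MM" + IP
    }
    END {
      for (key in count) {
        split(key, part, SUBSEP)            # part[1] = date and minute, part[2] = IP
        printf "%s  %5d  %s\n", part[1], count[key], part[2]
      }
    }
  ' "$@" | sort -k1,1 -k2,2 -k3,3nr        # by date, then minute, then count (descending)
}
```

Called as `per_minute_ip_counts 2016-05-26 access.log`, it prints one row per IP per minute. Comparing `$4` against the date directly, instead of grepping the whole line, avoids matching a date that merely appears in a URL or user agent. Using `substr($5, 1, 2)` instead would give per-hour buckets.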