My log file has the following format:
[30/Jan/2015:10:10:30 +0000] 12.30.30.204 xff=- reqId=[-] status_check len=- GET /api/getstatus HTTP/1.1 mi=- ec=- 200 425
[30/Jan/2015:10:11:00 +0000] 12.30.30.204 xff=- reqId=[-] status_check len=- GET /api/getstatus HTTP/1.1 mi=- ec=- 200 261
[30/Jan/2015:10:11:29 +0000] 12.30.30.204 xff=- reqId=[-] status_check len=- GET /api/getstatus HTTP/1.1 mi=- ec=- 200 232
[30/Jan/2015:10:12:00 +0000] 12.30.30.204 xff=- reqId=[-] status_check len=- GET /api/getstatus HTTP/1.1 mi=- ec=- 200 315
[30/Jan/2015:10:12:29 +0000] 12.30.30.204 xff=- reqId=[-] status_check len=- GET /api/getstatus HTTP/1.1 mi=- ec=- 200 221
[30/Jan/2015:10:12:57 +0000] 12.30.30.182 xff=- reqId=[-] status_check len=- GET /api/getstatus HTTP/1.1 mi=- ec=- 200 218
Each line in this log file has a timestamp in the first field and a response time in the last field. Is there a way in awk to compute the average response time over a given time interval? For example, calculate the average response time for every five minutes, based on the timestamps in the log file.
Apart from awk, is there a better alternative? Please suggest one.
Update
I tried the following, but it is static: it only gives the average for a single, hard-coded interval.
$ grep "30/Jan/2015:10:1[0-4]" mylog.log | awk '{resp+=$NF;cnt++;}END{print "Avg:"int(resp/cnt)}'
But I need this for every 5-minute window across the whole file. Even if I run the command in a loop, how do I pass the date to it dynamically? The log file changes every time, and so do the dates. A rough sketch of the kind of loop I mean is below.
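(For illustration, a sketch assuming bash and GNU date; mylog.log, the hard-coded start time and the 30-minute range are placeholders, and the hard-coded start is exactly the static part I want to avoid:)

#!/bin/bash
# Loop the static command above over successive five-minute windows.
# Assumes GNU date and bash; windows must be aligned to whole minutes
# for the minute-prefix grep to work.
log=mylog.log
start=$(date -u -d '2015-01-30 10:10:00' +%s)   # picked by hand -- the problem
for ((w = start; w < start + 1800; w += 300)); do
    # build a regex alternation of the five minute-prefixes in this window
    pat=$(for ((m = w; m < w + 300; m += 60)); do
              date -u -d "@$m" '+%d/%b/%Y:%H:%M'
          done | paste -sd'|' -)
    grep -E "$pat" "$log" |
        awk '{ resp += $NF; cnt++ } END { if (cnt) print "Avg: " int(resp/cnt) }'
done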
Answer 0 (score: 3)
Hmm. GNU date doesn't like your date format, so I suppose we'll have to parse it ourselves. I'm thinking along these lines (this requires mktime, i.e., gawk):
# returns the seconds since epoch that stamp represents. This will be
# the first field in the line, with [] and everything. It's rather
# rudimentary:
function parse_timestamp(stamp) {
    # Split stamp into tokens delimited by [, ], /, : or space
    split(stamp, c, "[][/: ]")

    # reassemble (using the lookup table for the months from below) in a
    # format that mktime understands (then call mktime).
    return mktime(c[4] " " mnums[c[3]] " " c[2] " " c[5] " " c[6] " " c[7])
}

BEGIN {
    # parse_timestamp needs this lookup table.
    split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", mnames)
    for(i = 1; i <= length(mnames); ++i) {
        mnums[mnames[i]] = i
    }

    # time is a parameter supplied by you.
    start = parse_timestamp(time)
    end   = start + 300

    if(start == -1) {
        print "Warning: Could not parse timestamp \"" time "\""
    }
}

{
    # in each line: parse the timestamp
    curtime = parse_timestamp($1)
}

# if it lies in the interval you want, sum up the last field and increase
# the counter
curtime >= start && curtime < end {
    sum += $NF
    ++count
}

END {
    # and in the end, print the average.
    print "Avg: " (count == 0 ? "undef" : sum / count)
}
Put this into a file, say average.awk, and call it as
awk -v time='[30/Jan/2015:10:11:20 +0000]' -f average.awk foo.log
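To see how the interval works: with the sample log above and this time parameter, the window runs from 10:11:20 to 10:16:20. The four entries at 10:11:29, 10:12:00, 10:12:29 and 10:12:57 fall inside it (response times 232, 315, 221, 218), so the script prints

Avg: 246.5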
If you are certain that the log file is sorted in ascending order (which is probably the case), you can make this more efficient by replacing

curtime >= start && curtime < end {
    sum += $NF
    ++count
}
with
curtime >= end {
    exit
}

curtime >= start {
    sum += $NF
    ++count
}
This stops scanning for matching log entries once the first entry past the end of the interval is seen.
Addendum: Since the OP clarified that he wants summaries for all five-minute intervals in a sorted log file, the adjusted script to do that is
#!/usr/bin/awk -f

# Convert a "[30/Jan/2015:10:10:30" stamp to seconds since the epoch
# (requires gawk for mktime); see above for details.
function parse_timestamp(stamp) {
    split(stamp, c, "[][/: ]")
    return mktime(c[4] " " mnums[c[3]] " " c[2] " " c[5] " " c[6] " " c[7])
}

BEGIN {
    # month-name -> month-number lookup table for parse_timestamp
    split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", mnames)
    for(i = 1; i <= length(mnames); ++i) {
        mnums[mnames[i]] = i
    }
}

{
    curtime = parse_timestamp($1)
}

NR == 1 {
    # pull the start time from the first line
    start = curtime
    end   = start + 300
}

curtime >= end {
    # print result, reset counters when the end of the interval is past
    # (>= keeps the intervals half-open, as in the first script)
    print "Avg: " (count == 0 ? "undef" : sum / count)
    sum   = 0
    count = 0
    end  += 300
}

{
    sum += $NF
    ++count
}

END {
    # print once more at the very end for the last, unfinished interval.
    print "Avg: " (count == 0 ? "undef" : sum / count)
}