I have a large comma-separated log file and want to extract some information from it.
2010-02-10 10:00:00.000 171.606 bad_gateway
2010-02-10 10:00:00.234 400.680 bad_gateway
2010-02-10 10:00:00.410 212.308 login_from
2010-02-10 10:00:00.601 222.251 bad_gateway
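The problem is that I need to write out the events for a given time range (for example, between 10:00:00.000 and 11:00:00) and, for each minute, count how many durations fall into each range. I am aiming for an output file like this: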
bad_gateway
10:00
AVG        <1ms  1-10ms  10-100ms  100-500ms  500+ms
264.845                                    3
login_from
10:00
AVG        <1ms  1-10ms  10-100ms  100-500ms  500+ms
212.308                                    1
bad_gateway
10:01
AVG        <1ms  1-10ms  10-100ms  100-500ms  500+ms
xxx.xxx x
I have been trying to work this out with awk, but I am stuck. Thanks for your help!
This is what I have so far:
BEGIN {
    ## Lower (inclusive) and upper (exclusive) bound of each bucket, in ms.
    low["<1ms"]=0;        high["<1ms"]=1
    low["1-10ms"]=1;      high["1-10ms"]=10
    low["10-100ms"]=10;   high["10-100ms"]=100
    low["100-500ms"]=100; high["100-500ms"]=500
    low["500+ms"]=500;    high["500+ms"]=1000000000
}
{
    duration = $3    ## the third field holds the duration in ms
    for (i in high) {
        if ((duration >= low[i]) && (duration < high[i])) {
            total+=duration
            bin[i]++
            count++
        }
    }
}
In the END section I do the printf.
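For illustration, here is a minimal sketch of the binning step on its own. It assumes whitespace-separated fields with the duration in ms in field 3, as in the sample lines above. The names bucket_counts, bucket and label are hypothetical, chosen only for this example. It walks the labels in a fixed order because "for (i in high)" visits keys in an unspecified order.

```shell
#!/bin/sh
# bucket_counts: read log lines on stdin and print one "label count" line
# per duration bucket, always in the same order.
# Assumption: fields are whitespace-separated and field 3 is the duration in ms.
bucket_counts() {
    awk '
    # Map a duration in ms to its bucket label. Each bucket includes its
    # lower bound and excludes its upper bound.
    function bucket(d) {
        if (d < 1)   return "<1ms"
        if (d < 10)  return "1-10ms"
        if (d < 100) return "10-100ms"
        if (d < 500) return "100-500ms"
        return "500+ms"
    }
    # Count every record that has at least three fields.
    NF >= 3 { count[bucket($3 + 0)]++ }
    END {
        # Walk the labels in a fixed order instead of using "for (i in count)".
        n = split("<1ms 1-10ms 10-100ms 100-500ms 500+ms", label, " ")
        for (i = 1; i <= n; i++)
            printf "%s %d\n", label[i], count[label[i]]
    }'
}
```

It could be called as `bucket_counts < logfile`, where logfile stands for the log. This sketch counts over the whole input; grouping by minute and event name would still have to be added on top.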
Answer 0 (score: 3)
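Your input data is too short to test thoroughly. Here is an awk script that does more or less what you are looking for (it relies on mktime and strftime, which are GNU awk extensions). It is fully commented, so you can modify it from here to suit your needs.
Contents of script.awk: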
BEGIN {
    header = sprintf( "\t%-10s\t%10s\t%10s\t%10s\t%10s\t%10s", "AVG", "<1ms", "1-10ms", "10-100ms", "100-500ms", "500+ms" )
    ## Upper bounds (in ms) of the duration slices.
    slices = "1 10 100 500"
    split( slices, slices_a )
    ## Hardcoded start and end times (declared here, but not yet used as a
    ## filter in the rest of the script).
    start_time = mktime( "2010 02 10 10 00 00" )
    end_time = mktime( "2010 02 10 11 00 00" )
}
{
    ## Extract hour, minute and second from time.
    fields = split( $2, time, /[:.]/ )
    if ( fields != 4 ) {
        print "WARNING: Skipped line " FNR " because its time is badly formatted."
        next
    }
    ## Save previous time to be able to compare if a minute has passed. First line is
    ## a special case because there is not yet a saved value.
    if ( FNR == 1 ) {
        prev_time = mktime( "2010 02 10 " time[1] " " time[2] " " time[3] )
    }
    else {
        curr_time = mktime( "2010 02 10 " time[1] " " time[2] " " time[3] )
        ## When a minute has passed, print all extracted data.
        if ( curr_time - prev_time > 59 ) {
            print_minute_info(duration, prev_time, header, slices_a)
            ## Initialize data.
            prev_time = curr_time
            delete duration
        }
    }
    ## For each name (last field) concatenate durations.
    duration[ $NF ] = duration[ $NF ] "|" $3
}
END {
    print_minute_info(duration, prev_time, header, slices_a)
}
## Traverse hash with following format (example):
## duration[ bad_gateway ] = "|34.567|234.918|56.213"
##
## So, for each key split with pipe, sum its values and try to
## print a formatted output.
function print_minute_info(duration,prev_time,header,slices_a,    name,sum,times,times_a,num_times,i,times_avg,printed) {
    for ( name in duration ) {
        sum = 0
        ## Reset the flag for every name; otherwise it stays set after the
        ## first name that prints its count inside the loop below.
        printed = 0
        times = substr( duration[name], 2 )
        num_times = split( times, times_a, /\|/ )
        for ( i = 1; i <= num_times; i++ ) {
            sum = sum + times_a[i]
        }
        times_avg = sum / num_times
        printf "%s\n", name
        printf "%s\n", strftime( "%H:%M", prev_time )
        printf "%s\n", header
        printf "\t%-10s", times_avg
        ## This part tries to print the number of occurrences just
        ## below the header of the slice the average falls into.
        ## It can be improved.
        for ( i = 1; i <= length( slices_a ); i++ ) {
            if ( times_avg < slices_a[i] ) {
                printf "%10d\n", num_times
                printed = 1
                break
            }
            else {
                printf "\t%10s", ""
            }
        }
        if ( ! printed ) {
            printf "%10d\n", num_times
        }
        printf "\n"
    }
}
Assuming the following infile:
2010-02-10 10:00:00.000 171.606 bad_gateway
2010-02-10 10:00:00.234 400.680 bad_gateway
2010-02-10 10:00:00.410 212.308 login_from
2010-02-10 10:00:00.601 223.251 bad_gateway
2010-02-10 10:01:00.401 224.251 bad_gateway
2010-02-10 10:01:00.701 225.251 bad_gateway
2010-02-10 10:01:04.401 226.251 login_to
2010-02-10 10:02:04.401 1.251 login_to
Run it like this:
awk -f script.awk infile
It yields:
login_from
10:00
AVG <1ms 1-10ms 10-100ms 100-500ms 500+ms
212.308 1
bad_gateway
10:00
AVG <1ms 1-10ms 10-100ms 100-500ms 500+ms
265.179 3
bad_gateway
10:01
AVG <1ms 1-10ms 10-100ms 100-500ms 500+ms
224.751 2
login_to
10:01
AVG <1ms 1-10ms 10-100ms 100-500ms 500+ms
226.251 1
login_to
10:02
AVG <1ms 1-10ms 10-100ms 100-500ms 500+ms
1.251 1
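The script above places the total count under the column that matches the average. As a hedged alternative, the sketch below counts every single duration in its own bucket. It is not the script above rewritten: it groups by calendar minute (the HH:MM prefix of field 2) rather than by 60 seconds elapsed since the first record, it assumes the file covers a single day, and it uses only POSIX awk features. The names minute_report, bucket, hist and order are hypothetical.

```shell
#!/bin/sh
# minute_report FROM TO: summarise stdin per calendar minute and event name.
# FROM and TO are HH:MM:SS strings; records with FROM <= time < TO are kept.
# Assumption: fields are "date time duration_ms event", whitespace-separated.
minute_report() {
    awk -v from="${1:-00:00:00}" -v to="${2:-24:00:00}" '
    # Map a duration in ms to a column number from 1 to 5.
    function bucket(d) {
        if (d < 1)   return 1
        if (d < 10)  return 2
        if (d < 100) return 3
        if (d < 500) return 4
        return 5
    }
    # Skip records outside [from, to); HH:MM:SS strings compare correctly as text.
    NF < 4 || $2 < from || $2 >= to { next }
    {
        key = substr($2, 1, 5) SUBSEP $NF      # minute and event name
        if (!(key in n)) order[++keys] = key   # remember first-seen order
        n[key]++
        sum[key] += $3
        hist[key, bucket($3 + 0)]++
    }
    END {
        for (k = 1; k <= keys; k++) {
            key = order[k]
            split(key, part, SUBSEP)
            printf "%s\n%s\n", part[2], part[1]
            printf "%-10s%10s%10s%10s%10s%10s\n", "AVG", "<1ms", "1-10ms", "10-100ms", "100-500ms", "500+ms"
            printf "%-10.3f", sum[key] / n[key]
            for (b = 1; b <= 5; b++)
                printf "%10d", hist[key, b]
            printf "\n\n"
        }
    }'
}
```

With the infile shown earlier it could be run as `minute_report 10:00:00 11:00:00 < infile`. Note that under this grouping the 10:01:04.401 record belongs to minute 10:01 simply because of its timestamp, not because of when the previous window started.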
Answer 1 (score: 1)
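I am not proficient enough in awk, but this is easy to do in Perl... Putting data into buckets usually calls for a hash or array data structure. Just use a regular expression to extract the fields, then use a hash to create buckets that act as counters, and increment the counter for each occurrence, like this: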
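my $bins = {};
while ( my $line = <> ) {    # iterate over input file
    # extract fields: time of day, duration in ms, event type
    my ( $time, $ms, $errType ) = $line =~ /^\S+\s+(\S+)\s+(\S+)\s+(\S+)\s*$/
        or next;             # skip lines that do not match
    $time = substr $time, 0, 5;    # keep only "HH:MM"
    my $duration = $ms < 1   ? '<1ms'
                 : $ms < 10  ? '1-10ms'
                 : $ms < 100 ? '10-100ms'
                 : $ms < 500 ? '100-500ms'
                 :             '500+ms';
    $bins->{$errType}{$time}{$duration}++;
}
# now iterate over hashes and print out your report
foreach my $key1 ( sort keys %$bins ) {
    foreach my $key2 ( sort keys %{ $bins->{$key1} } ) {
        foreach my $key3 ( sort keys %{ $bins->{$key1}{$key2} } ) {
            print "$key1 $key2 $key3 $bins->{$key1}{$key2}{$key3}\n";
        }
    }
}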
It is not quite the answer you wanted, but maybe it will put you on the right track.
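For readers who would rather stay in awk, the nested-hash counter idea can be approximated with a single array and a composite subscript, which awk joins with SUBSEP. This is only a sketch under the same assumptions as the sample data (whitespace-separated fields, time in field 2, duration in ms in field 3, event name last). The names bin_counts, bucket and bins are hypothetical.

```shell
#!/bin/sh
# bin_counts: print one "event minute bucket count" line per non-empty bin,
# sorted bytewise so the order is stable.
bin_counts() {
    awk '
    # Map a duration in ms to its bucket label.
    function bucket(d) {
        if (d < 1)   return "<1ms"
        if (d < 10)  return "1-10ms"
        if (d < 100) return "10-100ms"
        if (d < 500) return "100-500ms"
        return "500+ms"
    }
    # The three subscripts are joined with SUBSEP into one key.
    NF >= 4 { bins[$NF, substr($2, 1, 5), bucket($3 + 0)]++ }
    END {
        for (key in bins) {
            split(key, k, SUBSEP)   # recover event, minute and bucket
            print k[1], k[2], k[3], bins[key]
        }
    }' | LC_ALL=C sort
}
```

It could be called as `bin_counts < infile`. The output is a flat list rather than the report layout from the question, so it is best seen as a starting point for the final formatting step.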