awk: calculating the per-minute average in a file

Time: 2014-06-11 18:29:17

Tags: regex bash awk sed mean

Suppose I have a file like this:

  

13.03.2013 12:13:01 | STRING1 | NUMBER1 | 1 | NUMBER3
13.03.2013 12:13:08 | STRING1 | NUMBER1 | 12 | NUMBER3
13.03.2013 12:13:09 | STRING3 | NUMBER1 | 13 | NUMBER3
13.03.2013 12:13:12 | STRING2 | NUMBER1 | 21 | NUMBER3
13.03.2013 12:13:15 | STRING2 | NUMBER1 | 11 | NUMBER3
13.03.2013 12:13:18 | STRING1 | NUMBER1 | 13 | NUMBER3
13.03.2013 12:13:20 | STRING2 | NUMBER1 | 21 | NUMBER3
13.03.2013 12:13:25 | STRING3 | NUMBER1 | 51 | NUMBER3
13.03.2013 12:13:38 | STRING2 | NUMBER1 | 71 | NUMBER3
13.03.2013 12:13:40 | STRING1 | NUMBER1 | 21 | NUMBER3
13.03.2013 12:13:42 | STRING1 | NUMBER1 | 11 | NUMBER3
13.03.2013 12:13:55 | STRING3 | NUMBER1 | 71 | NUMBER3
13.03.2013 12:14:02 | STRING1 | NUMBER1 | 11 | NUMBER3
13.03.2013 12:14:07 | STRING1 | NUMBER1 | 13 | NUMBER3
13.03.2013 12:14:08 | STRING3 | NUMBER1 | 13 | NUMBER3
13.03.2013 12:14:15 | STRING2 | NUMBER1 | 21 | NUMBER3
13.03.2013 12:14:16 | STRING2 | NUMBER1 | 11 | NUMBER3
13.03.2013 12:14:16 | STRING1 | NUMBER1 | 1 | NUMBER3
13.03.2013 12:14:20 | STRING2 | NUMBER1 | 21 | NUMBER3
13.03.2013 12:14:25 | STRING3 | NUMBER1 | 51 | NUMBER3
13.03.2013 12:14:37 | STRING2 | NUMBER1 | 71 | NUMBER3
13.03.2013 12:14:42 | STRING1 | NUMBER1 | 1 | NUMBER3
13.03.2013 12:14:45 | STRING1 | NUMBER1 | 11 | NUMBER3
13.03.2013 12:14:58 | STRING3 | NUMBER1 | 51 | NUMBER3
13.03.2013 12:15:06 | STRING2 | NUMBER1 | 11 | NUMBER3
13.03.2013 12:15:13 | STRING1 | NUMBER1 | 43 | NUMBER3
13.03.2013 12:15:22 | STRING2 | NUMBER1 | 21 | NUMBER3
13.03.2013 12:15:26 | STRING3 | NUMBER1 | 51 | NUMBER3
13.03.2013 12:15:35 | STRING2 | NUMBER1 | 71 | NUMBER3
13.03.2013 12:15:40 | STRING1 | NUMBER1 | 1 | NUMBER3
13.03.2013 12:15:42 | STRING1 | NUMBER1 | 21 | NUMBER3
13.03.2013 12:15:53 | STRING3 | NUMBER1 | 71 | NUMBER3

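I want to find the average of column 4 (the field after the third |) for each minute, counting only the lines whose second column equals a variable X. For example, if $X="STRING1", the result should be: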

  

13.03.2013 12:13 | STRING1 | 11.6
13.03.2013 12:14 | STRING1 | 7.4
13.03.2013 12:15 | STRING1 | 21.666

So for each minute we look only at the lines for variable $X and calculate the average over those lines. How can I do this?
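The expected figures can be reproduced by hand from the sample data: in minute 12:13 the STRING1 rows carry 1, 12, 13, 21 and 11 in column 4, in 12:14 they carry 11, 13, 1, 1 and 11, and in 12:15 they carry 43, 1 and 21. A small sketch of that arithmetic follows; the function name `expected_means` is made up for illustration.

```shell
# Hypothetical helper: recompute the three expected means by hand.
expected_means() {
    awk 'BEGIN {
        printf "%.6g\n", (1 + 12 + 13 + 21 + 11) / 5   # 12:13 -> 58 / 5
        printf "%.6g\n", (11 + 13 + 1 + 1 + 11) / 5    # 12:14 -> 37 / 5
        printf "%.6g\n", (43 + 1 + 21) / 3             # 12:15 -> 65 / 3
    }'
}
expected_means
```

With six significant digits the last mean prints as 21.6667, so the 21.666 shown above is a truncated value.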

2 answers:

Answer 0: (score: 2)

You can use the following awk program:

example.awk

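# Accumulate column 4 for every line that matches SEARCH (a regex match
# against the whole line), bucketed by the timestamp truncated to the minute.
$0 ~ SEARCH {
  split($1, time, ":")
  min = time[1] ":" time[2]   # e.g. "13.03.2013 12:13"; includes date and hour
  total[min] += $4
  count[min]++
}

# Print one average per minute. Note: for-in order is unspecified in awk.
END {
  for (m in total) {
    printf "%s|%s|%s\n", m, SEARCH, total[m]/count[m]
  }
}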

Run it:

awk -F'|' -v SEARCH=STRING1 -f example.awk your.log

Output:

13.03.2013 12:13|STRING1|11.6
13.03.2013 12:14|STRING1|7.4
13.03.2013 12:15|STRING1|21.6667
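Two details of this approach are worth keeping in mind: `$0 ~ SEARCH` is a regular-expression match against the whole line, so `STRING1` would also match a line containing `STRING10`, and `for (m in total)` visits its keys in an unspecified order. The following is a minimal sketch of one way to tighten both points, not part of the original answer. The function name `per_minute_avg` and the awk variable names are made up for illustration, and it assumes the `DD.MM.YYYY HH:MM:SS` timestamp layout shown in the question.

```shell
# Hypothetical helper: per_minute_avg NAME [FILE...]
# Prints "minute | NAME | mean" in the order the minutes first appear.
per_minute_avg() {
    name=$1
    shift
    awk -F'|' -v name="$name" '
        # Strip leading and trailing blanks from a field
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }

        trim($2) == name {                      # exact comparison, not a regex match
            minute = substr(trim($1), 1, 16)    # "DD.MM.YYYY HH:MM"
            if (!(minute in count)) order[++n] = minute   # remember first-seen order
            total[minute] += $4
            count[minute]++
        }

        END {
            for (i = 1; i <= n; i++) {
                m = order[i]
                printf "%s | %s | %.6g\n", m, name, total[m] / count[m]
            }
        }
    ' "$@"
}
```

It could be called as `per_minute_avg STRING1 your.log`. Remembering the first-seen order in a separate array avoids depending on how awk iterates over `total`.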

Answer 1: (score: 2)

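awk -v X="STRING1" '
    # Assumes the input is in chronological order.
    # Fields are separated by "|" with optional padding spaces.
    BEGIN { FS = " *[|] *"; OFS = " | " }
    $2 != X { next }                      # keep only rows whose 2nd field is exactly X
    { min = substr($1, 1, 16) }           # "DD.MM.YYYY HH:MM"
    min != prev {                         # new minute: flush the previous one
        if (n > 0) print prev, X, total/n
        total = n = 0
        prev = min
    }
    { n++; total += $4 }
    END { if (n > 0) print prev, X, total/n }
' file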
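This answer prints a result each time the minute changes, so it only needs to hold one running total in memory, but it relies on the log being in chronological order. If that is not guaranteed, the lines can be sorted first. Because `DD.MM.YYYY` does not sort chronologically as plain text, the sketch below prefixes each line with a rebuilt `YYYYMMDDHHMMSS` key, sorts on it, and removes the key again. The function name `sort_by_time` is made up for illustration, and the fixed column positions assume the timestamp layout from the question.

```shell
# Hypothetical helper: sort_by_time [FILE...]
# Emits the input lines in chronological order.
sort_by_time() {
    awk '{
        ts = $0
        sub(/^[ \t]+/, "", ts)                 # ignore leading indentation
        # ts now starts with "DD.MM.YYYY HH:MM:SS"
        date = substr(ts, 7, 4) substr(ts, 4, 2) substr(ts, 1, 2)
        time = substr(ts, 12, 2) substr(ts, 15, 2) substr(ts, 18, 2)
        print date time "\t" $0                # fixed-width sortable key, tab, original line
    }' "$@" | LC_ALL=C sort | cut -f2-
}
```

The sorted stream can then be piped into the command above, for example `sort_by_time file | awk -v X="STRING1" '...'`, with `...` standing for the same program text.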