Summing time-series data

Date: 2016-07-21 00:04:37

Tags: awk gnuplot

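I have time-stamped data recorded every minute and want to sum it per hour (or per some other period, such as a week or a month).

The data looks like this: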

timeStamp,kwH,watts
"2016-07-16 16:18:51",0.014,710
"2016-07-16 16:20:01",0.013,669
"2016-07-16 16:22:40",0.020,720
...
"2016-07-16 21:06:01",0.006,360
"2016-07-16 21:07:00",0.006,366
"2016-07-16 21:08:01",0.007,413
"2016-07-16 21:09:01",0.006,360

I want to sum the second column (kwH), grouped by the hour taken from the timestamp in column 1.
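Once the log covers more than one day, the hour alone is not a unique key, so the grouping key should really be the date plus the hour. A minimal sketch, assuming the layout shown above (a header line, then a double-quoted `YYYY-MM-DD HH:MM:SS` timestamp, kwH and watts separated by commas); `hourly_kwh` is a hypothetical helper name:

```shell
#!/bin/sh
# Sum kwH per calendar hour ("YYYY-MM-DD HH"), so that e.g. 16:00 on two
# different days is not merged. Reads the files given as arguments, or stdin.
# Assumes rows like: "2016-07-16 16:18:51",0.014,710
hourly_kwh() {
  awk -F, '
    /^"/ {                        # data rows start with a quoted timestamp
      hour = substr($1, 2, 13)    # "2016-07-16 16:18:51" -> 2016-07-16 16
      kwh[hour] += $2             # accumulate the kwH column
    }
    END { for (h in kwh) print h "," kwh[h] }
  ' "$@" | LC_ALL=C sort          # for-in order is unspecified, so sort by key
}
```

Running `hourly_kwh data.csv` (file name hypothetical) prints one `YYYY-MM-DD HH,total` line per hour in chronological order; the header and any non-data lines are skipped by the `/^"/` pattern.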

A larger dataset is available at http://pastebin.com/raw/BbjLebVx

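How can I sum this? I'm guessing it might involve awk.

Secondly, given that the data, the web service and the bash scripts that generate the graphs all reside on a server I control, would it be more efficient to sum this data in MySQL rather than trying to get gnuplot to process megabytes of raw data?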

1 answer:

Answer 0 (score: 0)

$ cat > test.awk
BEGIN { FS = "," }         # the fields are comma-separated
/^"/ {                     # only data rows (quoted timestamp); skips the header and any other lines
  gsub(/^.* |:.*/,"",$1);  # using a regex, remove all but the hour from the timestamp for "grouping by the hour"
  arr[$1]+=$2              # sum together the "kwH"
}
END {                      # after summing, print the totals
  for (i in arr)           # for each element (hour) in the array
    print i,arr[i]         # print the element and the sum of "kwH"
}
$ awk -f test.awk test.in | sort -n   # for-in order is unspecified, so sort by hour
16 0.047
21 0.025
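Note that the script above keys on the hour of day only, so readings from 16:00 on different days land in the same bucket. Keying on a prefix of the quoted timestamp instead keeps calendar periods apart, and the prefix length selects the period: 13 characters for the hour, 10 for the day, 7 for the month. A sketch under that assumption (timestamps formatted as in the question; `sum_kwh_by_prefix` is a hypothetical name). Weeks do not correspond to a timestamp prefix and would need date arithmetic, which this sketch does not cover:

```shell
#!/bin/sh
# Sum kwH by a period expressed as a timestamp-prefix length:
#   13 = hour ("2016-07-16 16"), 10 = day ("2016-07-16"), 7 = month ("2016-07")
# Usage: sum_kwh_by_prefix LEN [FILE...]   (reads stdin when no FILE is given)
sum_kwh_by_prefix() {
  len=$1
  shift
  awk -F, -v len="$len" '
    /^"/ { sum[substr($1, 2, len)] += $2 }  # data rows only; char 1 is the quote
    END  { for (k in sum) print k "," sum[k] }
  ' "$@" | LC_ALL=C sort                    # chronological order by key
}
```

For example, `sum_kwh_by_prefix 10 data.csv` (file name hypothetical) gives daily totals. Pre-aggregating like this also means gnuplot only has to read one line per period instead of one per minute.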