Summing columns

Time: 2015-03-24 14:51:59

Tags: ruby bash awk

I am trying to sum the values in each column (the values between the "," separators), grouped by date and time.

For example:

RG Data,2015/02/27,18:02:07,"0","52",50.0,5.3,44.7,5.6,100.0,0.23,0.03,0.20,6.3,4.5
RG Data,2015/02/27,18:02:07,"1","52",36.9,22.3,14.6,39.9,100.0,0.59,0.16,0.43,7.5,29.9
RG Data,2015/02/27,18:03:06,"0","52",21.2,0.7,20.5,50.0,100.0,0.08,0.00,0.08,0.0,4.2
RG Data,2015/02/27,18:03:06,"1","52",245.6,233.4,12.2,73.7,100.0,2.08,1.83,0.25,8.0,21.4
... more lines after...

Output:

RG Data,2015/02/27,18:02:07,86.9,27.6,59.3,....
RG Data,2015/02/27,18:03:06,266.8,234.1,....

Here, 86.9 comes from 50.0 (line 1) + 36.9 (line 2), and so on for every column.

My awk code:

for TIME in $(awk -F ',|/' '{print $4","$5}' FILE | sort -u); do echo -n "$TIME"; awk -F ',' "/$TIME/ {SUM += \$6} END { print SUM}" FILE; done

Thank you very much for your help.

1 Answer:

Answer 0: (score: 0)

This awk one-liner produces something close to the desired output:

$ awk -F, '{k=$1FS$2FS$3;seen[k];for(i=6;i<=NF;++i)sum[k,i]+=$i}END{for(i in seen){printf "%s,",i;for(j=6;j<=NF;++j)printf "%s%s",sum[i ,j],(j<NF?FS:RS)}}' file
RG Data,2015/02/27,18:03:06,266.8,234.1,32.7,123.7,200,2.16,1.83,0.33,8,25.6
RG Data,2015/02/27,18:02:07,86.9,27.6,59.3,45.5,200,0.82,0.19,0.63,13.8,34.4

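The variable k is the key. It is made up of the first, second and third columns of each line, joined with the field separator FS (a comma in this case). The array seen keeps track of every key k that is encountered.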
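As a rough illustration of the key-building step only, here is a minimal sketch. The function name `list_keys` is hypothetical (it is not part of the one-liner above); it prints each distinct key once, in the order it first appears in the input.

```shell
# Hypothetical helper: print each distinct key (fields 1-3) once, in input order.
list_keys() {
    awk -F, '
    {
        k = $1 FS $2 FS $3        # e.g. "RG Data,2015/02/27,18:02:07"
        if (!(k in seen)) {       # first time this key appears
            seen[k]               # referencing the element creates it
            print k
        }
    }'
}

# Usage with three shortened sample lines:
printf '%s\n' \
  'RG Data,2015/02/27,18:02:07,"0","52",50.0' \
  'RG Data,2015/02/27,18:02:07,"1","52",36.9' \
  'RG Data,2015/02/27,18:03:06,"0","52",21.2' |
list_keys
```

For the three lines above this should print the two keys RG Data,2015/02/27,18:02:07 and RG Data,2015/02/27,18:03:06, one per line.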

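The loop goes through every field from the sixth to the last, adding each value to an element of the sum array. That element is indexed by the key k (the first three fields, as in seen) together with the current field number.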
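To make the two-part subscript more concrete: awk stores sum[k,i] under the single string k SUBSEP i, so the parts can be recovered with split. The sketch below assumes that no field contains the SUBSEP character, and `dump_sums` is a hypothetical name used only for this illustration. It prints one "key|field|total" line per accumulated cell, sorted so the output is stable.

```shell
# Hypothetical helper: print "key|field|total" for every accumulated cell.
dump_sums() {
    awk -F, '
    {
        k = $1 FS $2 FS $3
        for (i = 6; i <= NF; ++i)
            sum[k, i] += $i               # stored under the string k SUBSEP i
    }
    END {
        for (cell in sum) {
            split(cell, part, SUBSEP)     # part[1] = key, part[2] = field number
            print part[1] "|" part[2] "|" sum[cell]
        }
    }' | LC_ALL=C sort
}

# Usage with two shortened sample lines (fields 6 and 7 only):
printf '%s\n' \
  'RG Data,2015/02/27,18:02:07,"0","52",50.0,5.3' \
  'RG Data,2015/02/27,18:02:07,"1","52",36.9,22.3' |
dump_sums
```

With these two lines the totals are 86.9 for field 6 and 27.6 for field 7.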

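Once the file has been processed, the END block loops through the seen array and prints all the corresponding elements of the sum array. The order in which for (i in seen) visits the keys is unspecified in awk, which is why the groups in the output above do not appear in input order.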
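If the groups should come out in the order they first appear in the input, one possible variation is sketched below. It is not the one-liner above: the names `sum_by_timestamp`, `order`, `n` and `nf` are introduced here for illustration. It keeps an extra array of keys in first-seen order, and it records the widest field count itself instead of reading NF in the END block.

```shell
# Hypothetical variation: same sums, but groups are printed in input order.
sum_by_timestamp() {
    awk -F, '
    {
        k = $1 FS $2 FS $3
        if (!(k in seen)) {          # remember first-seen order of keys
            seen[k]
            order[++n] = k
        }
        if (NF > nf) nf = NF         # widest record seen so far
        for (i = 6; i <= NF; ++i)
            sum[k, i] += $i
    }
    END {
        for (m = 1; m <= n; ++m) {
            k = order[m]
            printf "%s", k
            for (j = 6; j <= nf; ++j)
                printf "%s%s", FS, sum[k, j]
            printf "\n"
        }
    }'
}

# Usage with the four sample lines from the question:
printf '%s\n' \
  'RG Data,2015/02/27,18:02:07,"0","52",50.0,5.3,44.7,5.6,100.0,0.23,0.03,0.20,6.3,4.5' \
  'RG Data,2015/02/27,18:02:07,"1","52",36.9,22.3,14.6,39.9,100.0,0.59,0.16,0.43,7.5,29.9' \
  'RG Data,2015/02/27,18:03:06,"0","52",21.2,0.7,20.5,50.0,100.0,0.08,0.00,0.08,0.0,4.2' \
  'RG Data,2015/02/27,18:03:06,"1","52",245.6,233.4,12.2,73.7,100.0,2.08,1.83,0.25,8.0,21.4' |
sum_by_timestamp
```

On the sample data this should give the same two lines as the one-liner, with the 18:02:07 group first. The extra order array costs a little memory per distinct key; piping the one-liner's output through sort would be an alternative when the keys happen to sort the way you want.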