Group by and sum - AWK script

Date: 2015-04-20 23:11:43

Tags: awk group-by gawk

I have millions of records like these:

TOTALOCTETSUNIT SERVEDACCOUNT   SERVICECLASSID  ACCUMULATEDUNITS    ACCOUNTUNITSDEDUCTED    ACCOUNTVALUEBEFORE  ACCOUNTVALUEAFTER
850             66498336         70             10240                10240                   0.083333           0.083333
259             64625247         41             10240                10240                   65.500000          65.50000
219792          76608974         35             225280               225280                  653.049798         653.049798
15261           76900654         35             20480                20480                   35.516666          35.516666

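I have to group by SERVEDACCOUNT and then by SERVICECLASSID, and for each resulting group I have to sum TOTALOCTETSUNIT, ACCUMULATEDUNITS, ACCOUNTUNITSDEDUCTED and ACCOUNTVALUEBEFORE. If the summary were based on a single field this would not be a problem, but we have to group by 2 fields.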

This is the awk script I am using, saved as test.awk:

BEGIN { FS = "|" } NR > 2500 {exit}            
1 < NR && NR <= 2500 { 
#sub(/ .*/,"",$4)      
key=$3
TOTOCTET[key]+=$1
ACCUNITS[key]+=$4
ACCUNITTED[key]+=$5
ACCVALBEF[key]+=$6} END {
printf "%-13s %18s %18s %18s %18s\n", 
    "SERVEDACCOUNT","TOTALOCTETSUNIT","ACCUMULATEDUNITS","ACCOUNTUNITSDEDUCTED","ACCOUNTVALUEBEFORE" 
for (i in TOTOCTET) { 
    printf "%-4s %16.6f %16.6f %16.6f %16.6f\n", 
        i,TOTOCTET[i],ACCUNITS[i],ACCUNITTED[i],ACCVALBEF[i] }
}

To run the script I am using: $ awk -f test file.txt

The output I get is a set of zeros, like:

SERVEDACCOUNT    TOTALOCTETSUNIT   ACCUMULATEDUNITS ACCOUNTUNITSDEDUCTED ACCOUNTVALUEBEFORE
0.000000               0.000000         0.000000         0.000000         0.000000
0.000000               0.000000         0.000000         0.000000         0.000000
0.000000              279.000000         0.000000         0.000000         0.000000

Here is the output I am looking for:

SERVEDACCOUNT   SERVICECLASSID  TOTALOCTETSUNIT ACCUMULATEDUNITS    ACCOUNTUNITSDEDUCTED    ACCOUNTVALUEBEFORE
64625247         41               259           10240                  10240                  65,5
66498336         70               850           10240                  10240                 0,083333
76608974         35               219792        225280                225280                  653,049798
76900654         35               15261          20480                 20480                   35,516666

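1 answer:

Answer 0 (score: 0)

Currently the key is set to $3, but if the key has to be determined by both SERVEDACCOUNT and SERVICECLASSID, it should be built from $2 and $3, for example: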

BEGIN { FS = "\t" }                # fields are tab-separated
1 < NR && NR <= 2500 {             # skip the header; only lines 2..2500 are summed
    key = $2 "-" $3                # composite key: SERVEDACCOUNT-SERVICECLASSID
    TOTOCTET[key]   += $1
    ACCUNITS[key]   += $4
    ACCUNITTED[key] += $5
    ACCVALBEF[key]  += $6
}
END {
    printf "%-13s %18s %18s %18s %18s %18s\n",
        "SERVEDACCOUNT","SERVICECLASSID","TOTALOCTETSUNIT",
        "ACCUMULATEDUNITS","ACCOUNTUNITSDEDUCTED","ACCOUNTVALUEBEFORE"
    for (i in TOTOCTET) {
        split(i, ii, /-/)          # recover the two parts of the key
        printf "%-16s %-16s %16.0f %16.0f %16.0f %16.0f\n",
            ii[1],ii[2],TOTOCTET[i],ACCUNITS[i],ACCUNITTED[i],ACCVALBEF[i]
    }
}

...which outputs:

SERVEDACCOUNT     SERVICECLASSID    TOTALOCTETSUNIT   ACCUMULATEDUNITS ACCOUNTUNITSDEDUCTED ACCOUNTVALUEBEFORE
76900654         35                          15261            20480            20480               36
66498336         70                            850            10240            10240                0
64625247         41                            259            10240            10240               66
76608974         35                         219792           225280           225280              653
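
As a possible variation (a sketch, not part of the script above): the zeros in the question most likely come from FS = "|". The sample lines contain no "|", so each whole line ends up in $1 and $3 to $6 are empty. The version below assumes the columns are separated by spaces or tabs and leaves FS at its default, joins the two key fields with awk's built-in SUBSEP rather than "-", and prints ACCOUNTVALUEBEFORE with %.6f so the decimals are kept. The sample_data function, the shortened array names, and the last data row (a second record for account 66498336, added only so that one group has more than one record) are made up for illustration.

```shell
#!/bin/sh
# Hypothetical input: the header and the four rows from the question,
# plus one made-up extra row for account 66498336 / class 70.
sample_data() {
cat <<'EOF'
TOTALOCTETSUNIT SERVEDACCOUNT   SERVICECLASSID  ACCUMULATEDUNITS    ACCOUNTUNITSDEDUCTED    ACCOUNTVALUEBEFORE  ACCOUNTVALUEAFTER
850             66498336         70             10240                10240                   0.083333           0.083333
259             64625247         41             10240                10240                   65.500000          65.50000
219792          76608974         35             225280               225280                  653.049798         653.049798
15261           76900654         35             20480                20480                   35.516666          35.516666
150             66498336         70             2048                 2048                    0.500000           0.500000
EOF
}

# Default FS splits on any run of spaces or tabs.
result=$(sample_data | awk '
NR > 1 {                              # skip the header line
    key = $2 SUBSEP $3                # composite key: account + service class
    octets[key]   += $1
    units[key]    += $4
    deducted[key] += $5
    before[key]   += $6
}
END {
    for (key in octets) {
        split(key, part, SUBSEP)      # part[1] = account, part[2] = class
        printf "%s %s %.0f %.0f %.0f %.6f\n",
            part[1], part[2], octets[key], units[key], deducted[key], before[key]
    }
}' | sort -n)                         # for (key in ...) order is unspecified, so sort

printf '%s\n' "$result"
```

With this sample the script prints four lines, one per (SERVEDACCOUNT, SERVICECLASSID) pair; the two rows for 66498336 / 70 are merged into `66498336 70 1000 12288 12288 0.583333`. SUBSEP is a control character that will not appear in numeric fields, so the key cannot be split in the wrong place the way a "-" separator could be if a field ever contained a hyphen.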