How to group by column values into row and column headers, then sum the columns dynamically

Time: 2016-12-26 13:04:32

Tags: java linux shell unix awk

Below are my input and output .txt files.

I want to group the data by StatusDate and Method, then sum the values for each StatusDate and Method combination.

INPUT.TXT

No,Date,MethodStatus,Key,StatusDate,Hit,CallType,Method,LastMethodType
112,12/15/16,Suceess,Geo,12/15/16,1,Static,GET,12/15/16
113,12/18/16,Suceess,Geo,12/18/16,1,Static,GET,12/18/16
114,12/19/16,AUTHORIZED,Geo,12/19/16,1,Static,GET,12/19/16
115,12/19/16,AUTHORIZED,Geo,12/19/16,1,Static,GET,12/19/16
116,12/19/16,Suceess,Geo,12/19/16,1,Static,PUT,12/19/16
117,12/19/16,Suceess,Geo,12/19/16,1,Static,PUT,12/19/16
118,12/19/16,Waiting,Geo,12/19/16,1,Static,GET,12/19/16
119,12/19/16,AUTHORIZED,Geo,12/19/16,1,Static,GET,12/19/16
120,12/17/16,Suceess,Geo,12/17/16,1,Static,GET,12/17/16
121,12/17/16,Suceess,Geo,12/17/16,1,Static,GET,12/17/16
130,12/16/16,Suceess,Geo,12/16/16,1,Static,GET,12/16/16

Out.txt

StatusDate,12/15/16,12/16/16,12/17/16,12/17/16,12/18/16,12/19/16,12/19/16,12/19/16,12/19/16,12/19/16,12/19/16,Grand Total
GET,1,1,1,1,1,1,1,1,1,,,9
PUT,,,,,,,,,,1,1,2
Grand Total,1,1,1,1,1,1,1,1,1,1,1,11

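I am using awk: I split the data with awk -F, '{if($8=="GET") print }' and then compute the sums. Because the file is large, this is slow.

Is it possible to do this in a single step, so that fewer file operations are needed?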

1 answer:

Answer 0 (score: 0)

You can use a GNU awk script like this:

script.awk

BEGIN { PROCINFO["sorted_in"] = "@ind_str_asc" }

function remember( theDate, mem) {
    mem[   theDate] +=1
    # Totals holds the column sum for each date seen (i.e. for each output column)
    Totals[theDate] += 1
}

# header is 1 for the first output line (the date headings) and 0 for data rows
# OFS is used, so the separator can be set on the command line, e.g.
# -v OFS='\t' or -v OFS=','
# k and sum are extra parameters so that they stay local to the function
function printMem( mem, name, header,    k, sum ) {
    printf("%s%s",name,OFS)
    sum=0
    for( k in Totals ) { 
        if( header) 
            printf("%s%s", k, OFS )
        else { 
            printf("%s%s", mem[k], OFS )
            sum += mem[k]
        }
    }
    if(!header) 
        printf("%s", sum )
    else 
        printf("Grand Total")
    print ""
}

# different methods are stored in different arrays
# the key is $2 (Date), which equals $5 (StatusDate) in the sample input
$8 == "GET" { remember( $2, get ) }
$8 == "PUT" { remember( $2, put ) }

END { # print the stored values
      # the first line header
      printMem( Totals , "StatusDate", 1)
      printMem( get    , "GET", 0)
      printMem( put    , "PUT", 0)
      # the summary line
      printMem( Totals , "Grand Total", 0)
    }

Run the script like this: awk -F, -v OFS=',' -f script.awk Input.txt
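The script above hard-codes GET and PUT, counts rows, and relies on GNU awk for sorted traversal. As a variation, here is a minimal single-pass sketch that should run under any POSIX awk. It rests on assumptions that are not part of the answer: it groups on column 5 (StatusDate) instead of column 2, sums column 6 (Hit) instead of counting rows, accepts whatever values appear in column 8 (Method), and sorts the dates as plain strings, which only gives chronological order within a single year. The names pivot and isort are made up for this illustration.

```shell
#!/bin/sh
# pivot: read the CSV on stdin, print a Method x StatusDate table on stdout.
# Assumptions (not from the answer): group on $5, sum $6, any value in $8.
pivot() {
    awk -F, -v OFS=, '
    # Insertion sort of arr[1..n] by string comparison (no gawk features).
    function isort(arr, n,    i, j, tmp) {
        for (i = 2; i <= n; i++) {
            tmp = arr[i]
            for (j = i - 1; j >= 1 && (arr[j] "") > (tmp ""); j--)
                arr[j + 1] = arr[j]
            arr[j + 1] = tmp
        }
    }
    NR == 1 || NF < 8 { next }          # skip the header row and short lines
    {
        # remember each new date and method in order of first appearance
        if (!($5 in colTotal)) dates[++nd] = $5
        if (!($8 in rowTotal)) methods[++nm] = $8
        cell[$8, $5] += $6              # one cell per (method, date) pair
        rowTotal[$8] += $6
        colTotal[$5] += $6
        grand += $6
    }
    END {
        isort(dates, nd); isort(methods, nm)
        line = "StatusDate"
        for (d = 1; d <= nd; d++) line = line OFS dates[d]
        print line, "Grand Total"
        for (m = 1; m <= nm; m++) {
            line = methods[m]
            for (d = 1; d <= nd; d++) {
                key = methods[m] SUBSEP dates[d]
                # empty cell when this method never occurs on this date
                line = line OFS ((key in cell) ? cell[key] : "")
            }
            print line, rowTotal[methods[m]]
        }
        line = "Grand Total"
        for (d = 1; d <= nd; d++) line = line OFS colTotal[dates[d]]
        print line, grand
    }'
}
```

After sourcing the file, pivot < Input.txt > Out.txt reads the input once. Like the answer's script, it produces one column per distinct date, so for the sample data the GET row reads GET,1,1,2,1,4,9 and the last line reads Grand Total,1,1,2,1,6,11.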