Creating a pivot table in UNIX

Time: 2016-09-28 12:42:17

Tags: linux shell unix awk

Below is my input data, from which I am trying to create a pivot table.

input.txt

ID,CreateDate,Category,Region,PublishDate,Code,Listing,Type,ModifiedDate
FRU426131598,22-Aug-16,SELLING,COUNTRY,22-Aug-16,1,SAMPLE,GRAPE,22-Aug-16
FRU426175576,23-Aug-16,SELLING,COUNTRY,23-Aug-16,1,SAMPLE,APPLE,23-Aug-16
FRU427163049,26-Aug-16,SELLING,COUNTRY,26-Aug-16,1,SAMPLE,APPLE,26-Aug-16
FRU427163049,26-Aug-16,SELLING,COUNTRY,26-Aug-16,1,SAMPLE,APPLE,26-Aug-16
FRU427163049,26-Aug-16,SELLING,COUNTRY,26-Aug-16,1,SAMPLE,GRAPE,26-Aug-16
FRU427163049,26-Aug-16,SELLING,COUNTRY,26-Aug-16,1,SAMPLE,GRAPE,26-Aug-16
FRU427163049,26-Aug-16,SELLING,COUNTRY,26-Aug-16,1,SAMPLE,APPLE,26-Aug-16
FRU427163049,26-Aug-16,SELLING,COUNTRY,26-Aug-16,1,SAMPLE,APPLE,26-Aug-16
FRU426972836,26-Aug-16,SELLING,COUNTRY,26-Aug-16,1,SAMPLE,ORANGE,26-Aug-16
FRU427322180,28-Aug-16,SELLING,COUNTRY,28-Aug-16,1,SAMPLE,GRAPE,28-Aug-16
FRU427032658,26-Aug-16,SELLING,COUNTRY,26-Aug-16,1,SAMPLE,APPLE,26-Aug-16
FRU427373494,29-Aug-16,SELLING,COUNTRY,29-Aug-16,1,SAMPLE,GRAPE,29-Aug-16
FRU427373069,29-Aug-16,SELLING,COUNTRY,29-Aug-16,1,SAMPLE,GRAPE,29-Aug-16
FRU425669484,19-Aug-16,SELLING,COUNTRY,19-Aug-16,1,SAMPLE,APPLE,19-Aug-16
FRU425616815,18-Aug-16,SELLING,COUNTRY,18-Aug-16,1,SAMPLE,APPLE,18-Aug-16
FRU420018273,25-Sep-16,SELLING,COUNTRY,25-Sep-16,1,SAMPLE,ORANGE,25-Sep-16
FRU435018589,25-Sep-16,SELLING,COUNTRY,25-Sep-16,1,SAMPLE,ORANGE,25-Sep-16
FRU421375128,26-Sep-16,SELLING,COUNTRY,26-Sep-16,1,SAMPLE,APPLE,26-Sep-16
FRU434911933,21-Sep-16,SELLING,COUNTRY,21-Sep-16,1,SAMPLE,ORANGE,21-Sep-16
FRU434594125,21-Sep-16,SELLING,COUNTRY,21-Sep-16,1,SAMPLE,ORANGE,21-Sep-16

The Type field should become the rows and the CreateDate field the columns, with the count of IDs as the values.

Expected output:

Row Labels   18-Aug-16  19-Aug-16  22-Aug-16  23-Aug-16  26-Aug-16  28-Aug-16  29-Aug-16  21-Sep-16  25-Sep-16  26-Sep-16  Grand Total
APPLE        1          1                     1          5                                                      1          9
GRAPE                              1                     2          1          2                                           6
ORANGE                                                   1                                2          2                     5
Grand Total  1          1          1          1          8          1          2          2          2          1          20

Is there a way to do this? I can get the count per CreateDate with awk, but I cannot build a pivot table with both rows and columns.
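For reference, a per-date count along those lines might look like the sketch below. This is an assumption about the general approach, not the asker's actual command. `count_by_date` is a hypothetical name, and the final `sort` orders the lines as plain text only so the output is reproducible; it is not a chronological sort.

```shell
# Hypothetical helper: count_by_date reads the CSV on stdin and prints
# "date<TAB>count" for field 2 (CreateDate), skipping the header line.
count_by_date() {
    awk -F, -v OFS='\t' '
        NR > 1 { n[$2]++ }                      # one counter per distinct date
        END    { for (d in n) print d, n[d] }   # traversal order is unspecified
    ' | LC_ALL=C sort                           # textual order, not chronological
}
```

Usage would be something like `count_by_date < input.txt`.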

3 Answers:

Answer 0: (score: 2)

awk to the rescue!

This should get you started...

$ awk -F, -v OFS='\t' 'NR>1 {k=$(NF-1); d=$2; keys[k]; dates[d]; a[k,d]++}
                        END {line="Row Labels"; 
                             for(d in dates) line = line OFS d; 
                             print line; 
                             for(k in keys) 
                               {{line=k; 
                                 for(d in dates) line=line OFS a[k,d]} 
                                print line}}' file    

Row Labels      19-Aug-16       29-Aug-16       23-Aug-16       18-Aug-16       28-Aug-16       22-Aug-16       26-Aug-16       26-Sep-16  21-Sep-16       25-Sep-16
APPLE   1               1       1                       5       1
ORANGE                                                  1               2       2
GRAPE           2                       1       1       2

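You will probably want to sort the dates (not so easy) and add the totals (easy). Note that `for (d in dates)` visits the dates in an unspecified order, which is why the columns above are not chronological.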
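As a rough sketch of the "add totals" part in portable awk, the counts can be accumulated per row, per column, and overall while reading the input. The function name `pivot_totals` and the helper arrays are hypothetical, and the field positions (date in field 2, type in field 8) are taken from the sample input. Columns and rows come out in first-seen order here, so the dates are still not sorted chronologically.

```shell
# Hypothetical helper: pivot_totals reads the CSV on stdin (header on line 1)
# and prints a tab-separated pivot with row, column, and grand totals.
pivot_totals() {
    awk -F, -v OFS='\t' '
    NR > 1 {
        k = $8; d = $2
        if (!(k in keys))  { keys[k];  korder[++nk] = k }   # remember first-seen order
        if (!(d in dates)) { dates[d]; dorder[++nd] = d }
        cnt[k, d]++; rowtot[k]++; coltot[d]++; grand++
    }
    END {
        line = "Row Labels"
        for (j = 1; j <= nd; j++) line = line OFS dorder[j]
        print line OFS "Grand Total"
        for (i = 1; i <= nk; i++) {
            k = korder[i]; line = k
            for (j = 1; j <= nd; j++) line = line OFS cnt[k, dorder[j]]   # empty if never seen
            print line OFS rowtot[k]
        }
        line = "Grand Total"
        for (j = 1; j <= nd; j++) line = line OFS coltot[dorder[j]]
        print line OFS grand
    }'
}
```

It could be run as `pivot_totals < input.txt`. Keeping explicit order arrays avoids depending on the traversal order of `for (x in array)`.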

Answer 1: (score: 0)

Here is one way to sort the dates. It requires GNU awk (for mktime, strftime, arrays of arrays, and sorted_in).

awk -F, '
    function date2epoch(date,    arr,mon) {
        split(date, arr, /-/)
        mon = (index("JanFebMarAprMayJunJulAugSepOctNovDec", arr[2]) - 1) / 3 + 1
        return mktime("20" arr[3] " " mon " " arr[1] " 0 0 0")
    }
    NR > 1 {
        d = date2epoch($2)    # CreateDate is field 2 ($NF would be ModifiedDate)
        dates[d]
        count[$(NF-1)][d]++
        total[d]++
    } 
    END {
        PROCINFO["sorted_in"] = "@ind_str_asc"

        printf "Row Label"
        for (d in dates) 
            printf "\t%s", strftime("%d-%b-%y", d)
        print ""

        for (type in count) {
            printf "%s", type
            for (d in dates) 
                printf "\t%s", count[type][d]
            print ""
        }

        printf "Total"
        for (d in dates) 
            printf "\t%s", total[d]
        print ""
    }
' file
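If GNU awk is not available, the same month-lookup idea can be combined with plain `sort` instead of `mktime`. The sketch below is only an illustration: `sort_dates` is a hypothetical name, and it assumes English month abbreviations and that every two-digit year belongs to the same century.

```shell
# Hypothetical helper: sort_dates reads DD-Mon-YY dates, one per line,
# and prints them in chronological order.
sort_dates() {
    awk -F- '{
        # The month name sits at position 1, 4, 7, ... in the lookup string,
        # so (position + 2) / 3 gives the month number 1..12.
        m = (index("JanFebMarAprMayJunJulAugSepOctNovDec", $2) + 2) / 3
        # Prefix each date with a sortable YYMMDD key, separated by a tab.
        printf "%02d%02d%02d\t%s\n", $3, m, $1, $0
    }' | LC_ALL=C sort -k1,1 | cut -f2
}
```

For example, `cut -d, -f2 input.txt | sed 1d | sort -u | sort_dates` should list the distinct CreateDate values in date order, which could then drive the column order of the pivot.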

Answer 2: (score: 0)

With GNU awk 4.* for true multi-dimensional arrays and sorted_in:

$ cat tst.awk
BEGIN { FS=","; OFS="\t" }
NR>1 {
    split($2,t,/-/)
    date = sprintf("%02d%02d%02d",t[3],(match("JanFebMarAprMayJunJulAugSepOctNovDec",t[2])+2)/3,t[1])
    dateNames[date] = $2
    fruitCnts[$8][date]++
}
END {
    PROCINFO["sorted_in"] = "@ind_str_asc"

    printf "%s%s", "Row Labels", OFS
    for (date in dateNames) {
        printf "%s%s", dateNames[date], OFS
    }
    print "Grand Total"

    for (fruit in fruitCnts) {
        fruitTotal = 0
        printf "%s%s", fruit, OFS
        for (date in dateNames) {
            cnt = (date in fruitCnts[fruit] ? fruitCnts[fruit][date] : "")
            printf "%s%s", cnt, OFS
            dateTotals[date] += cnt
            fruitTotal += cnt
        }
        print fruitTotal
    }

    printf "%s%s", "Grand Total", OFS
    for (date in dateNames) {
        printf "%s%s", dateTotals[date], OFS
        total += dateTotals[date]
    }
    print total
}

$ awk -f tst.awk file
Row Labels      18-Aug-16       19-Aug-16       22-Aug-16       23-Aug-16       26-Aug-16       28-Aug-16       29-Aug-16       21-Sep-16   25-Sep-16        26-Sep-16       Grand Total
APPLE   1       1               1       5                                       1       9
GRAPE                   1               2       1       2                               6
ORANGE                                  1                       2       2               5
Grand Total     1       1       1       1       8       1       2       2       2       1       20

$ awk -f tst.awk file | column -s$'\t' -t
Row Labels   18-Aug-16  19-Aug-16  22-Aug-16  23-Aug-16  26-Aug-16  28-Aug-16  29-Aug-16  21-Sep-16  25-Sep-16  26-Sep-16  Grand Total
APPLE        1          1                     1          5                                                      1          9
GRAPE                              1                     2          1          2                                           6
ORANGE                                                   1                                2          2                     5
Grand Total  1          1          1          1          8          1          2          2          2          1          20
$