Below is my input data. I'm trying to create a pivot table from it.
input.txt:
ID,CreateDate,Category,Region,PublishDate,Code,Listing,Type,ModifiedDate
FRU426131598,22-Aug-16,SELLING,COUNTRY,22-Aug-16,1,SAMPLE,GRAPE,22-Aug-16
FRU426175576,23-Aug-16,SELLING,COUNTRY,23-Aug-16,1,SAMPLE,APPLE,23-Aug-16
FRU427163049,26-Aug-16,SELLING,COUNTRY,26-Aug-16,1,SAMPLE,APPLE,26-Aug-16
FRU427163049,26-Aug-16,SELLING,COUNTRY,26-Aug-16,1,SAMPLE,APPLE,26-Aug-16
FRU427163049,26-Aug-16,SELLING,COUNTRY,26-Aug-16,1,SAMPLE,GRAPE,26-Aug-16
FRU427163049,26-Aug-16,SELLING,COUNTRY,26-Aug-16,1,SAMPLE,GRAPE,26-Aug-16
FRU427163049,26-Aug-16,SELLING,COUNTRY,26-Aug-16,1,SAMPLE,APPLE,26-Aug-16
FRU427163049,26-Aug-16,SELLING,COUNTRY,26-Aug-16,1,SAMPLE,APPLE,26-Aug-16
FRU426972836,26-Aug-16,SELLING,COUNTRY,26-Aug-16,1,SAMPLE,ORANGE,26-Aug-16
FRU427322180,28-Aug-16,SELLING,COUNTRY,28-Aug-16,1,SAMPLE,GRAPE,28-Aug-16
FRU427032658,26-Aug-16,SELLING,COUNTRY,26-Aug-16,1,SAMPLE,APPLE,26-Aug-16
FRU427373494,29-Aug-16,SELLING,COUNTRY,29-Aug-16,1,SAMPLE,GRAPE,29-Aug-16
FRU427373069,29-Aug-16,SELLING,COUNTRY,29-Aug-16,1,SAMPLE,GRAPE,29-Aug-16
FRU425669484,19-Aug-16,SELLING,COUNTRY,19-Aug-16,1,SAMPLE,APPLE,19-Aug-16
FRU425616815,18-Aug-16,SELLING,COUNTRY,18-Aug-16,1,SAMPLE,APPLE,18-Aug-16
FRU420018273,25-Sep-16,SELLING,COUNTRY,25-Sep-16,1,SAMPLE,ORANGE,25-Sep-16
FRU435018589,25-Sep-16,SELLING,COUNTRY,25-Sep-16,1,SAMPLE,ORANGE,25-Sep-16
FRU421375128,26-Sep-16,SELLING,COUNTRY,26-Sep-16,1,SAMPLE,APPLE,26-Sep-16
FRU434911933,21-Sep-16,SELLING,COUNTRY,21-Sep-16,1,SAMPLE,ORANGE,21-Sep-16
FRU434594125,21-Sep-16,SELLING,COUNTRY,21-Sep-16,1,SAMPLE,ORANGE,21-Sep-16
I want Type as the rows, CreateDate as the columns, and the count of the ID field as the values.
Desired output:
Row Labels   18-Aug-16  19-Aug-16  22-Aug-16  23-Aug-16  26-Aug-16  28-Aug-16  29-Aug-16  21-Sep-16  25-Sep-16  26-Sep-16  Grand Total
APPLE        1          1                     1          5                                                      1          9
GRAPE                              1                     2          1          2                                           6
ORANGE                                                   1                                2          2                     5
Grand Total  1          1          1          1          8          1          2          2          2          1          20
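Is there any way to do this? I can get the count per CreateDate with awk, but I can't work out how to build a pivot table with both rows and columns.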
Answer 0 (score: 2)
awk to the rescue!
This should get you started...
$ awk -F, -v OFS='\t' 'NR>1 {k=$(NF-1); d=$2; keys[k]; dates[d]; a[k,d]++}
      END {line="Row Labels";
           for(d in dates) line = line OFS d;
           print line;
           for(k in keys)
             {line=k;
              for(d in dates) line=line OFS a[k,d];
              print line}}' file
Row Labels 19-Aug-16 29-Aug-16 23-Aug-16 18-Aug-16 28-Aug-16 22-Aug-16 26-Aug-16 26-Sep-16 21-Sep-16 25-Sep-16
APPLE 1 1 1 5 1
ORANGE 1 2 2
GRAPE 2 1 1 2
You may want to sort the dates (not so easy) and add totals (easy).
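Sorting the dates and adding totals do not strictly require GNU awk. Here is a minimal sketch, not taken from any of the answers. It assumes the column layout from the question ($2 is CreateDate, $8 is Type) and two-digit years. It builds a sortable YYMMDD key from each dd-Mon-yy date, orders the keys with a small insertion sort, and accumulates row and column totals while printing. The shell function name `pivot` is hypothetical.

```sh
# pivot (hypothetical helper): reads the CSV layout from the question and prints a
# tab-separated pivot table with Type ($8) as rows, CreateDate ($2) as columns,
# counts in the cells, plus a "Grand Total" row and column.
# Only POSIX awk features are used (no PROCINFO["sorted_in"], no asort).
pivot() {
  awk -F, -v OFS='\t' '
  NR > 1 {
      # dd-Mon-yy -> sortable YYMMDD key (assumes two-digit years)
      split($2, t, "-")
      mon = (index("JanFebMarAprMayJunJulAugSepOctNovDec", t[2]) + 2) / 3
      key = sprintf("%02d%02d%02d", t[3], mon, t[1])
      if (!(key in name)) { name[key] = $2; dk[++nd] = key }
      if (!($8 in seen))  { seen[$8]; tk[++nt] = $8 }
      cnt[$8, key]++
  }
  END {
      # Insertion sort: POSIX awk has no built-in sort or sorted traversal
      for (i = 2; i <= nd; i++)
          for (j = i; j > 1 && dk[j-1] > dk[j]; j--) { x = dk[j]; dk[j] = dk[j-1]; dk[j-1] = x }
      for (i = 2; i <= nt; i++)
          for (j = i; j > 1 && tk[j-1] > tk[j]; j--) { x = tk[j]; tk[j] = tk[j-1]; tk[j-1] = x }

      line = "Row Labels"
      for (i = 1; i <= nd; i++) line = line OFS name[dk[i]]
      print line OFS "Grand Total"

      for (r = 1; r <= nt; r++) {
          line = tk[r]; rowsum = 0
          for (i = 1; i <= nd; i++) {
              k = tk[r] SUBSEP dk[i]
              c = (k in cnt) ? cnt[k] : ""    # leave empty cells blank, like Excel
              line = line OFS c
              rowsum += c
              coltot[dk[i]] += c
          }
          print line OFS rowsum
      }

      line = "Grand Total"; all = 0
      for (i = 1; i <= nd; i++) { line = line OFS coltot[dk[i]]; all += coltot[dk[i]] }
      print line OFS all
  }' "$1"
}
```

To run it on the data above, use `pivot input.txt`. The output is tab-separated, and a cell is left empty when a Type has no entries on that date. Insertion sort is quadratic, which is fine for a few dozen distinct dates but would be slow for many thousands.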
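Answer 1 (score: 0)
Here is one way to sort the dates. It requires GNU awk (mktime, strftime, arrays of arrays and PROCINFO["sorted_in"]).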
awk -F, '
# Convert a dd-Mon-yy date (e.g. 22-Aug-16) to epoch seconds
function date2epoch(date,    arr, mon) {
    split(date, arr, /-/)
    mon = (index("JanFebMarAprMayJunJulAugSepOctNovDec", arr[2]) - 1) / 3 + 1
    return mktime("20" arr[3] " " mon " " arr[1] " 0 0 0")
}
NR > 1 {
    d = date2epoch($2)          # column 2 is CreateDate
    dates[d]
    count[$(NF-1)][d]++         # $(NF-1) is Type
    total[d]++
}
END {
    PROCINFO["sorted_in"] = "@ind_str_asc"
    printf "Row Labels"
    for (d in dates)
        printf "\t%s", strftime("%d-%b-%y", d)
    print ""
    for (type in count) {
        printf "%s", type
        for (d in dates)
            printf "\t%s", count[type][d]
        print ""
    }
    printf "Total"
    for (d in dates)
        printf "\t%s", total[d]
    print ""
}
' file
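The month lookup in date2epoch works because every abbreviation in "JanFebMarAprMayJunJulAugSepOctNovDec" is exactly three characters long. Month m therefore starts at position 3(m-1)+1, and (index - 1)/3 + 1 recovers m. The (match(...)+2)/3 expression in the next answer is the same formula rewritten. Below is a small sketch for checking it with any awk. The helper name `month_num` is hypothetical.

```sh
# month_num (hypothetical helper): maps a three-letter month name to 1..12
# using the same index() arithmetic as date2epoch above.
# The lookup is case-sensitive; an unrecognized three-letter name gives a
# fractional result instead of an error.
month_num() {
  awk -v m="$1" 'BEGIN { print (index("JanFebMarAprMayJunJulAugSepOctNovDec", m) - 1) / 3 + 1 }'
}
```

For example, `month_num Aug` prints 8 and `month_num Dec` prints 12.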
Answer 2 (score: 0)
Using GNU awk 4.* for true multidimensional arrays and sorted_in:
$ cat tst.awk
BEGIN { FS=","; OFS="\t" }
NR>1 {
    split($2,t,/-/)
    date = sprintf("%02d%02d%02d",t[3],(match("JanFebMarAprMayJunJulAugSepOctNovDec",t[2])+2)/3,t[1])
    dateNames[date] = $2
    fruitCnts[$8][date]++
}
END {
    PROCINFO["sorted_in"] = "@ind_str_asc"
    printf "%s%s", "Row Labels", OFS
    for (date in dateNames) {
        printf "%s%s", dateNames[date], OFS
    }
    print "Grand Total"
    for (fruit in fruitCnts) {
        fruitTotal = 0
        printf "%s%s", fruit, OFS
        for (date in dateNames) {
            cnt = (date in fruitCnts[fruit] ? fruitCnts[fruit][date] : "")
            printf "%s%s", cnt, OFS
            dateTotals[date] += cnt
            fruitTotal += cnt
        }
        print fruitTotal
    }
    printf "%s%s", "Grand Total", OFS
    for (date in dateNames) {
        printf "%s%s", dateTotals[date], OFS
        total += dateTotals[date]
    }
    print total
}
$ awk -f tst.awk file
Row Labels 18-Aug-16 19-Aug-16 22-Aug-16 23-Aug-16 26-Aug-16 28-Aug-16 29-Aug-16 21-Sep-16 25-Sep-16 26-Sep-16 Grand Total
APPLE 1 1 1 5 1 9
GRAPE 1 2 1 2 6
ORANGE 1 2 2 5
Grand Total 1 1 1 1 8 1 2 2 2 1 20
$ awk -f tst.awk file | column -s$'\t' -t
Row Labels 18-Aug-16 19-Aug-16 22-Aug-16 23-Aug-16 26-Aug-16 28-Aug-16 29-Aug-16 21-Sep-16 25-Sep-16 26-Sep-16 Grand Total
APPLE 1 1 1 5 1 9
GRAPE 1 2 1 2 6
ORANGE 1 2 2 5
Grand Total 1 1 1 1 8 1 2 2 2 1 20
$