Get the sum of a column + awk + double quotes + iteration

Time: 2016-05-17 03:43:33

Tags: awk

This is my file:

$ cat -v test6 | head
"Rec_Open_Date"|"MSISDN"|"IMEI"|"Data_Volume_Bytes"|"Device_Manufacturer"|"Device_Model"|"Product_Description"|"Data_Volume_MB"|">20MB/30"|">200MB/30"|">2048MB/30"|">5120MB/30"|">10240MB/30"
"2015-10-06"|"427"|"060"|"137765"|"Samsung Korea"|"Samsung SM-G900I"|"$39 Plan"|"0.131383"|"0"|"0"|"0"|"0"|"0"
"2015-10-06"|"592"|"620"|"0"|"Apple Inc"|"Apple iPhone 6 (A1586)"|"PREPAY  STD - TRIAL - #16"|"0"|"0"|"0"|"0"|"0"|"0"
"2015-10-06"|"007"|"290"|"0"|"Apple Inc"|"Apple iPhone 6 (A1586)"|"PREPAY PLUS - $0 -"|"0"|"0"|"0"|"0"|"0"|"0"
"2015-10-06"|"592"|"050"|"48836832"|"Apple Inc"|"Apple iPhone 5S (A1530)"|"Talk and Text Connect Flexi Plan"|"46.5744"|"1"|"1"|"0"|"0"|"0"
"2015-10-06"|"409"|"720"|"113755347"|"Samsung Korea"|"Samsung SM-G360G"|"$29 CARRYOVER PLAN"|"108.486"|"1"|"1"|"1"|"0"|"0"
"2015-10-06"|"742"|"620"|"19840943"|"Apple Inc"|"Apple iPhone S (A1530)"|"PREPAY STD - $0 - #2"|"18.9218"|"1"|"1"|"0"|"0"|"0"
"2015-10-06"|"387"|"180"|"0"|"HUAWEI Technologies Co Ltd"|"HUAWEI HUAWEI G526-L11"|"PREPAY STD - $1 - #4"|"0"|"0"|"0"|"0"|"0"|"0"
"2015-10-06"|"731"|"570"|"2258243"|"Samsung Korea"|"Samsung SM-N910U"|"Business Freedom"|"2.15363"|"1"|"0"|"0"|"0"|"0"
"2015-10-06"|"556"|"910"|"13332272"|"Samsung Korea"|"Samsung GT-I9505"|"$49 Plan"|"12.7146"|"1"|"1"|"0"|"0"|"0"

This is how I can get the sum of a single column; I have to remove the " characters with gsub first.

$ awk -F'|' 'NR>1{n=$9; gsub(/"/,"",n); sum+=n} END {print sum}' test6
684

What I want to do is something like the approach here, which iterates over every column:

awk '{for (i=1;i<=NF;i++) sum[i]+=$i;}; END{for (i in sum) print "for column "i" is " sum[i];}' FileA

This is my attempt (one of them), but it prints 2000 for every column, which is wrong; column 9 should be 684, the sum of $9. How can I achieve this?

$ awk -F'|' '{for (i=9;i<=NF;i++) sum[i]+=gsub(/"/,"",$i);}; END{for (i in sum) print "for column "i" is " sum[i];}' test6
for column 10 is 2000
for column 11 is 2000
for column 12 is 2000
for column 13 is 2000
for column 9 is 2000

As an add-on question, it would be nice if I could get output like this:

>20MB/30 is 684
>200MB/30 is x
>2048MB/30 is y
>5120MB/30 is z
>10240MB/30 is aa

I tried this using NR==1 but did not get very far.

EDIT1: I might be getting somewhere with this:

awk -F'|' 'NR>1{for (i=9;i<NF;i++) n=$i; gsub(/"/,"",n); sum[i]+=n} END {print sum[i]}' test6
24

EDIT2: This somehow created the sum array for me:

$ awk -F'|' '{for (i=9;i<NF;i++) n=$i; gsub(/"/,"",n); sum[i]+=n} END {for(i=9;i<14;i++) print i ":"sum[i];}' test6
9:
10:
11:
12:
13:24

EDIT3

Got it working with the answer below; it just needs some formatting:

awk -F'|' 'NR>1{for (i=9;i<=NF;i++) {gsub(/"/,"",$i); sum[i]+=$i}}; NR==1{for (i=9;i<=NF;i++) {col[i]=$i}};  END{for (i in sum) print "for column "col[i]"  the sum is " sum[i];}' test6
for column ">200MB/30"  the sum is 457
for column ">2048MB/30"  the sum is 86
for column ">5120MB/30"  the sum is 24
for column ">10240MB/30"  the sum is 6
for column ">20MB/30"  the sum is 684
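On the remaining formatting: the order in which `for (i in sum)` visits the array is not specified by awk, which would explain why the ">20MB/30" line comes last above. A minimal sketch of one way to print the columns in file order, with the quotes stripped from the header names as well. The sample file and the `first` variable are made up for illustration; for test6, `first` would be 9.

```shell
# Hypothetical small sample in the same quoted, pipe-delimited layout as test6.
sample=$(mktemp)
cat > "$sample" <<'EOF'
"id"|">20MB/30"|">200MB/30"
"a"|"1"|"0"
"b"|"1"|"1"
EOF

report=$(awk -F'|' -v first=2 '
  # Strip the double quotes from the columns of interest on every line.
  { for (i = first; i <= NF; i++) gsub(/"/, "", $i) }
  # Header line: remember the column names and the column count.
  NR == 1 { for (i = first; i <= NF; i++) col[i] = $i; n = NF; next }
  # Data lines: accumulate one sum per column.
  { for (i = first; i <= NF; i++) sum[i] += $i }
  # Numeric loop instead of "for (i in sum)" keeps the file order.
  END { for (i = first; i <= n; i++) print col[i] " is " sum[i] }
' "$sample")

echo "$report"
rm -f "$sample"
```

With this sample the report lists ">20MB/30 is 2" followed by ">200MB/30 is 1", which is the layout asked for in the add-on question.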

2 Answers:

Answer 0: (score: 1)

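The problem is in:

sum[i]+=gsub(/"/,"",$i)

gsub modifies $i and returns the number of substitutions it made, which is usually 2. The statement above therefore accumulates the number of substitutions rather than the field values. Replace it with:

{gsub(/"/,"",$i); sum[i]+=$i}

This modifies $i and then adds it to sum[i].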
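To see the difference side by side, here is a small sketch that runs both forms on a made-up three-line sample (the file contents and the variable names `buggy` and `fixed` are only for illustration):

```shell
# Hypothetical sample: one header line and two data lines, two columns.
sample=$(mktemp)
cat > "$sample" <<'EOF'
"h1"|"h2"
"5"|"1"
"7"|"0"
EOF

# Buggy form: adds the return value of gsub, i.e. the number of quotes removed.
buggy=$(awk -F'|' 'NR>1{for (i=1;i<=NF;i++) sum[i]+=gsub(/"/,"",$i)} END{print sum[1], sum[2]}' "$sample")

# Fixed form: strip the quotes first, then add the field itself.
fixed=$(awk -F'|' 'NR>1{for (i=1;i<=NF;i++) {gsub(/"/,"",$i); sum[i]+=$i}} END{print sum[1], sum[2]}' "$sample")

echo "buggy: $buggy"
echo "fixed: $fixed"
rm -f "$sample"
```

The buggy form prints "4 4" (two quotes removed per field, two data lines), whatever the data is; the fixed form prints the real sums, "12 1".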

Answer 1: (score: 0)

Instead of using | as the delimiter, use " (and drop the gsub):

 awk -F'\"' 'NR>1{for(i=9;i<NF/2;i++)sum[i]+=$(i*2)}END{for(i in sum) print "for column "i" is "sum[i]}' test

You only need to adjust the for loop so that it picks up the values, which sit in the even-numbered fields.
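A quick way to check where the values land when " is the separator, using a shortened, made-up line in the same layout (the `line` and `split` variables are only for illustration):

```shell
# Hypothetical three-column line in the same layout as the data file.
line='"2015-10-06"|"427"|"1"'

# With " as the separator, field 1 is the empty text before the first quote,
# the even fields hold the values, and the odd fields in between hold "|".
split=$(printf '%s\n' "$line" | awk -F'"' '{print NF, $2, $4, $6}')

echo "$split"
```

This prints "7 2015-10-06 427 1": a line with k quoted columns yields 2k+1 fields, and column c is $(2*c). Note that this assumes no value ever contains a double quote itself.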