awk +使用awk和for循环进行组求和,用于读入和输出

时间:2016-07-29 03:28:51

标签: bash awk

这是我的档案:

$ cat -v  head_datafile_pipe_deleimiter_more_columns.csv
"Rec_Open_Date"|"MSISDN"|"IMEI"|"Data_Volume_Bytes"|"Device_Manufacturer"|"Device_Model"|"Product_Description"|"Data_Volume_MB"|">20MB/30"|">200MB/30"|">2048MB/30"|">5120MB/30"|">10240MB/30"
"2016-07-25"|"537"|"190"|"48195367"|"Samsung Korea"|"Samsung SM-G928I"|"$49.95 Carryover Plan"|"45.9627"|"1"|"1"|"0"|"0"|"0"
"2016-07-25"|"370"|"780"|"92875513"|"Samsung Korea"|"Samsung SM-J500Y"|"$39.95 Plan"|"88.573"|"1"|"1"|"1"|"0"|"0"
"2016-07-25"|"024"|"640"|"35174867"|"Samsung Korea"|"Samsung GT-I9505"|"$29.95 Plan"|"33.5454"|"1"|"1"|"0"|"0"|"0"
"2016-07-25"|"091"|"660"|"0"|"LG Electronics Inc."|"LG LG-D802"|"PREPAY  PLUS - TRIAL - #16"|"0"|"0"|"0"|"0"|"0"|"0"
"2016-07-25"|"706"|"300"|"33352130"|"Apple Inc"|"Apple iPhone 5S A1530"|"PREPAY PLUS - $0 -"|"31.8071"|"1"|"1"|"0"|"0"|"0"
"2016-07-25"|"917"|"350"|"85820068"|"Apple Inc"|"Apple iPhone 6S Plus A1687"|"$39.95 Plan"|"81.8444"|"1"|"1"|"1"|"0"|"0"
"2016-07-25"|"027"|"070"|"9718436"|"Samsung Korea"|"Samsung SM-G900F"|"$29.95 Plan"|"9.26822"|"1"|"1"|"0"|"0"|"0"
"2016-07-25"|"173"|"780"|"19152160"|"Apple Inc"|"Apple iPhone 6S A1688"|"$49.95 Plan"|"18.2649"|"1"|"1"|"0"|"0"|"0"
"2016-07-25"|"685"|"140"|"17443425"|"Apple Inc"|"Apple iPhone 6S Plus A1687"|"$89.95 Plan"|"16.6353"|"1"|"1"|"0"|"0"|"0"

这是我创建的数组

$ mapfile -t u_vals <<<"$(awk -F'|' 'NR>1{print $7}' head_datafile_pipe_deleimiter_more_columns.csv | sort | uniq)"
$ declare -p u_vals
declare -a u_vals='([0]="\"\$29.95 Plan\"" [1]="\"\$39.95 Plan\"" [2]="\"\$49.95 Carryover Plan\"" [3]="\"\$49.95 Plan\"" [4]="\"\$89.95 Plan\"" [5]="\"PREPAY  PLUS - TRIAL - #16\"" [6]="\"PREPAY PLUS - \$0 -\"")'

这是我想要的输出的一个例子,但这只是第8-9列的总和,我想做8-NF列。这不使用for循环。

$ for el in "${u_vals[@]}"; do  awk -F"|" -v q='"' -v j="$el" '
NR>1{if($7 == j) {gsub(/"/,"",$8); sum+=$8; gsub(/"/,"",$9); sum2+=$9} }; NR>1{count++} END { print "output" FS j FS q count q FS sum FS sum2}' head_datafile_pipe_deleimiter_more_columns.csv; done | cat -n
     1  output|"$29.95 Plan"|"9"|42.8136|2
     2  output|"$39.95 Plan"|"9"|170.417|2
     3  output|"$49.95 Carryover Plan"|"9"|45.9627|1
     4  output|"$49.95 Plan"|"9"|18.2649|1
     5  output|"$89.95 Plan"|"9"|16.6353|1
     6  output|"PREPAY  PLUS - TRIAL - #16"|"9"|0|0
     7  output|"PREPAY PLUS - $0 -"|"9"|31.8071|1

但是我想为cols 8-NF创建一个for循环,我试图在这里为col 8-9 {for(i=8;i<=9;i++) gsub(/"/,"",$i); sum[i]+=$i}做,但我不确定它是否可以完成

$ for el in "${u_vals[@]}"; do  awk -F"|" -v q='"' -v j="$el" '
NR>1{if($7 == j) {for(i=8;i<=9;i++) gsub(/"/,"",$i); sum[i]+=$i} }; NR>1{count++} END { for(i=8;i<=9;i++) print "output" FS j FS q count q FS sum[i] FS sum[i]}' head_datafile_pipe_deleimiter_more_columns.csv; done | cat -n

这是我想要的输出,这只是cols 8和9分组总结在这里:

........................|col8   |col9|...

output|"$29.95 Plan"|"9"|42.8136|2|<other column sums here>
output|"$39.95 Plan"|"9"|170.417|2 ...
output|"$49.95 Carryover Plan"|"9"|45.9627|1
output|"$49.95 Plan"|"9"|18.2649|1
output|"$89.95 Plan"|"9"|16.6353|1
output|"PREPAY  PLUS - TRIAL - #16"|"9"|0|0
output|"PREPAY PLUS - $0 -"|"9"|31.8071|1|<other column sums here>

EDIT1

尝试理解print printf之间的区别,这对我来说有帮助。

下面两个命令之间的唯一区别:

print ("output" FS j FS q count q FS sum[8]);

printf ("output" FS j FS q count q FS sum[8]);

$ for el in "${u_vals[@]}"; do  awk -F"|" -v q='"' -v j="$el" '
NR>1{if($7 == j) {for(i=8;i<=NF;i++){gsub(/"/,"",$i); sum[i]+=$i}} }; NR>1{count++} END { print ("output" FS j FS q count q FS sum[8]); for(i=8;i<=NF; i++){ printf("%s",FS sum[i])} printf("\n")}' head_datafile_pipe_deleimiter_more_columns.csv; done | cat -n
     1  output|"$29.95 Plan"|"9"|42.8136
     2  |42.8136|2|2|0|0|0
     3  output|"$39.95 Plan"|"9"|170.417
     4  |170.417|2|2|2|0|0
     5  output|"$49.95 Carryover Plan"|"9"|45.9627
     6  |45.9627|1|1|0|0|0
     7  output|"$49.95 Plan"|"9"|18.2649
     8  |18.2649|1|1|0|0|0
     9  output|"$89.95 Plan"|"9"|16.6353
    10  |16.6353|1|1|0|0|0
    11  output|"PREPAY  PLUS - TRIAL - #16"|"9"|0
    12  |0|0|0|0|0|0
    13  output|"PREPAY PLUS - $0 -"|"9"|31.8071
    14  |31.8071|1|1|0|0|0
1415137039 [main] -sh 4624 sig_send: error sending signal -66, pipe handle 0x1210, nb 132, packsize 0, Win32 error 0

$ for el in "${u_vals[@]}"; do  awk -F"|" -v q='"' -v j="$el" '
NR>1{if($7 == j) {for(i=8;i<=NF;i++){gsub(/"/,"",$i); sum[i]+=$i}} }; NR>1{count++} END { printf ("output" FS j FS q count q FS sum[8]); for(i=8;i<=NF; i++){ printf("%s",FS sum[i])} printf("\n")}' head_datafile_pipe_deleimiter_more_columns.csv; done | cat -n
     1  output|"$29.95 Plan"|"9"|42.8136|42.8136|2|2|0|0|0
     2  output|"$39.95 Plan"|"9"|170.417|170.417|2|2|2|0|0
     3  output|"$49.95 Carryover Plan"|"9"|45.9627|45.9627|1|1|0|0|0
     4  output|"$49.95 Plan"|"9"|18.2649|18.2649|1|1|0|0|0
     5  output|"$89.95 Plan"|"9"|16.6353|16.6353|1|1|0|0|0
     6  output|"PREPAY  PLUS - TRIAL - #16"|"9"|0|0|0|0|0|0|0
     7  output|"PREPAY PLUS - $0 -"|"9"|31.8071|31.8071|1|1|0|0|0

1 个答案:

答案 0 :(得分:0)

你可以简单地迭代8到NF。您还必须使用一个初始print替换END块中的最终printf调用,以打印您要在8:NF字段之前输出的字段,然后使用{{ 1}}调用for循环内部打印8:NF字段,最后在for循环后打印换行符。我们必须使用printf代替printf进行前两次打印,以防止换行过早出现。

print