I have this file:
House.csv
M2018&03,HOUSE,BOX_X,16
M2018&03,HOUSE,FENCE_A,2
M2018&03,HOUSE,IRON_V + WINDOWS,1
M2018&03,HOUSE,DOOR + ROOF,7
M2018&03,HOUSE,TABLE + TV + LAPTOP,1
M2018&03,HOUSE,RADIO_A + RADIO_B + RADIO_C,3
M2018&03,HOUSE,CHAIR_A + CHAIR_B,2
I am using awk to modify the file and add the line number to each line:
awk -F"," '{ gsub(/&/,"_",$1); if(NR < 10){print $1","$2",0"NR","$3","$4}else{print $1","$2","NR","$3","$4}}' House.csv > House2.csv
House2.csv
M2018_03,HOUSE,01,BOX_X,16
M2018_03,HOUSE,02,FENCE_A,2
M2018_03,HOUSE,03,IRON_V + WINDOWS,1
M2018_03,HOUSE,04,DOOR + ROOF,7
M2018_03,HOUSE,05,TABLE + TV + LAPTOP,1
M2018_03,HOUSE,06,RADIO_A + RADIO_B + RADIO_C,3
M2018_03,HOUSE,07,CHAIR_A + CHAIR_B,2
Now I want to combine the following awk with the previous one:
awk 'BEGIN{FS=OFS=","}{a[$1","$2]+=$5}END{ for (i in a) print i,a[i]}' House2.csv
Note that I want the sum of all the values in column 5: M2018_03,HOUSE,32
The resulting file should look like this:
M2018_03,HOUSE,01,BOX_X,16,32
M2018_03,HOUSE,02,FENCE_A,2,32
M2018_03,HOUSE,03,IRON_V + WINDOWS,1,32
M2018_03,HOUSE,04,DOOR + ROOF,7,32
M2018_03,HOUSE,05,TABLE + TV + LAPTOP,1,32
M2018_03,HOUSE,06,RADIO_A + RADIO_B + RADIO_C,3,32
M2018_03,HOUSE,07,CHAIR_A + CHAIR_B,2,32
Answer 0: (score: 2)
You don't need to call awk more than once:
awk -F, -v OFS=, '
NR == FNR {sum[$1,$2] += $NF; next}
{
$NF = $NF OFS sum[$1,$2]
gsub(/&/, "_", $1)
$2 = $2 OFS sprintf("%02d", FNR)
print
}
' House.csv House.csv
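This reads the file twice: the first pass computes the sums, and the second pass applies all the modifications to each line.
Output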
M2018_03,HOUSE,01,BOX_X,16,32
etc
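To see why NR == FNR separates the two passes, here is a minimal sketch on a hypothetical two-line file. The file contents and the tmp and result variable names are made up for illustration and are not part of the question. NR counts records across all input files, while FNR restarts at 1 for each file, so the two are equal only while the first copy is being read.

```shell
# Hypothetical two-line sample file, created only for this demo.
tmp=$(mktemp)
printf 'a,1\nb,2\n' > "$tmp"

# Pass the same file twice and label each record with the pass it belongs to.
result=$(awk -F, '
NR == FNR { print "pass 1:", NR, FNR, $0; next }   # first copy: NR equals FNR
          { print "pass 2:", NR, FNR, $0 }         # second copy: NR keeps counting, FNR restarts
' "$tmp" "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

In the second pass NR is 3 and 4 while FNR is back to 1 and 2, which is why the answer uses FNR, not NR, for the two-digit line number.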
Answer 1: (score: 1)
EDIT: Adding a solution that saves the OP from calling awk multiple times.
awk -F, 'FNR==NR{sum[$1]+=$NF;next} {key=$1;sub(/&/,"_",$1);$3=sprintf("%02d",FNR) OFS $3;print $0,sum[key]}' OFS=, House.csv House.csv
The following appends only the total to each line, without the other modifications:
awk -F, 'FNR==NR{sum[$1]+=$NF;next} {print $0,sum[$1]}' OFS=, House.csv House.csv
Explanation: here is a commented version of the code above.
awk -F, ' ##Set the field separator to a comma.
FNR==NR{ ##FNR==NR is true only while the first copy of House.csv is being read.
sum[$1]+=$NF; ##Add the last field to the array sum, indexed by the first field.
next} ##next skips the remaining statements for the current line.
{ ##This block runs while the second copy of House.csv is being read.
print $0,sum[$1] ##Print the current line followed by the total stored for its first field.
}
' OFS=, House.csv House.csv ##Set OFS to a comma and pass the input file twice.
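The commands above index the sum by the first field only, while the awk in the question groups by the first two fields. As a rough sketch of that variant, assuming the file may hold more than one group, the array can be indexed by both fields. The GARAGE row below is hypothetical data invented for this example, as are the tmp and result variable names.

```shell
# Hypothetical data with two groups, invented for this sketch (not from the question).
tmp=$(mktemp)
printf '%s\n' \
  'M2018&03,HOUSE,BOX_X,16' \
  'M2018&03,HOUSE,FENCE_A,2' \
  'M2018&03,GARAGE,DOOR,5' > "$tmp"

result=$(awk -F, -v OFS=, '
NR == FNR { sum[$1, $2] += $NF; next }   # pass 1: total per (field 1, field 2) group
          { print $0, sum[$1, $2] }      # pass 2: append the total of the group this line belongs to
' "$tmp" "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

With this input the two HOUSE lines get 18 appended and the GARAGE line gets 5, so each line carries the total of its own group instead of a file-wide total.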