Sum multiple rows in a CSV when a field matches

Posted: 2013-10-03 18:23:18

Tags: bash shell awk

I have a file that I've trimmed down to look like this:

"Reno","40.00"
"Reno","40.00"
"Reno","80.00"
"Reno","60.00"
"Lakewood","150.00"
"Altamonte Springs","50.25"
"Altamonte Springs","25.00"
"Altamonte Springs","25.00"
"Sandpoint","50.00"
"Lenoir City","987.00"

What I want in the end is the total amount for each city, like this:

"Reno","220.00"
"Lakewood","150.00"
"Altamonte Springs","100.25"

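Fair warning: the data set isn't necessarily contiguous. A city might appear once, show up again a thousand lines later, and then appear three more times at the end.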

I've been trying to use the following awk script:

awk -F "," '{array[$1]+=$2} END { for (i in array) {print i"," array[i]}}' test1.csv > test6.csv

The results I get look like this:

"Matawan",0
"Bay Side",0
"Pataskala",0
"Dorothy",0
"Haymarket",0
"Myrtle Point",0

And so on. The second column is all zeros, and it has no quotes.

I'm obviously missing something, but I don't know where to look. What am I missing?

Thanks.

4 answers:

Answer 0 (score: 3)

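The reason yours fails is the double quotes. With -F ",", the second field is "40.00" with the quote characters still attached, and awk converts a string that doesn't begin with a number to 0. The city column keeps its quotes in your output for the same reason.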

Do something like this:

sed 's/"//g' file.csv | awk -F "," '{array[$1]+=$2}END{for(i in array) {print "\""  i "\""  ","  "\"" array[i] "\"" }}' 

"Lenoir City","987"
"Reno","220"
"Lakewood","150"
"Sandpoint","50"
"Altamonte Springs","100.25"

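A quick way to confirm the diagnosis is to print the second field both as text and as a number. This is a minimal sketch; the helper name show_field2 is hypothetical, and the behavior shown is that of a standard awk such as gawk or mawk:

```shell
#!/bin/sh
# Hypothetical helper: print field 2 as text and as a number when splitting on ","
show_field2() {
    awk -F ',' '{ print "as text:   " $2; print "as number: " ($2 + 0) }'
}

# The quotes stay attached to the field, so the numeric conversion yields 0
printf '"Reno","40.00"\n' | show_field2
```

The first line shows the quotes still wrapped around the amount; the second shows the numeric value 0, which is what the question's script ends up adding to each city's total.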
Answer 1 (score: 2)

This awk one-liner gives you exactly the format you want:

awk -F'","' '{a[$1]+=$2*1}END{for (x in a)printf "%s\",\"%.2f\"\n", x,a[x]}' file

Tested with your data:

kent$  cat f
"Reno","40.00"
"Reno","40.00"
"Reno","80.00"
"Reno","60.00"
"Lakewood","150.00"
"Altamonte Springs","50.25"
"Altamonte Springs","25.00"
"Altamonte Springs","25.00"
"Sandpoint","50.00"
"Lenoir City","987.00"

kent$  awk -F'","' '{a[$1]+=$2*1}END{for (x in a)printf "%s\",\"%.2f\"\n", x,a[x]}' f
"Lakewood","150.00"
"Reno","220.00"
"Lenoir City","987.00"
"Sandpoint","50.00"
"Altamonte Springs","100.25"

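The trick in this answer is the separator. Splitting on the three-character string "," leaves one stray quote at each end of the line, and $2*1 forces a numeric conversion that reads only the leading 40.00 and ignores the trailing quote. A small sketch to inspect the split (show_split is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: show how -F'","' splits one line and what $2*1 evaluates to
show_split() {
    awk -F'","' '{ printf "$1=[%s] $2=[%s] $2*1=%.2f\n", $1, $2, $2 * 1 }'
}

# $1 keeps the opening quote, $2 keeps the closing quote;
# the numeric conversion of $2 uses only its leading "40.00"
printf '"Reno","40.00"\n' | show_split
```

The printf in the answer then writes the "," back between the two values and appends the closing quote, while %.2f restores the two decimal places.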
Answer 2 (score: 1)

The double quotes (") in the input are what cause the problem. Strip them with sed first, then use printf in awk to add them back.

Try the following:

sed 's/"//g' input.csv | awk -F "," '{array[$1]+=$2} END { for (i in array) {printf "\"%s\",\"%.2f\"\n", i, array[i]}}' > output.csv

Shuffled input

"Reno","40.00"
"Reno","60.00"
"Lakewood","150.00"
"Altamonte Springs","50.25"
"Altamonte Springs","25.00"
"Reno","80.00"
"Sandpoint","50.00"
"Reno","40.00"
"Lenoir City","987.00"
"Altamonte Springs","25.00"

Output

"Reno","220.00"
"Altamonte Springs","100.25"
"Lakewood","150.00"
"Lenoir City","987.00"
"Sandpoint","50.00"

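The answers print the cities in different orders because awk's for (key in array) loop visits keys in an unspecified order. If a predictable order matters, one option is to pipe the result through sort. A minimal sketch of this answer's pipeline with that addition; the function name sum_by_city and the sample rows are illustrative assumptions:

```shell
#!/bin/sh
# Hypothetical helper: strip quotes with sed, sum per city in awk,
# re-quote with printf, then sort for a stable alphabetical order
sum_by_city() {
    sed 's/"//g' |
        awk -F ',' '{ sum[$1] += $2 } END { for (c in sum) printf "\"%s\",\"%.2f\"\n", c, sum[c] }' |
        sort
}

# Rows for the same city do not need to be adjacent
printf '%s\n' '"Reno","40.00"' '"Lakewood","150.00"' '"Reno","60.00"' | sum_by_city
```

The associative array already handles non-contiguous rows; sort only fixes the order in which the totals are printed.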
Answer 3 (score: 1)

You don't need any preprocessing or nasty escaping:

$ awk -F'"' '{a[$2]+=$4}END{for(k in a)printf "%s,%s\n",FS k FS,FS a[k] FS}' file
"Lenoir City","987"
"Reno","220"
"Lakewood","150"
"Sandpoint","50"
"Altamonte Springs","100.25"
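This works because using the double quote itself as the field separator puts the bare city name in $2 and the bare amount in $4, and FS k FS wraps each value back in quotes. A minimal sketch that lists every field of one line (show_quote_fields is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: list every field awk produces when FS is a double quote
show_quote_fields() {
    awk -F'"' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

# Expect five fields: "", the city, ",", the amount, and ""
printf '"Reno","40.00"\n' | show_quote_fields
```

Because this answer builds the total into a string by concatenation rather than formatting it with %.2f, awk's default number conversion prints whole-number totals as 987 instead of 987.00; use printf with %.2f if the two decimals matter.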