I have a file that I've trimmed down to look like this:
"Reno","40.00"
"Reno","40.00"
"Reno","80.00"
"Reno","60.00"
"Lakewood","150.00"
"Altamonte Springs","50.25"
"Altamonte Springs","25.00"
"Altamonte Springs","25.00"
"Sandpoint","50.00"
"Lenoir City","987.00"
etc.
The end result I want is a total amount for each city. That is:
"Reno","220.00"
"Lakewood","150.00"
"Altamonte Springs","100.25"
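etc.
Fair warning: the data set is not necessarily contiguous. A city may appear once, then again a thousand lines later, and then three more times at the end.
I've been trying the following awk script: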
awk -F "," '{array[$1]+=$2} END { for (i in array) {print i"," array[i]}}' test1.csv > test6.csv
The results I get look like this:
"Matawan",0
"Bay Side",0
"Pataskala",0
"Dorothy",0
"Haymarket",0
"Myrtle Point",0
etc. The second column is all zeros, and unquoted.
I'm obviously missing something, but I don't know where to look. What am I missing?
Thanks.
Answer 0 (score: 3)
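Yours fails because of the double quotes. With -F ",", the second field is "40.00" with the quotes still attached, and awk converts a string that begins with a quote to the number 0.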
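A quick way to see this for yourself, as a minimal sketch that uses only printf and awk on one sample line from the question: awk converts a string to a number from its leading numeric prefix, and a string that starts with a double quote has no such prefix.

```shell
# With -F ",", field 2 of this line is "40.00" including both quotes.
# awk converts a string to a number from its leading numeric prefix only,
# and a leading double quote has none, so the value becomes 0.
printf '"Reno","40.00"\n' | awk -F "," '{ print $2 + 0 }'    # prints 0

# Without the quotes, the same field converts as expected.
printf 'Reno,40.00\n' | awk -F "," '{ print $2 + 0 }'        # prints 40
```

This also explains why the second column of the bad output has no quotes: array[i] is a number, and print writes it bare.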
Do something like this:
sed 's/"//g' file.csv | awk -F "," '{array[$1]+=$2}END{for(i in array) {print "\"" i "\"" "," "\"" array[i] "\"" }}'
"Lenoir City","987"
"Reno","220"
"Lakewood","150"
"Sandpoint","50"
"Altamonte Springs","100.25"
Answer 1 (score: 2)
This awk one-liner gives exactly the format you want:
awk -F'","' '{a[$1]+=$2*1}END{for (x in a)printf "%s\",\"%.2f\"\n", x,a[x]}' file
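The key to this one-liner is the separator: -F'","' splits on the three-character sequence quote, comma, quote, so the outer quotes stay attached to the fields. A small check on one line from the question, assuming nothing beyond a POSIX awk:

```shell
# Splitting on "," (quote, comma, quote) leaves the opening quote on $1
# and the closing quote on $2; $2 * 1 reads the numeric prefix 40.00.
printf '"Reno","40.00"\n' | awk -F'","' '{ print $1; print $2; print $2 * 1 }'
# prints:
# "Reno
# 40.00"
# 40
```

That is why the printf format only adds "," and a closing quote: each key x already begins with the opening quote.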
Testing with your data:
kent$ cat f
"Reno","40.00"
"Reno","40.00"
"Reno","80.00"
"Reno","60.00"
"Lakewood","150.00"
"Altamonte Springs","50.25"
"Altamonte Springs","25.00"
"Altamonte Springs","25.00"
"Sandpoint","50.00"
"Lenoir City","987.00"
kent$ awk -F'","' '{a[$1]+=$2*1}END{for (x in a)printf "%s\",\"%.2f\"\n", x,a[x]}' f
"Lakewood","150.00"
"Reno","220.00"
"Lenoir City","987.00"
"Sandpoint","50.00"
"Altamonte Springs","100.25"
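The transcript above lists the cities in a different order from the input: awk does not specify the iteration order of for (x in a). If the first-appearance order shown in the question matters, one option is to record each city the first time it is seen. This is a minimal sketch building on the one-liner above; sum_by_city_ordered is a hypothetical shell function name, and like the original it assumes every line has the exact "city","amount" shape with no "," sequence inside a city name.

```shell
# Hypothetical helper: sum_by_city_ordered FILE
# Totals the amount per city and prints the cities in the order in which
# they first appear, even when a city's lines are scattered through the file.
sum_by_city_ordered() {
    awk -F'","' '
        # First time a city is seen, remember its position.
        !($1 in sum) { order[++n] = $1 }
        # $2 still ends with a quote (40.00"); awk uses its numeric prefix.
        { sum[$1] += $2 }
        END {
            for (i = 1; i <= n; i++)
                printf "%s\",\"%.2f\"\n", order[i], sum[order[i]]
        }
    ' "$1"
}

# Usage, with the data saved as f:
# sum_by_city_ordered f
```

The extra order array costs one entry per distinct city, so memory still grows with the number of cities, not the number of lines.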
Answer 2 (score: 1)
The " characters are what break the input. Strip them with sed first, then use printf in awk to put them back.
Try the following:
sed 's/"//g' input.csv | awk -F "," '{array[$1]+=$2} END { for (i in array) {printf "\"%s\",\"%.2f\"\n", i, array[i]}}' > output.csv
Shuffled input
"Reno","40.00"
"Reno","60.00"
"Lakewood","150.00"
"Altamonte Springs","50.25"
"Altamonte Springs","25.00"
"Reno","80.00"
"Sandpoint","50.00"
"Reno","40.00"
"Lenoir City","987.00"
"Altamonte Springs","25.00"
Output
"Reno","220.00"
"Altamonte Springs","100.25"
"Lakewood","150.00"
"Lenoir City","987.00"
"Sandpoint","50.00"
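The difference between the 220 printed by answers 0 and 3 and the 220.00 printed here comes from how the total is written out, not from the summing. A small illustration in plain awk with no input file: print writes a number whose value is integral as an integer, while a printf %.2f conversion always keeps two decimals.

```shell
awk 'BEGIN {
    total = 40 + 40 + 80 + 60   # the Reno amounts from the question
    print total                 # integral value, printed as an integer: 220
    printf "%.2f\n", total      # fixed two decimals: 220.00
}'
```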
Answer 3 (score: 1)
You don't need any preprocessing or awkward escaping:
$ awk -F'"' '{a[$2]+=$4}END{for(k in a)printf "%s,%s\n",FS k FS,FS a[k] FS}' file
"Lenoir City","987"
"Reno","220"
"Lakewood","150"
"Sandpoint","50"
"Altamonte Springs","100.25"
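This works because splitting on the double quote itself puts the city in field 2 and the amount in field 4, with no quotes left on either. A one-line check against a sample line from the question:

```shell
# Splitting on " gives five fields: empty, city, the comma, amount, empty.
printf '"Reno","40.00"\n' |
    awk -F'"' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
# prints:
# $1=[]
# $2=[Reno]
# $3=[,]
# $4=[40.00]
# $5=[]
```

Since $4 carries no quotes, a[$2]+=$4 sums correctly, and FS k FS wraps each key back in the quote character that the split removed.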