How to parse a comma-separated file whose fields can contain commas

Time: 2016-03-08 09:51:34

Tags: shell unix awk sh ksh

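I have a comma-separated file with 6 fields. Some field values contain a comma themselves, and those values arrive enclosed in double quotes (""). I need to replace the comma inside the quotes with a hyphen.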

The input is:

03/03/2016,Customer Service,CHAT,"Responded, closed",True,59
02/24/2016,Customer Service,CALL,Responded,True,55
03/03/2016,Customer Service,CHAT,"Responded, awaiting reply",False,46
02/24/2016,Customer Service,CALL,Responded,False,51
02/24/2016,Customer Service,CHAT,Responded,False,31

The expected output is:

03/03/2016,Customer Service,CHAT,"Responded- closed",True,59
02/24/2016,Customer Service,CALL,Responded,True,55
03/03/2016,Customer Service,CHAT,"Responded- awaiting reply",False,46
02/24/2016,Customer Service,CALL,Responded,False,51
02/24/2016,Customer Service,CHAT,Responded,False,31

1 answer:

Answer 0: (score: 2)

Using FPAT in gnu-awk, you can do this:

awk -v FPAT='"[^"]+"|[^,]+' -v OFS=, '{for(i=1; i<=NF; i++) gsub(/,/, "-", $i)} 1' file.csv
03/03/2016,Customer Service,CHAT,"Responded- closed",True,59
02/24/2016,Customer Service,CALL,Responded,True,55
03/03/2016,Customer Service,CHAT,"Responded- awaiting reply",False,46
02/24/2016,Customer Service,CALL,Responded,False,51
02/24/2016,Customer Service,CHAT,Responded,False,31
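
FPAT is a gawk extension, so it may not be available with the awk that ships alongside sh or ksh on other Unix systems. A minimal portable sketch follows, assuming quotes are balanced, a field never contains an escaped "" and no record spans several lines. With " as the field separator, the even-numbered pieces are the text inside quotes. The function name replace_quoted_commas is made up for illustration.

```shell
# Hypothetical helper: replace commas inside double-quoted fields with hyphens.
# Assumes balanced quotes, no escaped "" inside a field, one record per line.
replace_quoted_commas() {
    awk -F'"' -v OFS='"' '{
        # Splitting on the quote character puts quoted text in fields 2, 4, 6, ...
        for (i = 2; i <= NF; i += 2)
            gsub(/,/, "-", $i)
        print
    }' "$@"
}

# Reads the named files, or standard input when no file is given.
printf '%s\n' '03/03/2016,Customer Service,CHAT,"Responded, closed",True,59' |
    replace_quoted_commas
# prints: 03/03/2016,Customer Service,CHAT,"Responded- closed",True,59
```

Lines without any quote have a single field, so the loop does nothing and they are printed unchanged.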

Or with sed:

sed -E ':a; s/("[^,"]+),([^"]*")/\1-\2/g; ta;' file.csv
03/03/2016,Customer Service,CHAT,"Responded- closed",True,59
02/24/2016,Customer Service,CALL,Responded,True,55
03/03/2016,Customer Service,CHAT,"Responded- awaiting reply",False,46
02/24/2016,Customer Service,CALL,Responded,False,51
02/24/2016,Customer Service,CHAT,Responded,False,31
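
To see why the sed version needs the :a ... ta loop, here is a small sketch with a quoted field that holds more than one comma. It uses the same substitution, but passes each command with its own -e, on the assumption that some sed implementations read everything after ":" up to the end of the line as the label name. The wrapper name sed_replace_quoted_commas is hypothetical.

```shell
# Hypothetical wrapper around the sed loop above, one -e per command so the
# label and the branch are not joined with ';'.
sed_replace_quoted_commas() {
    sed -E -e ':a' -e 's/("[^,"]+),([^"]*")/\1-\2/g' -e 'ta' "$@"
}

# Each pass replaces at most one comma per quoted field; 'ta' jumps back to
# the label 'a' until a pass makes no substitution.
printf '%s\n' 'x,"a, b, c",y' | sed_replace_quoted_commas
# prints: x,"a- b- c",y
```

Note that the pattern requires at least one character between the opening quote and the comma, so a quoted field that starts with a comma is left as it is.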