How to create a CSV file without unnecessary spaces

Time: 2014-07-28 12:54:36

Tags: linux bash perl awk sed

I use the xls2csv binary to convert XLS documents to CSV on my Red Hat Linux machine.

Example (from the man page):

 xls2csv -x "1252spreadsheet.xls" -b WINDOWS-1252 -c "ut8csvfile.csv" -a UTF-8

However, I noticed the two problems described in (1) and (2) below, which cause many failures in my bash script.

The problems are:

(1) The CSV file contains unnecessary spaces (to the left or to the right of a word).

Example of incorrect syntax in the CSV:
 ,"/var/adm/sys ldd/all  /Comm/logs   ","WORD "," WORD"

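Example of correct syntax in the CSV:

 ,"/var/adm/sys ldd/all  /Comm/logs",WORD,WORD

(2) Quotes appear in the CSV even when the field is a single word between separators. Quotes are not needed around a single word between separators (the separator is ",").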

Example of incorrect syntax in the CSV:
 ," WORD ",

Example of correct syntax in the CSV:

 ,WORD,

Please advise how to solve problems (1) and (2) described here, in order to create a "clean" CSV file.

The implementation can use awk, sed, a perl one-liner, or any other solution that runs in a bash script.

Example CSV file before the fix:

 1,"/var/adm/sys ldd/all  /Comm/logs",34356,"234245 ",24245
 2,"/var/adm/sys ldd/all
 /Comm/debugs.txt"," 45356",435,"  578 58976  "
 3,"   add this line in crontab    :",34356,"234245 ",24245
 4,"1.0348    54 35.5"," 45356","   435","578 "
 4,"1 2 "," 45356 95857 ","   435","578 "
 5,"1 2 "," 45356 95857 ","   "435","578" "
 6,"1.0348    54 35.5"," 45356"," "4"""    ""35","578 "
 7,"1.0348    54 35.5",""45356",""4"""""35,"578 "

Example of the correct CSV file (after the fix):

 1,"/var/adm/sys ldd/all  /Comm/logs",34356,234245,24245
 2,"/var/adm/sys ldd/all
 /Comm/debugs.txt",45356,435,"578 58976"
 3,"add this line in crontab    :",34356,234245,24245
 4,"1.0348    54 35.5",45356,435,578 
 4,"1 2","45356 95857",435,578
 5,"1 2","45356 95857","435,578" 
 6,"1.0348    54 35.5",45356,"4"""    ""35,578
 7,"1.0348    54 35.5",""45356",""4"""""35,578

Commas cannot appear inside a field.

Note the explicit line break contained in a field on line 2.

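If a field is enclosed in double quotes and contains no spaces (for example ""45356" on line 7), those double quotes must not be removed, because the whole field, quotes included, is an encoded password.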

2 answers:

Answer 0: (score: 0)

awk -F, -v OFS=, '{ for (i = 1; i <= NF; ++i) { gsub(/(^"?[[:space:]]*|[[:space:]]*"?$)/, "", $i); if ($i ~ /[[:space:]]/) $i = "\"" $i "\"" } } 1' file

Output:

1,"/var/adm/sys ldd/all  /Comm/logs",34356,234245,24245
2,"/var/adm/sys ldd/all  /Comm/debugs.txt",45356,435,"578 58976"
3,"add this line in crontab    :",34356,234245,24245
4,"1.0348    54 35.5",45356,435,578
4,"1 2","45356 95857",435,578
5,"1 2","45356 95857","435,578"

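The only limitation is that values must not contain commas, for example "This is, a value.", because -F, splits each line on every comma, including commas inside quotes.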
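
To make the behaviour of the one-liner easier to follow, here is a minimal sketch that wraps the same awk program in a shell function and feeds it two rows. The function name clean_fields is hypothetical and only used for illustration. The first row comes from the question; the second was made up to show the comma limitation. It assumes an awk that supports POSIX character classes such as [[:space:]].

```shell
#!/bin/sh
# clean_fields is a hypothetical wrapper name; the awk program is the
# one-liner from this answer, reformatted with comments. Reads stdin.
clean_fields() {
    awk -F, -v OFS=, '{
        for (i = 1; i <= NF; ++i) {
            # Strip an optional leading quote plus the whitespace after it,
            # and trailing whitespace plus an optional closing quote.
            gsub(/(^"?[[:space:]]*|[[:space:]]*"?$)/, "", $i)
            # Put quotes back only on fields that still contain whitespace.
            if ($i ~ /[[:space:]]/) $i = "\"" $i "\""
        }
    } 1'
}

# A row from the question: padding and needless quotes are removed.
printf '%s\n' '4,"1 2 "," 45356 95857 ","   435","578 "' | clean_fields
# prints: 4,"1 2","45356 95857",435,578

# A quoted field containing a comma is split at that comma.
printf '%s\n' '1,"This is, a value. ",WORD' | clean_fields
# prints: 1,"This is","a value.",WORD
```

Because awk reads one line at a time and splits on every comma, this approach also treats the two physical lines of a multi-line field as separate records.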

Answer 1: (score: 0)

Try this perl one-liner:

perl -i -nle 'chomp($_);$_=~s/\s*"\s*/"/sg;print "$_"' file