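I use the xls2csv binary to convert XLS documents to CSV on my Linux Red Hat machine.
Example (from the man page):
xls2csv -x "1252spreadsheet.xls" -b WINDOWS-1252 -c "ut8csvfile.csv" -a UTF-8
However, I noticed the two problems below (items 1 and 2), which cause many issues in my bash script.
The problems are: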
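(1) The CSV file contains unnecessary spaces (to the left or to the right of a word).
Example of incorrect syntax in the CSV:
,"/var/adm/sys ldd/all /Comm/logs ","WORD "," WORD"
Example of correct syntax in the CSV:
,"/var/adm/sys ldd/all /Comm/logs",WORD,WORD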
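(2) Quotes appear in the CSV even when the field between two separators is a single word. Quotes are not needed around a single word between separators (the separator is ",").
Example of incorrect syntax in the CSV:
," WORD ",
Example of correct syntax in the CSV:
,WORD,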
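Please advise how to fix the problems described in items 1 and 2 in order to produce a "clean CSV file". The implementation can use awk, sed, a perl one-liner, or any solution script that runs under bash.
Example CSV file before the fix: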
1,"/var/adm/sys ldd/all /Comm/logs",34356,"234245 ",24245
2,"/var/adm/sys ldd/all
/Comm/debugs.txt"," 45356",435," 578 58976 "
3," add this line in crontab :",34356,"234245 ",24245
4,"1.0348 54 35.5"," 45356"," 435","578 "
4,"1 2 "," 45356 95857 "," 435","578 "
5,"1 2 "," 45356 95857 "," "435","578" "
6,"1.0348 54 35.5"," 45356"," "4""" ""35","578 "
7,"1.0348 54 35.5",""45356",""4"""""35,"578 "
Example of the correct CSV file (after the fix):
1,"/var/adm/sys ldd/all /Comm/logs",34356,234245,24245
2,"/var/adm/sys ldd/all
/Comm/debugs.txt",45356,435,"578 58976"
3,"add this line in crontab :",34356,234245,24245
4,"1.0348 54 35.5",45356,435,578
4,"1 2","45356 95857",435,578
5,"1 2","45356 95857","435,578"
6,"1.0348 54 35.5",45356,"4""" ""35,578
7,"1.0348 54 35.5",""45356",""4"""""35,578
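Commas cannot appear inside a field.
Note the explicit newline contained in a field on line 2.
If a field is inside double quotes and contains no spaces (for example ""45356" on line 7), those double quotes must not be removed, because the whole field, including the quotes, is an encoded password.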
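The requirements above can be sketched with awk roughly as follows. This is only a minimal sketch, not a tested solution for every xls2csv output. The function name clean_csv is hypothetical, and the sketch assumes that fields never contain commas (as stated above), that a record is complete once it holds an even number of double quotes, and that any field without whitespace (such as the password fields) should be passed through untouched. It does not attempt the nested-quote case shown on row 6.

```shell
# clean_csv: hypothetical helper name, not part of xls2csv.
# Assumptions: no commas inside fields; a record is complete once its
# double-quote count is even; fields without whitespace are left untouched.
clean_csv() {
  awk '
    {
      # Append this physical line to the pending record, keeping the newline.
      if (pending) rec = rec "\n" $0
      else         rec = $0

      tmp = $0
      quotes += gsub(/"/, "&", tmp)               # gsub returns the match count
      if (quotes % 2 == 1) { pending = 1; next }  # record continues on next line

      n = split(rec, f, ",")
      out = ""
      for (i = 1; i <= n; i++) {
        if (f[i] ~ /[ \t\r\n]/) {
          # Strip one wrapping quote plus adjacent whitespace on each side.
          sub(/^"?[ \t\r\n]*/, "", f[i])
          sub(/[ \t\r\n]*"?$/, "", f[i])
          # Re-quote only when whitespace remains inside the value.
          if (f[i] ~ /[ \t\r\n]/) f[i] = "\"" f[i] "\""
        }
        out = out (i > 1 ? "," : "") f[i]
      }
      print out
      pending = 0
      quotes = 0
    }
    END { if (pending) print rec }   # flush an unterminated record as-is
  ' "$@"
}
```

Assuming the converted file is ut8csvfile.csv as in the man-page example, it could be used as: clean_csv ut8csvfile.csv > clean.csv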
Answer 0: (score: 0)
awk -F, -v OFS=, '{ for (i = 1; i <= NF; ++i) { gsub(/(^"?[[:space:]]*|[[:space:]]*"?$)/, "", $i); if ($i ~ /[[:space:]]/) $i = "\"" $i "\"" } } 1' file
Output:
1,"/var/adm/sys ldd/all /Comm/logs",34356,234245,24245
2,"/var/adm/sys ldd/all /Comm/debugs.txt",45356,435,"578 58976"
3,"add this line in crontab :",34356,234245,24245
4,"1.0348 54 35.5",45356,435,578
4,"1 2","45356 95857",435,578
5,"1 2","45356 95857","435,578"
The only limitation is that values cannot contain commas, e.g. "This is, a value."
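The comma limitation comes from -F, which makes awk split on every comma, including one inside a quoted value. A small check illustrates this. The sample row is invented for illustration and count_fields is a hypothetical helper name:

```shell
# The sample row is invented for illustration; as CSV it has 3 fields.
count_fields() {
  # With -F, awk splits on every comma, quoted or not.
  awk -F, '{ print NF }'
}

printf '%s\n' '1,"This is, a value.",2' | count_fields   # prints 4, not 3
```

Because the quoted value is seen as two separate fields, each half would be trimmed and re-quoted on its own.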
Answer 1: (score: 0)
Try this perl one-liner:
perl -i -nle 'chomp($_);$_=~s/\s*"\s*/"/sg;print "$_"' file