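I have a .CSV file (say, tab_delimited_file.csv) that I download from a particular vendor's portal. When I moved the file to one of my Linux directories, I noticed that this particular .CSV file is actually a tab-delimited file that is merely named .CSV. Please find a few sample records from the file below.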
"""column1""" """column2""" """column3""" """column4""" """column5""" """column6""" """column7"""
12 455 string with quotes, and with a comma in between 4432 6787 890 88
4432 6787 another, string with quotes, and with two comma in between 890 88 12 455
11 22 simple string 77 777 333 22
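The sample records above are separated by tabs. I know the file's header is very odd, but this is the format in which I receive the file.
I tried using the tr command to replace the tabs with commas, but because of the extra commas inside the record values, the file gets completely messed up. I need the record values that contain commas to be enclosed in double quotes. The command I used is below.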
tr '\t' ',' < tab_delimited_file.csv > comma_separated_file.csv
This converts the file to the following format.
"""column1""","""column2""","""column3""","""column4""","""column5""","""column6""","""column7"""
12,455,string with quotes, and with a comma in between,4432,6787,890,88
4432,6787,another, string with quotes, and with two comma in between,890,88,12,455
11,22,simple string,77,777,333,22
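The damage can be made visible by counting comma-separated fields per line after the tr step. The following is only a minimal sketch: count_fields is a hypothetical helper name, and the two sample rows are recreated with printf so that the snippet does not depend on the real file.

```shell
# count_fields: hypothetical helper that applies the same tr command to stdin
# and then prints how many comma-separated fields each line ends up with.
count_fields() {
    tr '\t' ',' | awk -F',' '{ print NF }'
}

# Two of the sample rows, rebuilt with explicit tabs.
printf '12\t455\tstring with quotes, and with a comma in between\t4432\t6787\t890\t88\n11\t22\tsimple string\t77\t777\t333\t22\n' | count_fields
```

Both rows have 7 tab-separated fields, but after tr the first one reports 8 and the second 7, because the comma inside the text value can no longer be told apart from a separator.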
I need help converting the sample file to the following format.
column1,column2,column3,column4,column5,column6,column7
12,455,"string with quotes, and with a comma in between",4432,6787,890,88
4432,6787,"another, string with quotes, and with two comma in between",890,88,12,455
11,22,"simple string",77,777,333,22
Any solution using sed or awk would be very helpful.
Answer 0 (score: 2)
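This will produce the output you asked for, but it isn't clear whether the criterion I've assumed for which fields go in quotes (any field that contains a comma or whitespace) is actually what you want, so test it yourself with other input to see: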
$ awk 'BEGIN { FS=OFS="\t" }
{
    gsub(/"/,"")
    for (i=1;i<=NF;i++)
        if ($i ~ /[,[:space:]]/)
            $i = "\"" $i "\""
    gsub(OFS,",")
    print
}
' file
column1,column2,column3,column4,column5,column6,column7
12,455,"string with quotes, and with a comma in between",4432,6787,890,88
4432,6787,"another, string with quotes, and with two comma in between",890,88,12,455
11,22,"simple string",77,777,333,22
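Because the quoting criterion is the uncertain part, it may help to make it a parameter. The following is a sketch, not a drop-in replacement: tsv_to_csv and its pat variable are hypothetical names, and like the script above it assumes every existing double quote can simply be discarded.

```shell
# tsv_to_csv PATTERN: read tab-separated text on stdin, write CSV on stdout.
# Any field matching PATTERN (an awk regular expression) is wrapped in quotes.
tsv_to_csv() {
    awk -v pat="$1" 'BEGIN { FS = "\t"; OFS = "," }
    {
        gsub(/"/, "")                 # drop all existing double quotes
        for (i = 1; i <= NF; i++)
            if ($i ~ pat)
                $i = "\"" $i "\""
        $1 = $1                       # rebuild the record so OFS is applied
        print
    }'
}

# Quote fields containing a comma or a space (matches the requested output).
printf '11\t22\tsimple string\t77\n' | tsv_to_csv '[, ]'   # 11,22,"simple string",77

# Quote only fields containing a comma.
printf '11\t22\tsimple string\t77\n' | tsv_to_csv ','      # 11,22,simple string,77
```

With the real file this would be run as tsv_to_csv '[, ]' < tab_delimited_file.csv > comma_separated_file.csv. The pattern '[, ]' is used here instead of a [:space:] class only to stay portable to awk builds without POSIX character classes.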
Answer 1 (score: 1)
One way using awk:
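awk '
    BEGIN { FS = "\t"; OFS = "," }
    # Header line: strip all double quotes from every field.
    FNR == 1 {
        for ( i = 1; i <= NF; i++ ) { gsub( /"+/, "", $i ) }
        print $0
        next
    }
    # Data lines: quote any field that contains more than one word.
    FNR > 1 {
        for ( i = 1; i <= NF; i++ ) {
            w = split( $i, _, " " )
            if ( w > 1 ) { $i = "\"" $i "\"" }
        }
        $1 = $1    # rebuild $0 with OFS even when no field was quoted
        print $0
    }
' infile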
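It splits the input into fields on tabs and writes the output with commas. The header is easy: simply remove all double quotes. For data lines, each field is split on whitespace and is enclosed in double quotes only when the split returns more than one piece.
It yields: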
column1,column2,column3,column4,column5,column6,column7
12,455,"string with quotes, and with a comma in between",4432,6787,890,88
4432,6787,"another, string with quotes, and with two comma in between",890,88,12,455
11,22,"simple string",77,777,333,22
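One awk detail matters for scripts of this shape: setting OFS does not by itself change $0. The record is only rebuilt with OFS when some field is assigned, so a data line in which no field gets quoted would be printed with its original tabs unless a field is touched. A minimal demonstration, with no_rebuild and with_rebuild as hypothetical helper names:

```shell
# Same FS/OFS settings in both helpers; only the second assigns a field.
no_rebuild()   { awk 'BEGIN { FS = "\t"; OFS = "," } { print }'; }
with_rebuild() { awk 'BEGIN { FS = "\t"; OFS = "," } { $1 = $1; print }'; }

printf '1\t2\t3\n' | no_rebuild     # still tab-separated
printf '1\t2\t3\n' | with_rebuild   # 1,2,3
```

The assignment $1 = $1 changes no value; it is the usual idiom for forcing awk to reassemble the record with the output separator.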