I have a very large CSV file that contains paragraphs like the following:
first line1
second line1
third line1
fourth line1
first line2
second line2
third line2
fourth line2
After processing, I would like it transformed into:
first line1,second line1,third line1,fourth line1
first line2,second line2,third line2,fourth line2
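Note: first line, second line, and so on contain special characters such as , " and :
I think one option might be to find the word "second" in second line1 and replace the newline ("enter") before it with a comma, so that second line1 ends up to the right of first line1.
How can I do this?
Actually, the example above may not reflect the ACTUAL data, so here it is: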
137822118,user,User,192.168.100.20,2016-07-25 23:19:05 DST,iScript,iScript send MML command,B12345-BXL_ABCD_BD,Succeeded,"NE Name:B12345-BXL_ABCD_BD
MML Command:LST DEVIP:OPONEMS=""user"", IPOFEMSWS=""192.168.100.20"";
MML Result:Successful.
",2016-07-25 23:19:05 DST
137821234,user,User,192.168.100.21,2016-07-25 23:19:05 DST,iScript,iScript send MML command,B12345-BXL_ASDF_BD,Succeeded,"NE Name:B12345-BXL_ASDF_BD
MML Command:LST DEVIP:OPONEMS=""user"", IPOFEMSWS=""192.168.100.21"";
MML Result:Successful.
",2016-07-25 22:18:05 DST
The CSV file contains many paragraphs like these.
The output should be (one paragraph per line):
137822118,user,User,192.168.100.20,2016-07-25 23:19:05 DST,iScript,iScript send MML command,B12345-BXL_ABCD_BD,Succeeded,"NE Name:B12345-BXL_ABCD_BD,MML Command:LST DEVIP:OPONEMS=""user"", IPOFEMSWS=""192.168.100.20""; MML Result:Successful. ",2016-07-25 23:19:05 DST
137821234,user,User,192.168.100.21,2016-07-25 23:19:05 DST,iScript,iScript send MML command,B12345-BXL_ASDF_BD,Succeeded,"NE Name:B12345-BXL_ASDF_BD,MML Command:LST DEVIP:OPONEMS=""user"", IPOFEMSWS=""192.168.100.21""; MML Result:Successful. ",2016-07-25 22:18:05 DST
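Thank you very much for your help!
I tried your solution and it almost works, but the result is not what I expected. Because of how posts are formatted here, the example I gave seems to differ slightly from the source file.
Please find the real source CSV file below (only a few lines, since the full file contains more than a million):
https://www.wetransfer.com/downloads/637b36b2148550ad090c22c9e8297a9c20160804081835/48b90b
Sorry for the mistake, and thanks again!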
Answer 0 (score: 3)
Another option:
$ awk '{ORS=NR%4?",":RS}1' file
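This sets the output record separator (ORS) to a comma for every line except each fourth one, where it is reset to the newline held in RS, and then prints the line.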
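To make the ORS trick easier to try out, here is a minimal sketch that wraps the same idea in a shell function. The name `join_every` and its group-size argument are hypothetical and not part of the answer; the sample input is the simplified data from the question.

```shell
# join_every N: hypothetical helper that joins every N input lines with commas.
# Same technique as the one-liner above: ORS is "," except on every Nth line.
join_every() {
    awk -v n="$1" '{ ORS = (NR % n) ? "," : "\n" } 1'
}

# Demo on the simplified sample from the question.
printf '%s\n' 'first line1' 'second line1' 'third line1' 'fourth line1' \
              'first line2' 'second line2' 'third line2' 'fourth line2' | join_every 4
```

Note that if the number of input lines is not a multiple of N, the last output line ends with a comma and no newline, so this only suits input whose records all span exactly N lines.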
Answer 1 (score: 1)
You can use paste, for example:
$ paste -d, - - - - < file
first line1,second line1,third line1,fourth line1
first line2,second line2,third line2,fourth line2
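Each - stands for standard input; when you specify N of them (N = 4 in this example), paste forms one output line from every N lines of standard input. -d specifies the column delimiter, which is a comma in this example.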
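If the group size is not always four, the list of - arguments can be built in a loop. The sketch below does that; `join_n` is a hypothetical name introduced only for illustration.

```shell
# join_n N: hypothetical helper that passes N "-" arguments to paste,
# so that every N lines of standard input become one comma-separated line.
join_n() {
    n=$1
    set --                      # clear the function's own argument list
    i=0
    while [ "$i" -lt "$n" ]; do
        set -- "$@" -           # append one more "-" (standard input)
        i=$((i + 1))
    done
    paste -d, "$@"
}

# Demo: six lines joined three at a time.
printf '%s\n' a b c d e f | join_n 3
```

When the input ends in the middle of a group, paste treats the missing lines as empty and still emits the delimiters, so a trailing partial group shows up padded with empty fields.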
Answer 2 (score: 0)
Try this:
awk -v patt="first" 'BEGIN{ORS=","}$0 ~ patt {gsub(patt, "\n"patt)}1' CSVfile
Answer 3 (score: 0)
$ awk 'NR==1 {prev=$0; next} {printf "%s", prev; printf "%s", $0~/^[0-9]{9}/ ?"\n":","; prev=$0} END{print prev}' test.in
137822118,user,User,192.168.100.20,2016-07-25 23:19:05 DST,iScript,iScript send MML command,B12345-BXL_ABCD_BD,Succeeded,"NE Name:B12345-BXL_ABCD_BD ,MML Command:LST DEVIP:OPONEMS=""user"", IPOFEMSWS=""192.168.100.20""; ,MML Result:Successful. ,",2016-07-25 23:19:05 DST
137821234,user,User,192.168.100.21,2016-07-25 23:19:05 DST,iScript,iScript send MML command,B12345-BXL_ASDF_BD,Succeeded,"NE Name:B12345-BXL_ASDF_BD ,MML Command:LST DEVIP:OPONEMS=""user"", IPOFEMSWS=""192.168.100.21""; ,MML Result:Successful. ,",2016-07-25 22:18:05 DST
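When the next line starts with nine digits, a new record begins, so the previous line is terminated with a newline; otherwise it is followed by a comma. Next time, please post the correct data from the start.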
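The same record-start idea can also be written with an explicit buffer, which some readers may find easier to follow. This is only a sketch under stated assumptions: `merge_records` is a hypothetical name, a record is assumed to start with one or more digits followed by a comma, and no continuation line is assumed to start that way. It uses `[0-9]+` rather than `{9}` because not every awk implementation supports interval expressions, and it drops a trailing carriage return in case the file has Windows line endings (also an assumption; the source does not say).

```shell
# merge_records: hypothetical helper that glues continuation lines onto the
# record they belong to, separated by commas.
# ASSUMPTION: only the first line of a record starts with digits and a comma.
merge_records() {
    awk '
        { sub(/\r$/, "") }                        # strip a trailing CR, if present
        /^[0-9]+,/ && NR > 1 { print buf; buf = $0; next }   # new record: flush the old one
        { buf = (NR == 1) ? $0 : buf "," $0 }     # first line or continuation line
        END { if (NR) print buf }                 # flush the last record
    '
}

# Demo on a shortened, made-up record layout.
printf '%s\n' '100,a,"x' 'y' '",z' '200,b,"p' 'q' '",r' | merge_records
```

This relies entirely on the shape of the first field, so it would split a record wrongly if a line inside a quoted field happened to begin with digits and a comma.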
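Answer 4 (score: 0)
Using GNU awk for FPAT, and making no assumptions about how many lines or fields your input has, or about which field values appear at the start or end of a record: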
$ cat decsv.awk
BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")"; OFS="," }
{
    # create strings that cannot exist in the input to map escaped quotes to
    gsub(/a/,"aA")
    gsub(/\\"/,"aB")
    gsub(/""/,"aC")

    # prepend previous incomplete record segment if any
    $0 = prev $0
    numq = gsub(/"/,"&")
    if ( numq % 2 ) {
        # this is inside double quotes so incomplete record
        prev = $0 OFS
        #prev = $0 RT    # uncomment to retain newlines in the record
        next
    }
    prev = ""

    for (i=1;i<=NF;i++) {
        # map the replacement strings back to their original values
        gsub(/aC/,"\"\"",$i)
        gsub(/aB/,"\\\"",$i)
        gsub(/aA/,"a",$i)
    }

    print
    #printf "Record %d:\n", ++recNr
    #for (i=0;i<=NF;i++) {
    #    printf "\t$%d=<%s>\n", i, $i
    #}
    #print "#######"
}
$ awk -f decsv.awk file
137822118,user,User,192.168.100.20,2016-07-25 23:19:05 DST,iScript,iScript send MML command,B12345-BXL_ABCD_BD,Succeeded,"NE Name:B12345-BXL_ABCD_BD ,MML Command:LST DEVIP:OPONEMS=""user"", IPOFEMSWS=""192.168.100.20""; ,MML Result:Successful. ,",2016-07-25 23:19:05 DST
137821234,user,User,192.168.100.21,2016-07-25 23:19:05 DST,iScript,iScript send MML command,B12345-BXL_ASDF_BD,Succeeded,"NE Name:B12345-BXL_ASDF_BD ,MML Command:LST DEVIP:OPONEMS=""user"", IPOFEMSWS=""192.168.100.21""; ,MML Result:Successful. ,",2016-07-25 22:18:05 DST
For details, see "Awk to get .csv column containing commas and newlines".
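The core of the script above is the quote-parity test: a record is incomplete while it contains an odd number of double quotes. If splitting into fields is not needed, that test alone can be sketched in portable awk without FPAT. This is a simplified variant, not the answer's script; `join_quoted` is a hypothetical name, and it assumes quotes inside fields are escaped only by doubling them (as in the sample data), not with backslashes.

```shell
# join_quoted: hypothetical helper that joins physical lines with commas until
# the accumulated record contains an even number of double quotes.
# ASSUMPTION: embedded quotes are escaped as "" (doubling keeps the count even).
join_quoted() {
    awk '
        {
            rec = (rec == "") ? $0 : rec "," $0   # append to any pending record
            tmp = rec
            if (gsub(/"/, "", tmp) % 2 == 0) {    # gsub returns the number of quotes
                print rec                         # balanced quotes: record complete
                rec = ""
            }
        }
        END { if (rec != "") print rec }          # emit an unterminated tail as-is
    '
}

# Demo: a three-line quoted field followed by an ordinary one-line record.
printf '%s\n' '1,"a' 'b ""c""' '",2' '3,d' | join_quoted
```

Recounting the quotes of the whole pending record on every line is simple but does repeated work for records that span many lines; for records of three or four lines, as in the question, that cost should be negligible.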