bash: find the newline ("enter") before a given pattern and replace it with a comma

Time: 2016-08-03 13:41:30

Tags: linux bash awk sed tr

I have a large CSV file that contains paragraphs like the following:

first line1  
second line1  
third line1  
fourth line1  
first line2  
second line2  
third line2  
fourth line2

After processing, I would like it to be converted to:

first line1,second line1,third line1,fourth line1  
first line2,second line2,third line2,fourth line2

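Note: "first line", "second line", etc. contain special characters such as . , " :

I think one option might be to find the word "second" in second line1 and replace the "enter" (newline) in front of it with a comma, so that second line1 ends up to the right of first line1.

How can I do this?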

In fact, the example above may not be representative of the ACTUAL data, so here it is:

137822118,user,User,192.168.100.20,2016-07-25 23:19:05 DST,iScript,iScript send MML command,B12345-BXL_ABCD_BD,Succeeded,"NE Name:B12345-BXL_ABCD_BD  
MML Command:LST DEVIP:OPONEMS=""user"", IPOFEMSWS=""192.168.100.20"";  
MML Result:Successful.  
",2016-07-25 23:19:05 DST  
137821234,user,User,192.168.100.21,2016-07-25 23:19:05 DST,iScript,iScript send MML command,B12345-BXL_ASDF_BD,Succeeded,"NE Name:B12345-BXL_ASDF_BD  
MML Command:LST DEVIP:OPONEMS=""user"", IPOFEMSWS=""192.168.100.21"";  
MML Result:Successful.  
",2016-07-25 22:18:05 DST

The CSV file contains many paragraphs like these.

The output should be (one paragraph per line):

137822118,user,User,192.168.100.20,2016-07-25 23:19:05 DST,iScript,iScript send MML command,B12345-BXL_ABCD_BD,Succeeded,"NE Name:B12345-BXL_ABCD_BD,MML Command:LST DEVIP:OPONEMS=""user"", IPOFEMSWS=""192.168.100.20""; MML Result:Successful.  ",2016-07-25 23:19:05 DST    
137821234,user,User,192.168.100.21,2016-07-25 23:19:05 DST,iScript,iScript send MML command,B12345-BXL_ASDF_BD,Succeeded,"NE Name:B12345-BXL_ASDF_BD,MML Command:LST DEVIP:OPONEMS=""user"", IPOFEMSWS=""192.168.100.21""; MML Result:Successful.  ",2016-07-25 22:18:05 DST

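Thank you very much for your help!

I tried your solutions. They almost work, but they do not give the expected result. Because of the way posts are formatted here, the example I gave you seems to differ slightly from the source file.

Please find the real source CSV file below (only a few lines, since the full file contains more than a million):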

https://www.wetransfer.com/downloads/637b36b2148550ad090c22c9e8297a9c20160804081835/48b90b

Sorry for the mistake, and thanks again!

5 answers:

Answer 0: (score: 3)

Another option:

$ awk '{ORS=NR%4?",":RS}1' file

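This sets the output record separator (ORS) to a comma on every line except each fourth one, where it is reset to a newline (RS), and then prints the line.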
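To see the effect in isolation, here is a minimal sketch. The helper name `join_every_n` and the sample data generated with `printf` are assumptions for illustration, not part of the original answer. `NR % n` is non-zero on lines 1 to n-1 of each group, so ORS is a comma there; on every n-th line it is zero, so ORS falls back to RS, a newline.

```shell
# join_every_n is a hypothetical helper; $1 is the number of lines per group.
join_every_n() {
    awk -v n="$1" '{ ORS = (NR % n ? "," : RS) } 1'
}

# Sample input mirroring the simplified example in the question.
printf '%s\n' 'first line1' 'second line1' 'third line1' 'fourth line1' \
              'first line2' 'second line2' 'third line2' 'fourth line2' |
    join_every_n 4
```

If the number of input lines is not a multiple of n, the last output line ends with a comma and no newline, so this only fits input with a fixed number of lines per record.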

Answer 1: (score: 1)

You can use paste, for example:

$ paste -d, - - - - < file
first line1,second line1,third line1,fourth line1
first line2,second line2,third line2,fourth line2

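- stands for standard input. When you specify N of them (N = 4 in this example), paste builds each output line from N consecutive lines of standard input.

-d specifies the column delimiter, which is a comma in this example.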
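The same idea can be sketched for an arbitrary N by building the list of `-` operands in a loop. The helper name `paste_every_n` is an assumption for illustration; only `paste -d,` and the `-` operands come from the answer.

```shell
# paste_every_n is a hypothetical helper; $1 is the number of lines per group.
paste_every_n() {
    n=$1
    dashes=''
    i=0
    # Build one "-" operand per input line that should end up on an output line.
    while [ "$i" -lt "$n" ]; do
        dashes="$dashes -"
        i=$((i + 1))
    done
    # Word splitting of $dashes is intentional: each "-" is a separate operand.
    paste -d, $dashes
}

printf '%s\n' a b c d e f | paste_every_n 3
```

When the input runs out in the middle of a group, paste treats the missing lines as empty, so the last output line is padded with trailing delimiters.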

Answer 2: (score: 0)

Try this:

 awk -v patt="first" 'BEGIN{ORS=","}$0 ~ patt {gsub(patt, "\n"patt)}1'  CSVfile

Answer 3: (score: 0)

$ awk 'NR==1 {prev=$0; next} {printf "%s", prev; printf "%s", $0~/^[0-9]{9}/ ?"\n":","; prev=$0} END{print prev}' test.in
137822118,user,User,192.168.100.20,2016-07-25 23:19:05 DST,iScript,iScript send MML command,B12345-BXL_ABCD_BD,Succeeded,"NE Name:B12345-BXL_ABCD_BD  ,MML Command:LST DEVIP:OPONEMS=""user"", IPOFEMSWS=""192.168.100.20"";  ,MML Result:Successful.  ,",2016-07-25 23:19:05 DST
137821234,user,User,192.168.100.21,2016-07-25 23:19:05 DST,iScript,iScript send MML command,B12345-BXL_ASDF_BD,Succeeded,"NE Name:B12345-BXL_ASDF_BD  ,MML Command:LST DEVIP:OPONEMS=""user"", IPOFEMSWS=""192.168.100.21"";  ,MML Result:Successful.  ,",2016-07-25 22:18:05 DST

When a new record starts with a run of digits, it is time to start a new line. Next time, please post the correct data from the start.
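The same rule can also be written without the one-line lookbehind, by deciding which separator to print in front of each line. This is a sketch under assumptions, not the answer's code: the helper name `join_by_record_start` is made up, and the pattern `^[0-9]+,` (digits followed by a comma) replaces `^[0-9]{9}` so that it does not depend on interval-expression support in the awk at hand.

```shell
# join_by_record_start is a hypothetical helper name.
join_by_record_start() {
    awk '
        # Before every line except the first, print a newline if the line
        # starts a new record (leading digits and a comma), else a comma.
        NR > 1 { printf "%s", (/^[0-9]+,/ ? "\n" : ",") }
        { printf "%s", $0 }
        # Terminate the last record, but only if there was any input.
        END { if (NR) print "" }
    '
}

printf '%s\n' '111,a,"x' 'y' '",b' '222,c,"z' '",d' | join_by_record_start
```

A continuation line that itself begins with digits and a comma would be mistaken for the start of a new record, so this relies on the record ID format of the data.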

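Answer 4: (score: 0)

Using GNU awk for FPAT, and without assuming how many lines or fields your input has or which field values appear at the start or end of a record: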

$ cat decsv.awk
BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")"; OFS="," }
{
    # create strings that cannot exist in the input to map escaped quotes to
    gsub(/a/,"aA")
    gsub(/\\"/,"aB")
    gsub(/""/,"aC")

    # prepend previous incomplete record segment if any
    $0 = prev $0
    numq = gsub(/"/,"&")
    if ( numq % 2 ) {
        # this is inside double quotes so incomplete record
        prev = $0 OFS
        #prev = $0 RT   # uncomment to retain newlines in the record
        next
    }
    prev = ""

    for (i=1;i<=NF;i++) {
        # map the replacement strings back to their original values
        gsub(/aC/,"\"\"",$i)
        gsub(/aB/,"\\\"",$i)
        gsub(/aA/,"a",$i)
    }

    print

    #printf "Record %d:\n", ++recNr
    #for (i=0;i<=NF;i++) {
        #printf "\t$%d=<%s>\n", i, $i
    #}
    #print "#######"
}

$ awk -f decsv.awk file
137822118,user,User,192.168.100.20,2016-07-25 23:19:05 DST,iScript,iScript send MML command,B12345-BXL_ABCD_BD,Succeeded,"NE Name:B12345-BXL_ABCD_BD  ,MML Command:LST DEVIP:OPONEMS=""user"", IPOFEMSWS=""192.168.100.20"";  ,MML Result:Successful.  ,",2016-07-25 23:19:05 DST
137821234,user,User,192.168.100.21,2016-07-25 23:19:05 DST,iScript,iScript send MML command,B12345-BXL_ASDF_BD,Succeeded,"NE Name:B12345-BXL_ASDF_BD  ,MML Command:LST DEVIP:OPONEMS=""user"", IPOFEMSWS=""192.168.100.21"";  ,MML Result:Successful.  ,",2016-07-25 22:18:05 DST

For more information, see "Awk to get .csv column containing commas and newlines".
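The core of this approach is the quote count: a record is incomplete as long as it contains an odd number of double quotes. Below is a reduced sketch of just that idea for any POSIX awk. It is not the answer's script: it does no field splitting with FPAT, the helper name `join_quoted_records` is made up, and it assumes embedded quotes are escaped only by doubling (`""`), which keeps the count even.

```shell
# join_quoted_records is a hypothetical helper name.
join_quoted_records() {
    awk '
        {
            # Append the line to the pending record, or start a new one.
            rec = (pending ? rec "," $0 : $0)
            # gsub returns the number of substitutions, i.e. the quote count.
            tmp = $0
            nq += gsub(/"/, "", tmp)
            # Odd count: still inside a quoted field, keep collecting lines.
            if (nq % 2) { pending = 1; next }
            print rec
            pending = 0
            nq = 0
        }
        # Flush an unterminated record rather than dropping it silently.
        END { if (pending) print rec }
    '
}

printf '%s\n' '1,"NE' 'CMD=""u"", v;' '",t' '2,plain' | join_quoted_records
```

Because it makes no assumption about what a record starts with or how many lines it spans, it is less tied to the data format than a fixed line count or a leading-digits test.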