How do I replace linefeeds with sed while ignoring exceptions?

Time: 2016-07-14 07:27:54

Tags: shell awk sed linefeed

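Hello, I have the following CSV input data, which contains multiple linefeeds and carriage returns. I am trying to clean up the file with sed.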

Note: <CR> and <LF> stand for the actual \r and \n characters.

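I want to replace every linefeed that is not preceded by a double quote character ("), so the double quote has to be taken into account here. I managed to strip out all linefeeds, but I don't know how to tell sed to ignore those that match a specific pattern.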

The expected output is as follows:

"Data1","This<LF>
Is<LF>
Foobar"<CR><LF>
"Data2","Additional<LF>
Data<CR><LF>
With Inline CR LF<CR><LF>
End of Data."<CR><LF>

Any ideas?

2 answers:

Answer 0: (score: 1)

You can use this GNU awk command, with \r standing for <CR> and \n for <LF>:

awk -v BINMODE=3 -v RS='"\r\n"' 's!=""{printf "%s\"\n\"", s} {
   s = $0; gsub(/\r?\n/, " ", s)} END{print s}' file

"Data1","This Is Foobar"
"Data2","Additional Data With Inline CR LF End of Data."

Answer 1: (score: 0)

Using GNU awk for multi-character RS and RT:

$ cat tst.awk
BEGIN { RS="\"[^\"]*\"" }
RT != "" {
    gsub(/\r/,"")
    gsub(/[\r\n]+/," ",RT)
    printf "%s%s", $0, RT
}
END { print "" }

$ awk -f tst.awk file
"Data1","This Is Foobar"
"Data2","Additional Data With Inline CR LF End of Data."
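
Where GNU awk is not available, the same joining can be sketched in portable awk by tracking whether the current line ends inside an open quoted field. This is only a sketch under a few assumptions: fields are delimited by plain double quotes, an escaped quote is written as two quotes (so the parity of the quote count still holds), and the output uses bare LF line endings. The function name join_records and the printf test data (built from the <CR>/<LF> listing in the question) are illustrative, not part of the original answers.

```shell
#!/bin/sh
# join_records: hypothetical helper name, chosen for this sketch only.
# Reads CSV on stdin and joins physical lines that fall inside a quoted field.
join_records() {
  awk '
    {
      sub(/\r$/, "")                       # drop a trailing CR, if any
      buf = (inside ? buf " " : "") $0     # continue the open record or start a new one
      line = $0
      # gsub returns the number of quotes on this line; an odd count
      # means the line toggles the inside-quotes state.
      if (gsub(/"/, "&", line) % 2) inside = !inside
      if (!inside) print buf               # quotes balanced: the record is complete
    }
    END { if (inside) print buf }          # flush an unterminated record
  '
}

# Sample input, assuming <CR> and <LF> in the question denote real \r and \n bytes.
printf '"Data1","This\nIs\nFoobar"\r\n"Data2","Additional\nData\r\nWith Inline CR LF\r\nEnd of Data."\r\n' |
  join_records
```

Because the state depends only on quote parity, this does not rely on BINMODE, a regex RS, or RT, at the cost of not preserving the original CRLF record terminators.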