Question

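I have a text file that contains random line breaks. Every new row begins with the word "client". How can I remove the extra line breaks that appear at the end of the second and fourth lines below?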

client | This is first row | 2013-02-01 23:45:59 | last column
clientd | second row with a line break
third line part of row 2 | 2013-01-31 12:44:00 | last column
client xyz | some text here | 2013-12-21 
12:54:12 | last column

Expected result:

client | This is first row | 2013-02-01 23:45:59 | last column
clientd | second row with a line break third line part of row 2 | 2013-01-31 12:44:00 | last column
client xyz | some text here | 2013-12-21 12:54:12 | last column

The sed command below works, but I am looking for improvements if possible.

cat test.txt | tr '\n' ' ' | sed 's/client/\nclient/g'

Is there any other way to achieve this?

Answer 1

Here is another awk one-liner:

awk -vRS='(^|\n)client' 'NR>1{print "client"gensub("\n"," ","g",$0)}' file

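It works by setting the record separator (RS) to a regular expression that matches client at the start of a line. Each record is then printed with client put back in front and its embedded newlines replaced by spaces. Note that gensub is a GNU awk extension.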
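The same record-splitting idea can be sketched in Python, assuming the whole file fits in memory. Here `re.split` plays the role of awk's `RS`; the function name `join_by_record_separator` and the inline sample are illustrative assumptions, not part of the answer. Unlike the awk version, this sketch also trims trailing blanks on each physical line so the joined row has single spaces.

```python
import re

def join_by_record_separator(text, marker="client"):
    """Split text wherever `marker` starts a line, then flatten each record.

    Hypothetical helper: a Python analogue of RS='(^|\\n)client', not the awk code itself.
    """
    # The marker is consumed by the split, so it is added back below.
    pattern = r"(?:^|\n)" + re.escape(marker)
    records = re.split(pattern, text)
    rows = []
    # records[0] is whatever precedes the first marker (empty for this sample),
    # which is why the awk version skips the first record with NR>1.
    for record in records[1:]:
        parts = record.rstrip("\n").split("\n")
        # Join the physical lines of one record with single spaces.
        rows.append(marker + " ".join(part.rstrip() for part in parts))
    return rows

sample = (
    "client | This is first row | 2013-02-01 23:45:59 | last column\n"
    "clientd | second row with a line break\n"
    "third line part of row 2 | 2013-01-31 12:44:00 | last column\n"
    "client xyz | some text here | 2013-12-21 \n"
    "12:54:12 | last column\n"
)

for row in join_by_record_separator(sample):
    print(row)
```

Like the awk one-liner, this treats any line that starts with the marker text (including `clientd`) as the start of a new row.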

It is also possible to write a regular expression that matches a newline followed by anything other than client, but it is not pretty:

\n([^c]|c[^l]|cl[^i]|cli[^e]|clie[^n]|clien[^t])

If your data file is small enough to read entirely into memory, you can use the expression above with perl, for example:

perl -0777pe "s/\n([^c]|c[^l]|cl[^i]|cli[^e]|clie[^n]|clien[^t])/ \1/g" file

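(The above is imperfect, because the "non-matching" character in each alternative could itself be a newline, in which case that newline is not changed to a space. This can be fixed by adjusting every instance of [^X], which you should do if you really want to use this approach.)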
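In regex engines that support lookahead (Perl and Python among them), "a newline not followed by client" can be stated directly with a negative lookahead instead of the long alternation. This is a different technique from the one in the answer, shown here as a minimal Python sketch. The helper name `join_continuations` is hypothetical, and the whole input is assumed to fit in memory.

```python
import re

def join_continuations(text, marker="client"):
    """Replace each newline that is not followed by `marker` with one space.

    Hypothetical helper illustrating negative lookahead; not the answer's regex.
    """
    # [ \t]*  swallows blanks left dangling before the stray newline
    # (?!...) is a negative lookahead: it inspects what follows without consuming it
    # \Z      inside the lookahead keeps the file's final newline untouched
    pattern = r"[ \t]*\n(?!" + re.escape(marker) + r"|\Z)"
    return re.sub(pattern, " ", text)

sample = (
    "client | This is first row | 2013-02-01 23:45:59 | last column\n"
    "clientd | second row with a line break\n"
    "third line part of row 2 | 2013-01-31 12:44:00 | last column\n"
    "client xyz | some text here | 2013-12-21 \n"
    "12:54:12 | last column\n"
)

print(join_continuations(sample), end="")
```

Because the lookahead does not consume the character after the newline, consecutive stray newlines are each replaced, which sidesteps the imperfection noted for the alternation-based pattern.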

Answer 2

One way:

awk '/^client/{if (x)print x;x=$0;next}{x=x FS $0;}END{print x}' file

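Each time a line starting with client is encountered, the previous record is printed and the current line starts a new record in the variable x. Every following line is appended to x until the next client line is reached, and the END block prints the last record.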
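The accumulate-and-flush logic described above can be sketched as a Python generator that reads one line at a time, so the file never has to fit in memory. The names `merge_rows` and `joiner` are assumptions introduced for illustration. Unlike the awk one-liner, this sketch also strips trailing blanks from each line.

```python
def merge_rows(lines, marker="client", joiner=" "):
    """Yield one merged row per record; a record starts at a line beginning with `marker`.

    Hypothetical helper mirroring the awk logic, not a literal translation of it.
    """
    current = None  # plays the role of the awk variable x
    for line in lines:
        line = line.rstrip()  # drop the newline and any trailing blanks
        if line.startswith(marker):
            if current is not None:
                yield current  # flush the previous record
            current = line
        elif current is None:
            current = line  # input did not begin with a marker line
        else:
            current = current + joiner + line  # continuation of the current record
    if current is not None:
        yield current  # flush the last record, like the END block

sample_lines = [
    "client | This is first row | 2013-02-01 23:45:59 | last column\n",
    "clientd | second row with a line break\n",
    "third line part of row 2 | 2013-01-31 12:44:00 | last column\n",
    "client xyz | some text here | 2013-12-21 \n",
    "12:54:12 | last column\n",
]

for row in merge_rows(sample_lines):
    print(row)
```

For a real file, an open file object can be passed directly, for example `merge_rows(open('test.txt'))`, and `joiner=""` joins the pieces without a space.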

Answer 3

Python

with open('test.txt') as fin:
    # Print the first line as-is, with no leading newline.
    print(fin.readline().rstrip(), end='')
    for line in fin:
        # A line starting with 'client' opens a new row; any other line continues the current one.
        sep = '\n' if line.startswith('client') else ' '
        print(sep + line.rstrip(), end='')
    print()

Output:

client | This is first row | 2013-02-01 23:45:59 | last column
clientd | second row with a line break third line part of row 2 | 2013-01-31 12:44:00 | last column
client xyz | some text here | 2013-12-21 12:54:12 | last column

Answer 4

This might work for you (GNU sed):

sed -r ':a;$!N;/^(client).*\n\1/!{s/\n/ /;ta};P;D' file

This replaces each extra newline with a space. If you do not want the space, use:

sed -r ':a;$!N;/^(client).*\n\1/!{s/\n//;ta};P;D' file

Merging several consecutive lines into one by removing line breaks
