"How to separate tokens in line using Unix?" showed that a file can be tokenized using sed or xargs.
Is there a way to do the reverse?
[in:]
some
sentences
are
like
this.

some
sentences
foo
bar
that
[OUT]:
some sentences are like this.
some sentences foo bar that
The only delimiter between sentences is \n\n. I can do the following in Python, but is there a Unix way?
import codecs

def per_section(it):
    """ Read a file and yield sections using empty line as delimiter """
    section = []
    for line in it:
        if line.strip('\n'):
            section.append(line)
        else:
            yield ''.join(section)
            section = []
    # yield any remaining lines as a section too
    if section:
        yield ''.join(section)

# each section is already a single string, so only the newlines need replacing
print([i.replace("\n", " ") for i in per_section(codecs.open('outfile.txt', 'r', 'utf8'))])
[out:]
[u'some sentences are like this. ', u'some sentences foo bar that ']
Answer 0: (score: 3)
awk handles this kind of task more easily:
awk -v RS="" '{$1=$1}7' file
If you want to keep multiple spaces within each line, you could use
awk -v RS="" -F'\n' '{$1=$1}7' file
With your example:
kent$ cat f
some
sentences
are
like
this.

some
sentences
foo
bar
that
kent$ awk -v RS="" '{$1=$1}7' f
some sentences are like this.
some sentences foo bar that
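For readers who think in Python, the behaviour of the two awk commands above can be approximated as follows. With RS="" awk reads in paragraph mode, where runs of empty lines separate records; assigning $1=$1 forces the record to be rebuilt with its fields joined by a single space, and the pattern 7 is just a true value that triggers the default print. This is only a sketch of that behaviour, not awk's implementation: the name join_paragraphs and the keep_spaces flag are hypothetical and introduced here for illustration.

```python
import re

def join_paragraphs(text, keep_spaces=False):
    """Approximate awk -v RS="" '{$1=$1}7' on a string (hypothetical helper)."""
    # Paragraph mode: one or more empty lines separate records;
    # leading and trailing newlines in the input are ignored.
    records = [r for r in re.split(r"\n{2,}", text.strip("\n")) if r]
    out = []
    for rec in records:
        if keep_spaces:
            # Like -F'\n': each line is one field, so inner spacing is kept.
            fields = rec.split("\n")
        else:
            # Default splitting: any run of whitespace separates fields.
            fields = rec.split()
        # Rebuilding the record joins the fields with a single space.
        out.append(" ".join(fields))
    return out

sample = "some\nsentences\nare\nlike\nthis.\n\nsome\nsentences\nfoo\nbar\nthat\n"
print("\n".join(join_paragraphs(sample)))
```

Passing keep_spaces=True mirrors the -F'\n' variant: a line such as "a  b" keeps its double space instead of being collapsed to "a b".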
Answer 1: (score: 0)
You can use the awk command as follows:
awk -v RS="\n\n" '{gsub("\n"," ",$0);print $0}' file.txt
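This sets the record separator to \n\n, which means the input is tokenized into groups of lines separated by an empty line. Each token is then printed after all of its \n characters have been replaced with a space.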
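The same split-then-replace idea can be written out in Python, which makes the two steps easy to inspect. This is a minimal sketch that assumes the whole input fits in memory; join_on_double_newline is a hypothetical name chosen here. One portability caveat for the awk command itself: a multi-character RS is an extension supported by implementations such as gawk and mawk, and POSIX leaves its behaviour unspecified.

```python
def join_on_double_newline(text):
    """Split on exactly two newlines, then turn remaining newlines into spaces."""
    # Hypothetical helper mirroring RS="\n\n" followed by gsub("\n", " ").
    records = text.split("\n\n")
    # Drop empty records, e.g. the one produced by a trailing separator.
    return [rec.replace("\n", " ") for rec in records if rec]

sample = "some\nsentences\nare\nlike\nthis.\n\nsome\nsentences\nfoo\nbar\nthat\n"
for line in join_on_double_newline(sample):
    print(repr(line))
```

In this sketch the last record still contains the input's final newline, so it ends with a trailing space after the replacement, just like the strings produced by the Python snippet in the question.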
Answer 2: (score: 0)
sed -n --posix 'H;$ {x;s/\n\([^[:cntrl:]]\{1,\}\)/\1 /gp;}' YourFile
This relies on the empty-line separator, so each sentence can contain a different number of lines.