Question

The question "How to separate tokens in line using Unix?" showed that a file can be tokenized using sed or xargs.

Is there a way to do the reverse?

[in:]

some
sentences
are
like
this.

some
sentences
foo
bar
that

[out:]

some sentences are like this.
some sentences foo bar that

The only delimiter between sentences is \n\n. I can do it in Python as follows, but is there a Unix way?

import codecs

def per_section(it):
  """ Read a file and yield sections using empty line as delimiter """
  section = []
  for line in it:
    if line.strip('\n'):
      section.append(line)
    else:
      yield ''.join(section)
      section = []
  # yield any remaining lines as a section too
  if section:
    yield ''.join(section)

print ["".join(i).replace("\n"," ") for i in per_section(codecs.open('outfile.txt','r','utf8'))]

[out:]

[u'some sentences are like this. ', u'some sentences foo bar that ']

Answer 1

This kind of task is easier to handle with awk:

awk -v RS="" '{$1=$1}7' file

If you want to keep multiple spaces within each line, you can use:

awk -v RS="" -F'\n' '{$1=$1}7' file

With your example:

kent$  cat f
some
sentences
are
like
this.

some
sentences
foo
bar
that

kent$  awk -v RS=""  '{$1=$1}7' f   
some sentences are like this.
some sentences foo bar that
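
As a rough sketch of why these one-liners work: with RS set to the empty string, awk reads each blank-line-separated paragraph as one record, and newlines always count as field separators. Assigning $1=$1 makes awk rebuild the record with a single space between fields, and the bare 7 is simply a true pattern that triggers the default print. The wrapper names join_collapse and join_keep below are hypothetical, added only to compare the two variants on input that contains a double space.

```shell
# Hypothetical wrappers around the two awk one-liners above.

# Default field splitting: any run of blanks or newlines separates fields,
# so repeated spaces inside a line collapse to one.
join_collapse() {
  awk -v RS="" '{$1=$1}7' "$@"
}

# Newline-only field splitting: each input line is one field,
# so spacing inside a line is left untouched.
join_keep() {
  awk -v RS="" -F'\n' '{$1=$1}7' "$@"
}

# Same input for both; the first line contains two spaces between a and b.
printf 'a  b\nc\n\nd\n' | join_collapse
printf 'a  b\nc\n\nd\n' | join_keep
```

The first call should collapse the double space, while the second should keep it.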

Answer 2

You can do this with awk as follows:

awk -v RS="\n\n" '{gsub("\n"," ",$0);print $0}' file.txt

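This sets the record separator to \n\n, so the input is split into records, each one a group of lines separated by a blank line. Each record is then printed after all of its \n characters are replaced with spaces.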
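
A multi-character RS such as "\n\n" is handled by gawk and mawk, but POSIX leaves the behavior unspecified, so the command may not work in every awk. A minimal sketch of the same idea that does not rely on RS at all is to accumulate lines and flush them at each empty line. The names join_sections and buf are hypothetical.

```shell
# Hypothetical helper: join blank-line-separated groups of lines,
# one output line per group, without setting RS.
join_sections() {
  awk '
    # Non-empty line: append it to the current group, space-separated.
    length($0) > 0 { buf = (buf == "" ? $0 : buf " " $0); next }
    # Empty line: the group is complete, so print and reset it.
    buf != ""      { print buf; buf = "" }
    # Flush the last group if the input does not end with an empty line.
    END            { if (buf != "") print buf }
  ' "$@"
}

# Usage with the sample input from the question, fed on stdin.
printf 'some\nsentences\nare\nlike\nthis.\n\nsome\nsentences\nfoo\nbar\nthat\n' | join_sections
```

Unlike the gsub version, this approach should not leave a trailing space when the file ends with a single newline.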

Answer 3

sed -n --posix 'H;$ {x;s/\n\([^[:cntrl:]]\{1,\}\)/\1 /gp;}' YourFile

It splits on blank lines, so each sentence can contain a different number of tokens.
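
To make the one-liner easier to follow, here is the same script spread over several lines with comments. It is a sketch rather than a tuned solution: the wrapper name join_with_sed is hypothetical, and the GNU-specific --posix flag is left out on the assumption that it is not needed for these commands. Note that the substitution leaves a trailing space after each sentence.

```shell
# Hypothetical wrapper around the sed one-liner above, with comments.
join_with_sed() {
  sed -n '
    # Append every input line to the hold space, each preceded by a newline.
    H
    # On the last line, work on everything collected so far.
    $ {
      # Swap the hold space into the pattern space.
      x
      # Turn "newline + token" into "token + space". A newline followed by
      # another newline (an empty line) is not matched, so it survives as
      # the break between sentences.
      s/\n\([^[:cntrl:]]\{1,\}\)/\1 /gp
    }
  ' "$@"
}

# Usage with a small sample fed on stdin.
printf 'a\nb\n\nc\nd\n' | join_with_sed
```

Because the whole file is collected in the hold space before anything is printed, this approach is best suited to inputs that fit comfortably in memory.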

Reverse newline tokenization in one-token-per-line files? - Unix
