Question: How do I pull rows out of a column separated by blank lines?

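I have a large file of tokenized sentences. Different sentences are separated by blank lines. The input file is basically just one big column.

I want to transpose this single column so that each unique sentence gets its own row.

Input: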

Sentence1
Sentence1
Sentence1
Sentence1

Sentence2
Sentence2
Sentence2

...

SentenceN

The desired output would be:

Sentence1 Sentence1 Sentence1 Sentence1
Sentence2 Sentence2 Sentence2
...

I have been looking at grep, awk, sed, and tr, but I am struggling to get the syntax right.

Thanks!

Answer 1

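With awk this is simple if you choose your record and field separators wisely:

awk '$1=$1' RS= FS="\n" OFS=" " infile

Output:

Sentence1 Sentence1 Sentence1 Sentence1
Sentence2 Sentence2 Sentence2
...
SentenceN

Explanation

RS= sets the record separator to "blank line" (awk's paragraph mode).
FS="\n" sets the input field separator to a newline, so each line of a block is one field.
OFS=" " sets the output field separator to a space.
$1=$1 assigns a field, which forces awk to rebuild the record by joining its fields with OFS. The assignment also serves as the pattern: it evaluates to true whenever the first line is non-empty and not numerically zero, so the rebuilt record is printed by the default action.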
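The separator settings described above can also be spelled out in a BEGIN block, which some readers find easier to follow than trailing command-line assignments. The following is a minimal sketch, not the answerer's exact command: `join_paragraphs` is a hypothetical helper name, and the explicit `print` is a swapped-in variant that prints every block unconditionally, so a block whose first line is `0` is not silently skipped.

```shell
# join_paragraphs: hypothetical helper (name is an assumption for illustration).
# Reads blank-line-separated blocks on stdin and prints each block on one line.
join_paragraphs() {
    awk 'BEGIN { RS = ""; FS = "\n"; OFS = " " }
         { $1 = $1; print }'
}

# Sample input: two blocks separated by one blank line.
printf 'Sentence1\nSentence1\n\nSentence2\nSentence2\nSentence2\n' | join_paragraphs
# prints:
# Sentence1 Sentence1
# Sentence2 Sentence2 Sentence2
```

To use it on a file, redirect the file into the function, for example `join_paragraphs < infile`.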

Answer 2

With perl:

Easy peasy:

#!/usr/bin/env perl

use strict;
use warnings;

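# Paragraph mode: read one blank-line-separated block per iteration.
local $/ = "";

while ( <DATA> ) {
   chomp;        # in paragraph mode this strips all trailing newlines
   s/\n/ /g;     # join the remaining lines of the block with spaces
   print "$_\n";
}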

__DATA__
Sentence1
Sentence1
Sentence1
Sentence1

Sentence2
Sentence2
Sentence2

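Or as a one-liner (chomp drops the newlines that end each paragraph, so one newline is added back per record):

perl -00 -pe 'chomp; s/\n/ /g; $_ .= "\n"' infile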

Answer 3

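An awk solution that accumulates lines and flushes them at each blank line:

awk 'NF == 0 { if (a != "") print a; a = ""; next } { a = (a == "" ? $0 : a " " $0) } END { if (a != "") print a }' test.txt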
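The accumulate-and-flush idea behind this one-liner may be easier to follow when spread over several lines. The sketch below is an illustration under stated assumptions rather than the answerer's code: `join_blocks` and the variable `buf` are hypothetical names, and any line with no fields (empty or whitespace only) is assumed to be a separator. It adds a space only between lines, so rows carry no leading space, and it skips empty rows when several blank lines occur in a row.

```shell
# join_blocks: hypothetical helper (name is an assumption for illustration).
join_blocks() {
    awk '
        # A line with no fields (empty or whitespace only) ends the current block.
        NF == 0 {
            if (buf != "") print buf
            buf = ""
            next
        }
        # Otherwise append the line, adding a space only between lines.
        { buf = (buf == "" ? $0 : buf " " $0) }
        # Flush the last block if the input did not end with a blank line.
        END { if (buf != "") print buf }
    '
}

# Sample input with a leading blank line and two consecutive blank lines.
printf '\nSentence1\nSentence1\n\n\nSentence2\n' | join_blocks
# prints:
# Sentence1 Sentence1
# Sentence2
```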
