Question

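I want to create a new version of a document from ordinary plain text, so that each line of the new version contains exactly one sentence. In other words, each line should hold a sequence of words ending with a period (.). Can you suggest some example scripts?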

 In the beginning God created the heavens and the earth.
 Now the earth was formless and empty.  Darkness was on the surface
 of the deep.  God's Spirit was hovering over the surface
 of the waters.

to

 In the beginning God created the heavens and the earth.
 Now the earth was formless and empty.
 Darkness was on the surface of the deep.
 God's Spirit was hovering over the surface of the waters.

Answer 1

awk 'BEGIN {RS = "[.] *"; ORS = ".\n"} {gsub(" *\n *", " "); if ($0 !~ /^ +$/) print}'

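Split the text into records at each period, together with any spaces that follow it (RS).

Each output record is followed by a period and a newline (ORS).

Replace each newline, along with any surrounding spaces, with a single space (gsub()).

If the record does not consist solely of spaces, print it.

If you want to accommodate tabs as well as spaces, you can replace each space that is followed by an asterisk or plus sign with [[:blank:]] (still followed by the asterisk or plus sign).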
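
As a rough sketch of the [[:blank:]] variant described above, wrapped in a shell function so it can read from a pipe. The function name one_sentence_per_line and the sample text are made up for illustration. Note that a regular expression in RS is an extension (gawk and mawk accept it), so this assumes one of those awks, and a version that understands POSIX character classes:

```shell
# Hypothetical helper (name is an assumption): reads text on stdin and
# writes one sentence per line. Assumes an awk that accepts a regex RS.
one_sentence_per_line() {
    awk 'BEGIN { RS = "[.][[:blank:]]*"; ORS = ".\n" }
         {
             # join wrapped lines: newline plus surrounding blanks -> one space
             gsub(/[[:blank:]]*\n[[:blank:]]*/, " ")
             # skip records that are empty or contain only blanks
             if ($0 !~ /^[[:blank:]]*$/) print
         }'
}

# Sample input with a tab after the first period and a wrapped second sentence
printf 'One.\tTwo is\n  wrapped.  Three.\n' | one_sentence_per_line
```

The last test uses * instead of + so that a completely empty record is skipped as well.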

Answer 2

One way using perl:

perl -pe 's/\n\Z/ /; s/(\.)\s*/$1\n/g' infile

Output:

In the beginning God created the heavens and the earth.
Now the earth was formless and empty.
Darkness was on the surface of the deep.
God's Spirit was hovering over the surface of the waters.

Answer 3

First, try a combination of tr and sed:

$ cat input
They're selling postcards of the hanging. They're painting the passports brown. The beauty parlor is filled with sailors. The circus is in town.


$ tr '.' '\n' < input | sed 's/^[[:blank:]]*//;/^$/d;s/$/./'
They're selling postcards of the hanging.
They're painting the passports brown.
The beauty parlor is filled with sailors.
The circus is in town.
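
The sample input here has all its sentences on a single line. For wrapped text like the example in the question, the line breaks need to be joined first. A minimal sketch of that, with a made-up function name split_sentences_tr, could look like this:

```shell
# Hypothetical helper (name is an assumption): reads text on stdin.
split_sentences_tr() {
    # tr -s '\n' ' ' : turn newlines into spaces and squeeze runs of spaces
    # tr '.' '\n'    : break the text at every period
    # sed            : strip leading blanks, drop empty lines, restore the period
    tr -s '\n' ' ' | tr '.' '\n' | sed 's/^[[:blank:]]*//; /^$/d; s/$/./'
}

# Sample input with a sentence wrapped across two lines
printf 'A b.\nC d.  E f\n g.\n' | split_sentences_tr
```

Like the pipeline above, this treats every period as the end of a sentence, so abbreviations and decimal numbers would be split too.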

Preparing a one-sentence-per-line document from plain text
