Question

I have a string containing several parts named "Section 1" ... "Section 20", and I want to split this string into those individual sections. Here is an example:

Stuff we don't care about

Section 1
Text within this section, may contain the word section.

And go on for quite a bit.

Section 15
Another section

I would like to split it into

["Section 1\n Text within this section, may contain the word section.\n\nAnd go on for quite a bit.",
"Section 15 Another section"]

I feel silly for not getting this right. My attempts always capture everything. Right now I have

/(Section.+\d+$[\s\S]+)/

but I can't get the greediness out of it.

Answer 1

It seems to me that the regexp for splitting the text is:

/(?:\n\n|^)Section/

So the code is:

str = "
Stuff we don't care about

Section 1
Text within this section, may contain the word section.

And go on for quite a bit.

Section 15
Another section
"

newstr = str.split( /(?:\n\n|^)Section/, -1 )[1..-1].map {|l| "Section " + l.strip }
# => ["Section 1\nText within this section, may contain the word section.\n\nAnd go on for quite a bit.", "Section 15\nAnother section"]
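A variation on the same idea, offered only as a sketch: splitting on a zero-width lookahead keeps each header attached to its section, so "Section " does not have to be glued back on afterwards. The pattern assumes every header sits alone on its own line, and the name `sections` is made up for this illustration.

```ruby
str = <<~TEXT
  Stuff we don't care about

  Section 1
  Text within this section, may contain the word section.

  And go on for quite a bit.

  Section 15
  Another section
TEXT

# Split at the start of every line that consists solely of "Section <number>".
# The lookahead (?=...) consumes nothing, so the header stays in its chunk.
pieces = str.split(/^(?=Section \d+$)/)

# Drop the preamble (anything before the first header) and trim whitespace.
sections = pieces.select { |chunk| chunk.start_with?("Section ") }.map(&:strip)

p sections
```

Because the lookahead requires the header to fill the whole line, a body line that merely mentions the word "section" does not trigger a split.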

Answer 2

You can use this regex:

(?m)(Section\s*\d+)(.*?\1)$


Answer 3

You can use scan with this regex: /Section\s\d+\n(?:.(?!Section\s\d+\n))*/m

string.scan(/Section\s\d+\n(?:.(?!Section\s\d+\n))*/m)

Section\s\d+\n will match any Section header.

(?:.(?!Section\s\d+\n))* will match anything else, except another section header.

The m flag also makes the dot match newlines.

sample = <<SAMPLE 
Stuff we don't care about

Section 1
Text within this section, may contain the word section.

And go on for quite a bit.

Section 15
Another section
SAMPLE

sample.scan(/Section\s\d+\n(?:.(?!Section\s\d+\n))*/m)
#=> ["Section 1\nText within this section, may contain the word section.\n\nAnd go on for quite a bit.\n", "Section 15\nAnother section\n"]
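For readers who find the one-line pattern hard to parse, here is the same regex written with the x (extended) flag so each part can carry a comment. This is only a restatement for illustration; the names `section_re` and `text` and the shortened sample text are assumptions, not part of the answer.

```ruby
section_re = /
  Section\s\d+\n          # a header line: the word Section, one whitespace, digits, newline
  (?:
    .                     # any single character, newlines included because of the m flag
    (?!Section\s\d+\n)    # provided the next header does not begin right after it
  )*
/mx

text = "Stuff we don't care about\n\nSection 1\nfirst body\n\nmore text\n\nSection 15\nsecond body\n"

result = text.scan(section_re)
p result
# => ["Section 1\nfirst body\n\nmore text\n", "Section 15\nsecond body\n"]
```

The blank line directly before "Section 15" is left out of the first match, because the newline that precedes a header fails the negative lookahead.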

Answer 4

I think the simplest thing is:

str = "Stuff we don't care about

Section 1
Text within this section, may contain the word section.

And go on for quite a bit.

Section 15
Another section"

str[/^Section 1.+/m] # => "Section 1\nText within this section, may contain the word section.\n\nAnd go on for quite a bit.\n\nSection 15\nAnother section"

If you want to break the text into pieces at the Section headings, start the same way, then take advantage of Enumerable's slice_before:

str = "Stuff we don't care about

Section 1
Text within this section, may contain the word section.

And go on for quite a bit.

Section 15
Another section"

str[/^Section 1.+/m].split("\n").slice_before(/^Section \d+/m).map{ |a| a.join("\n") }
# => ["Section 1\nText within this section, may contain the word section.\n\nAnd go on for quite a bit.\n",
#     "Section 15\nAnother section"]

The slice_before documentation says:

Creates an enumerator for each chunked elements. The beginnings of chunks are defined by pattern and the block.
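To make that description concrete, here is a small sketch of slice_before on a plain array, in both its pattern form and its block form. The array contents and the variable names are invented for the example.

```ruby
lines = ["intro", "Section 1", "alpha", "", "Section 15", "beta"]

# Pattern form: a new chunk starts at every element that matches the regexp.
by_pattern = lines.slice_before(/\ASection \d+\z/).to_a

# Block form: a new chunk starts wherever the block returns a truthy value.
by_block = lines.slice_before { |line| line.start_with?("Section ") }.to_a

p by_pattern
# => [["intro"], ["Section 1", "alpha", ""], ["Section 15", "beta"]]
p by_pattern == by_block
# => true
```

Elements that come before the first match form a chunk of their own, which is why the answer first trims the string with str[/^Section 1.+/m] before slicing.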

Split a string into sections based on headings
