Split a string into multiple parts based on headings

Time: 2014-01-17 16:43:27

Tags: ruby regex

I have a string containing several sections named "Section 1" ... "Section 20", and I want to split this string into those individual sections. Here is an example:

Stuff we don't care about

Section 1
Text within this section, may contain the word section.

And go on for quite a bit.

Section 15
Another section

I want to split it into

["Section 1\nText within this section, may contain the word section.\n\nAnd go on for quite a bit.",
"Section 15\nAnother section"]

I feel stupid for not getting this right. My attempts always capture everything. Right now I have

/(Section.+\d+$[\s\S]+)/

but I can't get the greediness out of it.

4 answers:

Answer 0: (score: 0)

In my opinion, the Regexp for splitting the text is:

/(?:\n\n|^)Section/

So the code is:

str = "
Stuff we don't care about

Section 1
Text within this section, may contain the word section.

And go on for quite a bit.

Section 15
Another section
"

newstr = str.split( /(?:\n\n|^)Section/, -1 )[1..-1].map {|l| "Section " + l.strip }
# => ["Section 1\nText within this section, may contain the word section.\n\nAnd go on for quite a bit.", "Section 15\nAnother section"] 
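
A variation on the split idea, offered as a sketch rather than as part of the answer above: splitting on a zero-width lookahead keeps each heading attached to its own chunk, so "Section " does not have to be glued back on afterwards. It assumes every heading sits alone on its line as "Section" followed by a number; the variable names are only illustrative.

```ruby
text = "Stuff we don't care about\n\n" \
       "Section 1\nText within this section, may contain the word section.\n\n" \
       "And go on for quite a bit.\n\n" \
       "Section 15\nAnother section\n"

sections = text
  .split(/^(?=Section \d+$)/)   # cut right before each heading line, consuming nothing
  .grep(/\ASection \d+$/)       # keep only chunks that begin with a heading (drops the preamble)
  .map(&:strip)                 # trim the blank lines left at the end of each chunk

p sections
# => ["Section 1\nText within this section, may contain the word section.\n\nAnd go on for quite a bit.", "Section 15\nAnother section"]
```

Filtering with grep instead of dropping the first element means the result stays correct even when the string starts directly with a heading.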

Answer 1: (score: 0)

You can use this regex:

(?m)(Section\s*\d+)(.*?\1)$

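
For comparison, here is a sketch of a different pattern from the one above: a lazy quantifier bounded by a lookahead, so that each match stops at the next heading or at the end of the string. It assumes headings occupy a whole line as "Section" plus a number, and it relies on Ruby's /m flag making the dot match newlines. The variable names are illustrative.

```ruby
text = "Stuff we don't care about\n\n" \
       "Section 1\nText within this section, may contain the word section.\n\n" \
       "And go on for quite a bit.\n\n" \
       "Section 15\nAnother section\n"

# .*? grows one character at a time until the lookahead sees
# the next heading line, or the end of the string (\z).
pattern = /^Section \d+$.*?(?=^Section \d+$|\z)/m

sections = text.scan(pattern).map(&:strip)

p sections
# => ["Section 1\nText within this section, may contain the word section.\n\nAnd go on for quite a bit.", "Section 15\nAnother section"]
```

Because the lookahead consumes nothing, the next scan match can start exactly at the following heading.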

Answer 2: (score: 0)

You can use scan with this regex: /Section\s\d+\n(?:.(?!Section\s\d+\n))*/m

string.scan(/Section\s\d+\n(?:.(?!Section\s\d+\n))*/m)

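Section\s\d+\n will match any Section header.

(?:.(?!Section\s\d+\n))* will match anything else, up to but not including the next section header.

The m flag also makes the dot match newlines.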

sample = <<SAMPLE 
Stuff we don't care about

Section 1
Text within this section, may contain the word section.

And go on for quite a bit.

Section 15
Another section
SAMPLE

sample.scan(/Section\s\d+\n(?:.(?!Section\s\d+\n))*/m)
#=> ["Section 1\nText within this section, may contain the word section.\n\nAnd go on for quite a bit.\n", "Section 15\nAnother section\n"]
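
To see why the trailing \n in the lookahead matters, here is a small sketch with made-up input in which the body text mentions "Section 2" in the middle of a sentence. Only a "Section <number>" that ends its line stops the match. The sample text and variable names are illustrative, and the strip call is an addition to tidy the trailing newlines.

```ruby
# Same pattern as in the answer above.
pattern = /Section\s\d+\n(?:.(?!Section\s\d+\n))*/m

# Hypothetical input: "Section 2" appears inline before the real heading.
text = "Intro\n\nSection 1\nSee Section 2 for details.\n\nSection 2\nDone\n"

parts = text.scan(pattern).map(&:strip)

p parts
# => ["Section 1\nSee Section 2 for details.", "Section 2\nDone"]
```

Note that a sentence whose line happens to end in "Section 2" would still be treated as a heading by this pattern, since nothing anchors the match to the start of a line.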

Answer 3: (score: 0)

I think the simplest thing is:

str = "Stuff we don't care about

Section 1
Text within this section, may contain the word section.

And go on for quite a bit.

Section 15
Another section"

str[/^Section 1.+/m] # => "Section 1\nText within this section, may contain the word section.\n\nAnd go on for quite a bit.\n\nSection 15\nAnother section"

If you want to break the text apart at the Section headings, start the same way, then take advantage of Enumerable's slice_before:

str = "Stuff we don't care about

Section 1
Text within this section, may contain the word section.

And go on for quite a bit.

Section 15
Another section"

str[/^Section 1.+/m].split("\n").slice_before(/^Section \d+/m).map{ |a| a.join("\n") }
# => ["Section 1\nText within this section, may contain the word section.\n\nAnd go on for quite a bit.\n",
#     "Section 15\nAnother section"]

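The slice_before documentation says:

"Creates an enumerator for each chunked elements. The beginnings of chunks are defined by pattern and the block."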
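
To make that description concrete, here is a minimal sketch of slice_before on a hand-written array of lines. The input is made up for illustration; the select step is an assumption about how one might drop a preamble without hard-coding "Section 1" as the code above does.

```ruby
lines = ["preamble", "Section 1", "alpha", "", "Section 2", "beta"]

# A new chunk begins before every element that matches the pattern;
# the very first element always opens a chunk, matching or not.
chunks = lines.slice_before(/\ASection \d+\z/).to_a
p chunks
# => [["preamble"], ["Section 1", "alpha", ""], ["Section 2", "beta"]]

# Keep only chunks that start with a heading, then rejoin their lines.
sections = chunks
  .select { |chunk| chunk.first =~ /\ASection \d+\z/ }
  .map { |chunk| chunk.join("\n").strip }
p sections
# => ["Section 1\nalpha", "Section 2\nbeta"]
```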