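Using Ruby, I'm trying to parse some documents in which I need to split blocks of text, each consisting of a heading followed by an undetermined length of text, and push them into an array: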
SECTION 1. A HEADING
Some undetermined length of text,
which can be multiple lines and paragraphs.
SECTION 2. ANOTHER HEADING
Another big block of text.
should become
["SECTION 1. A HEADING
Some undetermined length of text,
which can be multiple lines and paragraphs.",
"SECTION 2. ANOTHER HEADING
Another big block of text."]
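I could use string.split(/\n\n\n/), but I want something more specific, because I can't guarantee that every section will be followed by two blank lines. A bit more experimenting led me to this: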
string.split(/(?:^|\n)(SECTION.+\n)/).each do |s|
sections << s
end
But then I have to process the output again to get what I need.
Is there a way to do this without making multiple passes?
Thanks for your help.
Answer 0 (score: 2)
You can use String#scan with a multiline-mode regex and a positive lookahead:
text = <<ENDTEXT
SECTION 1. A HEADING
Some undetermined length of text,
which can be multiple lines and paragraphs.
SECTION 2. ANOTHER HEADING
Another big block of text.
ENDTEXT
header = /^SECTION\s+\d+\./
sections = text.scan(/(?m)#{header}.*?(?=#{header}|\Z)/)
puts sections.join("\n---\n")
# =>
SECTION 1. A HEADING
Some undetermined length of text,
which can be multiple lines and paragraphs.
---
SECTION 2. ANOTHER HEADING
Another big block of text.
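The same zero-width lookahead idea can also be applied to String#split, which may read more naturally when the goal is "cut the text before every header". The following is only a sketch under the same assumption as above (every header starts a line with SECTION, whitespace, a number and a period); the sample text and the strip/reject cleanup are additions for illustration, not part of the original answer.

```ruby
text = <<~ENDTEXT
  SECTION 1. A HEADING
  Some undetermined length of text,
  which can be multiple lines and paragraphs.

  SECTION 2. ANOTHER HEADING
  Another big block of text.
ENDTEXT

# Split at the zero-width position just before each header line.
# The lookahead consumes nothing, so each heading stays attached to its body.
sections = text.split(/^(?=SECTION\s+\d+\.)/)

# Trim the blank lines left at the end of each chunk and drop empty chunks.
sections = sections.map(&:strip).reject(&:empty?)

puts sections.length # prints 2
```

Note that any text appearing before the first header would come back as the first element, so this variant fits best when the document starts with a header.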
Answer 1 (score: 1)
String#scan will give you the array you want:
string.scan(/^SECTION(?:(?!SECTION).)*/m)
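As a rough, runnable illustration of this pattern: the sample strings, the variable names and the trailing strip are assumptions added here. One caveat is that the negative lookahead stops at any occurrence of the word SECTION, even in the middle of a line, so body text that mentions SECTION gets cut short. Anchoring the lookahead with ^ is one possible way around that.

```ruby
string = <<~DOC
  SECTION 1. A HEADING
  Some undetermined length of text,
  which can be multiple lines and paragraphs.

  SECTION 2. ANOTHER HEADING
  Another big block of text.
DOC

# (?:(?!SECTION).)* consumes one character at a time for as long as the
# upcoming characters do not spell "SECTION"; /m lets . match newlines.
sections = string.scan(/^SECTION(?:(?!SECTION).)*/m).map(&:strip)

# Hypothetical input whose body mentions the word SECTION mid-line.
tricky = "SECTION 1. A\nsee SECTION 2 below\nSECTION 2. B\nmore\n"

# Unanchored lookahead: the first match ends right before the mid-line "SECTION".
loose = tricky.scan(/^SECTION(?:(?!SECTION).)*/m).map(&:strip)

# Anchored lookahead: only a SECTION at the start of a line ends a match.
anchored = tricky.scan(/^SECTION(?:(?!^SECTION).)*/m).map(&:strip)

puts sections.length # prints 2
puts loose.first     # the text after "see" is lost
puts anchored.first  # keeps the whole body of the first section
```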