Regex works in theory, but not when the code actually runs

Time: 2014-11-25 12:41:10

Tags: ruby regex

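I am trying to use this regex search in Ruby: <div class="ms3">(\n.*?)+<, but as soon as I add the last character, "<", it stops working completely. I have tested it in Rubular and the regex works perfectly well there. I write my code in RubyMine, but I also tested it from PowerShell and got the same result. There is no error message. When I run <div class="ms3">(\n.*?)+ it prints <div class="ms3">, which is exactly what I am looking for, but as soon as I add the "<" it returns nothing.

My code: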

#!/usr/bin/ruby
# encoding: utf-8

File.open('ms3.txt', 'w') do |fo|
  fo.puts File.foreach('input.txt').grep(/<div class="ms3">(\n.*?)+/)
end

Some of the content I am searching:

  <div class="ms3">
    <span xml:lang="zxx"><span xml:lang="zxx">Still the tone of the remainder of the chapter is bleak. The</span> <span class="See_In_Glossary" xml:lang="zxx">DAY OF THE <span class="Name_Of_God" xml:lang="zxx">LORD</span></span> <span xml:lang="zxx">holds no hope for deliverance (5.16–18); the futility of offering sacrifices unmatched by common justice is once more underlined, and exile seems certain (5.21–27).</span></span>
  </div>

  <div class="Paragraph">
    <span class="Verse_Number" id="idAMO_5_1" xml:lang="zxx">1</span><span class="scrText">Listen, people of Israel, to this funeral song which I sing over you:</span>
  </div>

  <div class="Stanza_Break"></div>

The full regex I need is <div class="ms3">(\n.*?)+<\/div>, but it picks up the first part and nothing else.

1 answer:

Answer 0: (score: 1)

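The problem starts with File.foreach('input.txt'), which cuts the input into lines. This means the pattern is matched against each line separately, so no line can match the pattern (by definition, no line has a \n in the middle of it).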
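A minimal sketch of that difference, using a small hypothetical string in place of input.txt and String#each_line to stand in for File.foreach (both yield one line at a time, with the newline at the end of each chunk):

```ruby
# Hypothetical sample standing in for the contents of input.txt.
html = "<div class=\"ms3\">\n  <span>text</span>\n</div>\n"

# The pattern from the question, including the trailing "<".
pattern = /<div class="ms3">(\n.*?)+</

# Line by line: the only "\n" in each chunk is its last character,
# so nothing can follow it and the trailing "<" never matches.
line_matches = html.each_line.grep(pattern)

# Whole string: the "\n" is followed by more text, so the match can continue.
whole_match = html.match(pattern)

p line_matches        # => []
p whole_match.nil?    # => false
```

This is also why the shorter pattern without the "<" appears to work line by line: it only needs the "\n" that ends the opening div line.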

You should have better luck reading the whole text as a single block and calling match on it:

File.read('input.txt').match(/<div class="ms3">(\n.*?)+<\/div>/)
# => #<MatchData "<div class=\"ms3\">\n    <span xml:lang=\"zxx\">
# => <span xml:lang=\"zxx\">Still the tone of the remainder of the chapter is bleak. The</span> 
# => <span class=\"See_In_Glossary\" xml:lang=\"zxx\">DAY OF THE 
# => <span class=\"Name_Of_God\" xml:lang=\"zxx\">LORD</span></span> 
# => <span xml:lang=\"zxx\">holds no hope for deliverance (5.16–18); 
# => the futility of offering sacrifices unmatched by common justice is once more 
# => underlined, and exile seems certain (5.21–27).</span></span>\n  </div>" 1:"\n  ">
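Note that match returns only the first block. If the file contains several ms3 divs, String#scan can collect them all. The following is a sketch under assumptions: the sample text is hypothetical, and the group is made non-capturing, (?:...), because scan returns the group contents instead of the whole match when the pattern has capturing groups.

```ruby
# Hypothetical sample; in the question this text would come from
# File.read('input.txt').
text = <<~HTML
  <div class="ms3">
    <span>first</span>
  </div>

  <div class="Paragraph">
    <span>skip me</span>
  </div>

  <div class="ms3">
    <span>second</span>
  </div>
HTML

# Non-capturing group, so scan returns each full match as a string.
blocks = text.scan(/<div class="ms3">(?:\n.*?)+<\/div>/)

puts blocks.length   # => 2
puts blocks.first
```

The resulting array could then be written out with something like File.write('ms3.txt', blocks.join("\n")). This approach assumes the ms3 divs do not contain nested div elements, since the pattern stops at the first closing </div> it finds.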