Question

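I'm trying to use this regex search in Ruby: <div class="ms3">(\n.*?)+<, but as soon as I add the last character, "<", it stops working entirely. I've tested it in Rubular and the regex works perfectly there. I use RubyMine to write my code, but I also tested it from PowerShell and got the same result. There are no error messages. When I run <div class="ms3">(\n.*?)+ it prints <div class="ms3">, which is exactly what I'm looking for, but as soon as I add the "<" it returns nothing.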

My code:

#!/usr/bin/ruby
# encoding: utf-8

File.open('ms3.txt', 'w') do |fo|
  fo.puts File.foreach('input.txt').grep(/<div class="ms3">(\n.*?)+/)
end

Some of the content I'm searching:

  <div class="ms3">
    <span xml:lang="zxx"><span xml:lang="zxx">Still the tone of the remainder of the chapter is bleak. The</span> <span class="See_In_Glossary" xml:lang="zxx">DAY OF THE <span class="Name_Of_God" xml:lang="zxx">LORD</span></span> <span xml:lang="zxx">holds no hope for deliverance (5.16–18); the futility of offering sacrifices unmatched by common justice is once more underlined, and exile seems certain (5.21–27).</span></span>
  </div>

  <div class="Paragraph">
    <span class="Verse_Number" id="idAMO_5_1" xml:lang="zxx">1</span><span class="scrText">Listen, people of Israel, to this funeral song which I sing over you:</span>
  </div>

  <div class="Stanza_Break"></div>

The full regex I need is <div class="ms3">(\n.*?)+<\/div>, but it picks up the first part and nothing else.

Answer 1

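The problem starts with File.foreach('input.txt'), which splits the input into lines. The pattern is therefore matched against each line separately, and no single line can match the full pattern: by definition, a line never has a \n in its middle. The shorter pattern still matches because each line ends with \n, which (\n.*?)+ can consume; once you append <, that character would have to appear after the line's final newline, which never happens.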
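To make the line-by-line behaviour concrete, here is a minimal sketch. The `text` string is a shortened, made-up stand-in for input.txt, and `String#lines` is used in place of File.foreach on the assumption that both hand over one line at a time with its trailing newline kept.

```ruby
# Shortened, hypothetical stand-in for the contents of input.txt.
text = "  <div class=\"ms3\">\n    <span>Still the tone</span>\n  </div>\n"

short_pattern = /<div class="ms3">(\n.*?)+/
full_pattern  = /<div class="ms3">(\n.*?)+<\/div>/

# Each element keeps its trailing "\n", as with File.foreach.
lines = text.lines

# The short pattern matches the first line: (\n.*?)+ consumes its final newline.
short_hits = lines.grep(short_pattern)
p short_hits # => ["  <div class=\"ms3\">\n"]

# The full pattern needs text after that newline, which no single line has.
full_hits = lines.grep(full_pattern)
p full_hits # => []

# Matching against the whole string lets the pattern cross line boundaries.
whole_match = text.match(full_pattern)
puts whole_match[0]
```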

You should have better luck reading the whole text as a single block and using match on it:

File.read('input.txt').match(/<div class="ms3">(\n.*?)+<\/div>/)
# => #<MatchData "<div class=\"ms3\">\n    <span xml:lang=\"zxx\">
# => <span xml:lang=\"zxx\">Still the tone of the remainder of the chapter is bleak. The</span> 
# => <span class=\"See_In_Glossary\" xml:lang=\"zxx\">DAY OF THE 
# => <span class=\"Name_Of_God\" xml:lang=\"zxx\">LORD</span></span> 
# => <span xml:lang=\"zxx\">holds no hope for deliverance (5.16–18); 
# => the futility of offering sacrifices unmatched by common justice is once more 
# => underlined, and exile seems certain (5.21–27).</span></span>\n  </div>" 1:"\n  ">
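If the file contains more than one such block, match returns only the first. A possible variation, offered as a sketch and not as part of the original answer, uses String#scan with a non-capturing group so that every whole match is returned. The helper name `extract_ms3_blocks` and the sample string are hypothetical.

```ruby
# Non-capturing group (?:...) so that scan returns whole matches
# instead of the contents of the capture group.
MS3_BLOCK = /<div class="ms3">(?:\n.*?)+<\/div>/

# Hypothetical helper: returns every ms3 block found in the given text.
def extract_ms3_blocks(text)
  text.scan(MS3_BLOCK)
end

# Made-up sample with two ms3 blocks around an unrelated block.
sample = "  <div class=\"ms3\">\n    <span>one</span>\n  </div>\n\n" \
         "  <div class=\"Paragraph\">\n    <span>x</span>\n  </div>\n\n" \
         "  <div class=\"ms3\">\n    <span>two</span>\n  </div>\n"

blocks = extract_ms3_blocks(sample)
puts blocks
```

With the files from the question, the same helper could be called inside the existing File.open('ms3.txt', 'w') block as fo.puts extract_ms3_blocks(File.read('input.txt')). Note that a regex like this stops at the first closing </div>, so it would cut a block short if ms3 divs ever contained nested divs; an HTML parser such as Nokogiri would likely be more robust in that case.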
