Question

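I'm new to regular expressions. I have a string like the code below, and I want to get the text that comes after all the div tags have been closed.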

<div class="bbcode_container">
  <div class="bbcode_quote">
    <div class="quote_container">
      <div class="bbcode_quote_container">
      </div>
      <div class="bbcode_postedby">
        <img border="0" src="http://www.webketoan.vn/forum/images/misc/quote_icon.png" alt="Click here to enlarge" onclick="window.open(this.src)" style="max-width: 700px; cursor: pointer;" title="Click here to enlarge"> Nguyên văn bởi <strong>namphong13</strong>
        <a rel="nofollow" href="http://www.webketoan.vn/forum/f94/ket-qua-thi-cong-chuc-thue-126218-post842693.html#post842693"><img border="0" src="http://www.webketoan.vn/forum/images/buttons/viewpost-right.png" class="inlineimg" alt="Click here to enlarge" onclick="window.open(this.src)" style="max-width: 700px; cursor: pointer;" title="Click here to enlarge"></a>
      </div>
      <div class="message">Can you help me?<br>
      </div>
    </div>
  </div>
</div>

How can I do this?

Answer 1

Do you want to check whether the text

Thanks for support

appears in your page?

Then your regular expression would look like this:

match = html_string[/Thanks for support/]

If the match variable is not nil, your html_string variable contains that text.

If you want to get all the text after the last closing div, you can do this:

html_string =~ /.*\<\/div\>\n([a-zA-Z\s]*)$/

puts $1
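
The pattern above only accepts letters and whitespace after the final tag, and in Ruby `$` matches at the end of any line rather than only at the end of the string. Below is a minimal sketch of a more permissive variant, assuming the wanted text is everything after the last `</div>`. The helper name `text_after_last_div` and the sample string are made up for illustration and do not come from the question.

```ruby
# Hypothetical helper: returns whatever follows the last closing </div>,
# or nil when the string contains no closing div at all.
def text_after_last_div(html)
  # /m lets "." cross newlines, so the greedy ".*" runs up to the LAST </div>;
  # \z anchors at the true end of the string; /i also accepts </DIV>.
  m = html.match(/.*<\/div>\s*(.*)\z/mi)
  m && m[1].strip
end

# Assumed sample input, not taken from the question.
sample = "<div class=\"a\">\n  <div class=\"b\">Can you help me?<br>\n  </div>\n</div>\nThanks for support, really!\n"

puts text_after_last_div(sample)
```

This still treats the markup as plain text, so it is only a reasonable choice when the input is known to end with a closing div followed by the text of interest.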

Answer 2

You should use an HTML parser such as Nokogiri.

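require 'nokogiri'

# my_file holds the HTML source (a String or an open File)
page = Nokogiri::HTML(my_file)
# remove all the div tags, together with everything inside them
page.search('div').remove
# whatever text is left was outside the divs
string = page.text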

Answer 3

Use the code below to remove every character that appears before the last occurrence of the string "</div>" (matched case-insensitively):

input = 'a</div>b</DIV>c'
output = input.gsub(/.*<\/div>/i,'')    # => "c"
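
One caveat worth spelling out: in Ruby, `.` does not match a newline by default, so on multi-line input the call above works line by line instead of cutting everything up to the last tag. The following sketch contrasts the two behaviours using the `m` flag. The input string is made up for illustration.

```ruby
# Assumed multi-line input for illustration.
input = "a</div>\nb\n</DIV>\nc"

# Without /m, each match stays within a single line, so the "b" line survives.
per_line = input.gsub(/.*<\/div>/i, '')

# With /m, the greedy ".*" spans lines and consumes through the last </div>.
whole = input.sub(/.*<\/div>/mi, '').strip

puts per_line.inspect
puts whole.inspect
```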

Get the text after tags using a regular expression
