Question

How can I use Nokogiri::HTML to search for the element that contains the text "Click Here to Enter a New Password"?

My HTML structure is as follows:

<table border="0" cellpadding="20" cellspacing="0" width="100%">
  <tbody>
  <tr>
    <td class="bodyContent" valign="top">
      <div>
        <strong>Welcome to</strong>
        <h2 style="margin-top:0">OddZ</h2>
        <a href="http://mandrillapp.com/track/click.php?...">Click Here</a>
        to Enter a New Password
        <p>
          Click this link to enter a new Password. This link will expire within 24 hours, so don't delay.
          <br>
        </p>
      </div>
    </td>
  </tr>
  </tbody>
</table>

I tried:

doc = (Nokogiri::HTML(@inbox_emails.first.body.raw_source))

password_container = doc.search "[text()*='Click Here to Enter a New Password']"

but it did not find any results. When I tried:

password_container = doc.search "[text()*='Click Here']"

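I got no results.

I want to search for the complete text.

I noticed that there are a lot of spaces before the text " to Enter a New Password", even though I did not add any spaces in the HTML code.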

Answer 1

Most of the text you are searching for is outside the a element.

The best you can do is probably:

a = doc.search('a[text()="Click Here"]').find{|a| a.next.text[/to Enter a New Password/]}

Answer 2

You can mix XPath and regex, but since Nokogiri's XPath has no matches function, you can implement your own as follows:

class RegexHelper
  def content_matches_regex node_set, regex_string
    ! node_set.select { |node| node.content =~ /#{regex_string}/mi }.empty?
  end

  def content_matches node_set, string
    content_matches_regex node_set, string.gsub(/\s+/, ".*?")
  end
end

search_string = "Click Here to Enter a New Password"

matched_nodes = doc.xpath "//*[content_matches(., '#{search_string}')]", RegexHelper.new
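To see what content_matches does with the search string, here is a small plain-Ruby sketch that needs no Nokogiri. The sample_text value is an assumption that mimics the line break and indentation between the link and the rest of the sentence:

```ruby
search_string = "Click Here to Enter a New Password"

# Same substitution as content_matches: each run of whitespace becomes ".*?"
pattern_source = search_string.gsub(/\s+/, ".*?")

# Same flags as content_matches_regex: m lets "." match newlines, i ignores case
pattern = /#{pattern_source}/mi

# Assumed node content: the words are separated by a newline plus indentation
sample_text = "Click Here\n        to Enter a New Password\n"

puts pattern_source
puts sample_text.match?(pattern)
```

Note that ".*?" matches any characters, not only whitespace, so the pattern is looser than the literal phrase. Substituting "\\s+" instead, and passing the string through Regexp.escape first, would probably be a stricter variant.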

Answer 3

You can try using a CSS selector. I saved your HTML as a file named test.html.

require 'nokogiri'

@doc = Nokogiri::HTML(open('test.html'))

puts @result = @doc.css('p').text.gsub(/\n/,'')

It returns:

Click this link to enter a new Password. This link will expire within 24 hours, so don't delay.

There is a good post about Parsing HTML with Nokogiri.

Answer 4

You were close. Here is how to find the element whose text contains the string:

doc.search('*[text()*="Click Here"]')

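This gives you the <a> tag. Is that what you want? If you actually need the parent element of the <a> (the containing <div>), you can modify it like this: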

doc.search('//*[text()="Click Here"]/..').text

This selects the containing <div>, whose text is:

Welcome to
OddZ
Click Here
to Enter a New Password

Click this link to enter a new Password. This link will expire within 24 hours, so don't delay.
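Since the goal was to match the complete phrase, one option is to collapse the whitespace in that text before comparing. A minimal sketch in plain Ruby, where div_text is an assumed stand-in for the string returned by the call above:

```ruby
# Assumed stand-in for the <div> text, including the line break and
# indentation that separate "Click Here" from the rest of the sentence.
div_text = "Welcome to\nOddZ\nClick Here\n        to Enter a New Password\n\n" \
           "Click this link to enter a new Password."

# Collapse every run of whitespace (spaces, tabs, newlines) to one space.
normalized = div_text.gsub(/\s+/, " ").strip

puts normalized.include?("Click Here to Enter a New Password")
```

The raw text does not contain the phrase as a single substring because of the newline and indentation, which is why the comparison is done on the normalized copy.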
