Question

Hello, I am scraping a web page with Mechanize and Nokogiri. I select a series of links (<a></a>):

    html_body = Nokogiri::HTML(body)
    links = html_body.css('.L1').xpath("//table/tbody/tr/td[2]/a[1]")

Then I need to check whether the content of each link (<a>content</a>, not the href) matches a value in my database. I do it like this:

    links.each do |link|
      # compare the link text with == (a single = would assign, not compare)
      if link.text == @tournament.homologation_number

If the condition is met, I need to select the <td> that precedes the <td></td> I just checked, and click the link inside it.

<td><a href="link I want to click if condition is true"></a></td>
<td><a href="">content I check with my condition</a></td>

How can I achieve this with Mechanize and Nokogiri?

Answer 1

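I would iterate over the first td instead, because getting the following element is easier than getting the previous one (with CSS, anyway):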

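page.search('td[1]').each do |td|
  # '+ td a' is the link inside the td that immediately follows this one
  label = td.at('+ td a')
  next unless label && label.text.strip == 'foo'

  # the condition matched, so follow the link in the first td
  page2 = agent.get td.at('a')[:href]
end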

Answer 2

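First, you have to select all the <td></td> elements. The XPath //table/tbody/tr/td[2]/a[1] only selects the first <a></a> element in the second cell of each row, so you could try something like //table/tbody/tr/td instead, though this depends on your specific case.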

Once you have the array of <td></td> elements, you can reach the links like this:

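tds.each do |td|
  link = td.at_css('a')                     # first link inside this td
  # Compare only the link text; condition_is_matched is a placeholder for your own check
  next unless link && condition_is_matched(link.text)

  previous_td = td.previous_element         # previous sibling element, skipping text nodes
  previous_link = previous_td && previous_td.at_css('a')
  goto_url previous_link['href'] if previous_link   # goto_url is a placeholder, e.g. agent.get
end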

Select the previous td and click the link with Mechanize and Nokogiri
