Get the href from an <a> tag using Mechanize and Nokogiri

Date: 2015-10-20 12:35:26

Tags: ruby-on-rails ruby web-scraping nokogiri mechanize

I have this HTML:

<div id="main">
    <li>
        <h2>
            <a href="https://www.congress.gov/bill/99th-congress/senate-joint-resolution/427">S.J.Res.427</a>
        </h2>
    </li>
    <li>
        ....
    </li>
</div>

I want to extract the href value of the <a> tag.

Using Mechanize and Nokogiri I did this:

activity_list = member.search('#main li')
activity_list.each do |link| 
    activity_link = link.at("h2 a[href]")
end

but I got TypeError: no implicit conversion of nil into String

What's wrong?

2 answers:

Answer 0 (score: 0)

You are looking for the #attr method:

html = Nokogiri::HTML('<div id="main"><li><h2>
  <a href="https://www.congress.gov/bill/99th-congress/senate-joint-resolution/427">S.J.Res.427</a>
</h2></li></div>')
html.search('#main li').each do |link|
  #                         ⇓⇓⇓⇓⇓⇓⇓⇓⇓⇓⇓⇓⇓
  puts link.at("h2 a[href]").attr('href')
end
#⇒ https://www.congress.gov/bill/99th-congress/senate-joint-resolution/427

Answer 1 (score: 0)

I'd write it like this:

require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
    <div id="main">
      <li>
        <h2>
          <a href="foo">S.J.Res.427</a>
        </h2>
      </li>
      <li>
        <h2>
          <a href="bar">S.J.Res.427</a>
        </h2>
      </li>
    </div>
EOT

activity_list = doc.search('#main li')
activity_list.each do |link| 
  activity_link = link.at("h2 a[href]") 
  activity_link['href'] # => "foo", "bar"
end

Once you have a reference to a node, you can use [] to access the value of one of its attributes.