Can't get the value I want when parsing the DOM

Time: 2014-08-12 17:38:30

Tags: ruby nokogiri mechanize

<div class="_5xu4">
  <header class="_5tkh">
    <h3 class="_52jd _52ja _52jg _5f43">CEO in 
       <strong><a href="*****">Magento</a></strong>
    </h3>
  </header>
</div>

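require 'mechanize'

def find_specialty(id)
  agent = Mechanize.new
  # Reuse the cookies saved for this record's session
  agent.cookie_jar.load("sessions/" + self.id.to_s)
  agent.get("****")
  profession = []
  # link.text returns all text inside each <h3>, nested tags included
  agent.page.search("h3").each do |link|
    profession.push(link.text)
  end
  profession
end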

I only want "CEO in", without the text of the nested link.

1 answer:

Answer 0 (score: 1)

require 'nokogiri'

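def get_profession(html)
  doc = Nokogiri::HTML(html)
  # text() selects only the text nodes that are direct children of <h3>,
  # so the text inside <strong><a>...</a></strong> is skipped
  doc.xpath('//h3/text()').to_s.strip
end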

html_str = <<-__HERE__
  <div class="_5xu4">
    <header class="_5tkh">
      <h3 class="_52jd _52ja _52jg _5f43">CEO in 
         <strong><a href="*****">Magento</a></strong>
      </h3>
    </header>
  </div>
__HERE__

puts get_profession(html_str) # => "CEO in"