How to access a value in an array of Nokogiri elements from a scrape

Asked: 2013-11-15 19:56:05

Tags: ruby-on-rails ruby nokogiri

This is the code I am using for the scrape:

require 'singleton'
require 'open-uri'

class ProgramHighlights < ActiveRecord::Base

  self.table_name = 'program_highlights'
  include ActiveRecord::Singleton

  def fetch
    url = "http://kboo.fm/"
    doc = Nokogiri::HTML(open(url))
    titles = []
    program_title = doc.css(".title a").each do |title|
      titles.push(title)
    end
  end
end

When I iterate over the titles array, each element prints like this:

(Element:0x5b40910 {
  name = "a",
  attributes = [
    #(Attr:0x5b8c310 {
      name = "href",
      value = "/content/thedeathsofothersthefateofciviliansinamericaswars"
      }),
    #(Attr:0x5b8c306 {
      name = "title",
      value = "The Deaths of Others: The Fate of Civilians in America's Wars"
    })],
   children = [
    #(Text "The Deaths of Others: The Fate of Civilians in America's Wars")]
  })

I specifically want to get the "value". However, none of the following work:

titles[0].value
titles[0]["value"]
titles[0][value]

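I don't understand why I can't access it, since it looks like a hash. Any pointers in the right direction? The data isn't available in a simple JSON format, which is why I need to scrape.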

1 Answer:

Answer 0: (score: 1)

To get the value of a node's attribute, you can use ['attribute_name']. For example:

require 'nokogiri'
html = %Q{
    <html>
        <a href="/content/thedeathsofothersthefateofciviliansinamericaswars" title="The Deaths of Others: The Fate of Civilians in America's Wars">The Deaths of Others: The Fate of Civilians in America's Wars</a>
    </html>
}
doc = Nokogiri::HTML(html)
node = doc.at_css('a')
puts node['href']
#=> /content/thedeathsofothersthefateofciviliansinamericaswars
puts node['title']
#=> The Deaths of Others: The Fate of Civilians in America's Wars

Assuming you want the title attribute value of each link, you can do the following:

program_title = doc.css(".title a").each do |link|
  titles.push(link['title'])
end