Why am I getting this undefined method error with Ruby and Nokogiri?

Time: 2016-08-24 15:20:16

Tags: ruby-on-rails ruby nokogiri

I am building a web scraper so I can learn how it is done. When I run it in the terminal, I get this error message:

scraper.rb:23:in `item_container': undefined method `css' for nil:NilClass (NoMethodError)

Here is my code in scraper.rb:

require 'HTTParty'
require 'Nokogiri'

class Scraper

  attr_accessor :parse_page

  def initialize
    doc = HTTParty.get("http://store.nike.com/us/en_us/pw/mens-nikeid-lifestyle-shoes/1k9Z7puZoneZoi3")
    @parse_page ||= Nokogiri::HTML(doc) #memoized @parse_page so it only gets assigned once.
  end

  def get_names
    names = item_container.css(".product-name").css("p").children.map { |name| name.text }.compact
  end

  def get_prices
    prices = item_container.css(".product-price").css("span.local").children.map { |price| price.text }.compact
  end

  private
  def item_container
    parse_page.css(".grid-item-info")
  end

  scraper = Scraper.new
  names = scraper.get_names
  prices = scraper.get_prices

  (0...prices.size).each do |index|
    puts "- - - index: #{index + 1} - - -"
    puts "Name: #{names[index]} | Price: #{prices[index]}"
  end
end

Can anyone tell me why I am getting this error, and how I can fix it? Thanks in advance.

2 answers:

Answer 0 (score: 0)

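This question is tagged [ruby-on-rails]. If the scraper is part of a Rails project, you only need to list httparty and nokogiri in your Gemfile; the require statements are not necessary.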
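As a rough sketch, the Gemfile entries could look like the following. Version constraints are left out, and the source line assumes the default RubyGems server; adjust both to your project. After editing, run bundle install. A standard Rails app then loads the gems through Bundler.require, which is why the explicit require lines can go.

```ruby
# Gemfile (fragment): assumes the default RubyGems source and no pinned versions
source 'https://rubygems.org'

gem 'httparty'
gem 'nokogiri'
```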

This works for me in a Rails project (lib/scraper.rb):

class Scraper

  attr_accessor :parse_page

  def initialize
    doc = HTTParty.get("http://store.nike.com/us/en_us/pw/mens-nikeid-lifestyle-shoes/1k9Z7puZoneZoi3")
    @parse_page ||= Nokogiri::HTML(doc) #memoized @parse_page so it only gets assigned once.
  end

  def get_names
    names = item_container.css(".product-name").css("p").children.map { |name| name.text }.compact
  end

  def get_prices
    prices = item_container.css(".product-price").css("span.local").children.map { |price| price.text }.compact
  end

  private

  def item_container
    parse_page.css(".grid-item-info")
  end

end

Answer 1 (score: 0)

Consider this:

require 'httparty'
require 'nokogiri'

class Scraper

  attr_accessor :parse_page
  attr_reader :url

  def initialize(url)
    @url ||= url
    @parse_page ||= Nokogiri::HTML(HTTParty.get(url))
  end

  def names_and_prices
    @parse_page.search('div.product-name').map{ |shoe|
      shoe_parent = shoe.parent
      name = shoe_parent.at('p.product-display-name').text

      product_prices = shoe_parent.at('div.prices')
      override_price = product_prices.at('span.overridden').text
      price = product_prices.at('span.local').text

      {
        name: name,
        price: price,
        override_price: override_price
      }
    }
  end

end

scraper = Scraper.new('http://store.nike.com/us/en_us/pw/mens-nikeid-lifestyle-shoes/1k9Z7puZoneZoi3')

scraper.names_and_prices.each_with_index do |shoe, index|
  puts "#{index + 1}: Name: #{shoe[:name]} | Price: #{shoe[:price]} | Override price: #{shoe[:override_price]}"
end

This produces output like:

1: Name: Nike Sock Dart iD | Price: $170 | Override price:
2: Name: Nike Air Max 1 Ultra Flyknit iD | Price: $200 | Override price:
3: Name: Nike Air Max 1 Premium iD | Price: $175 | Override price:
4: Name: Nike Air Max 90 Premium iD | Price: $175 | Override price:
5: Name: Nike Air Force 1 High Premium iD | Price: $175 | Override price:
6: Name: Nike Air Force 1 Mid Premium iD | Price: $170 | Override price:
...

scraper.names_and_prices returns an array of hashes, like this:

[
  [0] {
    :name           => "Nike Sock Dart iD",
    :price          => "$170",
    :override_price => ""
  },
  [1] {
    :name           => "Nike Air Max 1 Ultra Flyknit iD",
    :price          => "$200",
    :override_price => ""
  }
]

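When scraping, you need to dig into the HTML and find the best landmark tag in the markup, so you can get to what you want quickly. div.product-name is actually one level deeper than I would like, so shoe.parent backs up one level to the parent node that contains the desired information. The result is code that cleanly retrieves the information for each shoe. Navigating with .grid-item-info produced at least one false positive, along with a set of accompanying nils from the inner selectors.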