I'm building a web scraper to learn how it's done. When I run it in the terminal, I get this error message:
scraper.rb:23:in `item_container': undefined method `css' for nil:NilClass (NoMethodError)
Here is my code in scraper.rb:
require 'HTTParty'
require 'Nokogiri'
class Scraper
  attr_accessor :parse_page
  def initialize
    doc = HTTParty.get("http://store.nike.com/us/en_us/pw/mens-nikeid-lifestyle-shoes/1k9Z7puZoneZoi3")
    @parse_page ||= Nokogiri::HTML(doc) #memoized @parse_page so it only gets assigned once.
  end
  def get_names
    names = item_container.css(".product-name").css("p").children.map { |name| name.text }.compact
  end
  def get_prices
    prices = item_container.css(".product-price").css("span.local").children.map { |price| price.text }.compact
  end
  private
  def item_container
    parse_page.css(".grid-item-info")
  end
  scraper = Scraper.new
  names = scraper.get_names
  prices = scraper.get_prices
  (0...prices.size).each do |index|
    puts "- - - index: #{index + 1} - - -"
    puts "Name: #{names[index]} | Price: #{prices[index]}"
  end
end
Can anyone tell me why I'm getting this error and how I can fix it? Thanks in advance.
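Answer 0 (score: 0)
This question is tagged [ruby-on-rails]. If the scraper is part of a Rails project, you only need to list httparty and nokogiri in your Gemfile. The require statements are then unnecessary, because Bundler loads the gems for you when the application boots.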
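As a rough illustration, the Gemfile entries might look like the fragment below. No versions are pinned here, and the rest of the Gemfile (the source line, rails itself, and so on) is assumed to exist already:

```ruby
# Gemfile (fragment): entries for the two gems the scraper uses.
gem 'httparty'
gem 'nokogiri'
```

After editing the Gemfile, run bundle install so the gems are available to the app.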
This works for me in a Rails project (lib/scraper.rb):
class Scraper
  attr_accessor :parse_page
  def initialize
    doc = HTTParty.get("http://store.nike.com/us/en_us/pw/mens-nikeid-lifestyle-shoes/1k9Z7puZoneZoi3")
    @parse_page ||= Nokogiri::HTML(doc) #memoized @parse_page so it only gets assigned once.
  end
  def get_names
    names = item_container.css(".product-name").css("p").children.map { |name| name.text }.compact
  end
  def get_prices
    prices = item_container.css(".product-price").css("span.local").children.map { |price| price.text }.compact
  end
  private
  def item_container
    parse_page.css(".grid-item-info")
  end
end
Answer 1 (score: 0)
Consider this:
require 'httparty'
require 'nokogiri'
class Scraper
  attr_accessor :parse_page
  attr_reader :url
  def initialize(url)
    @url ||= url
    @parse_page ||= Nokogiri::HTML(HTTParty.get(url))
  end
  def names_and_prices
    @parse_page.search('div.product-name').map { |shoe|
      shoe_parent = shoe.parent
      name = shoe_parent.at('p.product-display-name').text
      product_prices = shoe_parent.at('div.prices')
      override_price = product_prices.at('span.overridden').text
      price = product_prices.at('span.local').text
      {
        name: name,
        price: price,
        override_price: override_price
      }
    }
  end
end
scraper = Scraper.new('http://store.nike.com/us/en_us/pw/mens-nikeid-lifestyle-shoes/1k9Z7puZoneZoi3')
scraper.names_and_prices.each_with_index do |shoe, index|
  puts "#{index + 1}: Name: #{shoe[:name]} | Price: #{shoe[:price]} | Override price: #{shoe[:override_price]}"
end
This produces output like the following:
1: Name: Nike Sock Dart iD | Price: $170 | Override price:
2: Name: Nike Air Max 1 Ultra Flyknit iD | Price: $200 | Override price:
3: Name: Nike Air Max 1 Premium iD | Price: $175 | Override price:
4: Name: Nike Air Max 90 Premium iD | Price: $175 | Override price:
5: Name: Nike Air Force 1 High Premium iD | Price: $175 | Override price:
6: Name: Nike Air Force 1 Mid Premium iD | Price: $170 | Override price:
...
scraper.names_and_prices returns an array of hashes, like this:
[
  [0] {
    :name => "Nike Sock Dart iD",
    :price => "$170",
    :override_price => ""
  },
  [1] {
    :name => "Nike Air Max 1 Ultra Flyknit iD",
    :price => "$200",
    :override_price => ""
  }
]
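When scraping, you need to dig into the HTML and find the best node in the markup to start from, so that you can get to the data you want quickly. div.product-name is actually one level deeper than I wanted, so shoe.parent backs up one level to the parent node that contains all the desired information. The result is code that cleanly retrieves the information for each shoe. Navigating from .grid-item-info instead produced at least one false positive, along with a number of nils from the inner selectors.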