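I am posting a specific problem. The original question is at: https://stackoverflow.com/questions/23480242/nokogiri-problems-parsing-html?noredirect=1#comment36006044_23480242
I am reposting because this crawler is a different one, and I think the fix is simple even though I can't figure it out. My scraper code is: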
    module NoMoreRackCrawler
      class << self
        def noko_doc
          Nokogiri::HTML(open('http://www.nomorerack.com/'))
        end
        def update
          doc = self.noko_doc
          doc.css('.deal').each do |deal|
            title = deal.css('.display').css('p').children.text
            deal_price = deal.css('.display').css('.pricing').css('ins').children.first.text.gsub("$","").to_f
            retail_price = deal.css('.display').css('.pricing').css('del').children.first.text.gsub("$","").gsub(" Retail","").to_f
            image_url = deal.css('img').first.attr('src')
            url = "http://www.nomorerack.com#{deal.css('.image a').attribute('href').value}"
            deal = Deal.where( title: title, source: "No More Rack" ).first_or_create
            deal.category = "Deal Sites"
            deal.deal_price = deal_price
            deal.retail_price = retail_price
            deal.image_url = image_url
            deal.url = url
            deal.save
          end
        end
        module Home
          def self.update
            doc = Nokogiri::HTML(open("http://www.nomorerack.com/daily_deals/category/home"))
            NoMoreRackCrawler::update_for_doc( doc, "Home" )
          end
        end
      end
I am getting an error that seems to come from the block for the second URL (http://www.nomorerack.com/daily_deals/category/home). My error message is:
SyntaxError: /webservices/crawler/app/crawlers/no_more_rack_crawler.rb:33: syntax error, unexpected end-of-input, expecting keyword_end
/webservices/crawler/lib/tasks/crawlers.rake:37:in `block (2 levels) in <top (required)>'
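As far as I can tell my `end`s are all there, and I even tried removing and adding an `end` to see whether that fixes it. Any ideas about what is probably a silly syntax error on my part? Thanks for your help and suggestions.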
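
To narrow this down, here is a minimal sketch that checks only the block structure of the crawler, with every method body removed. The `SKELETON` string and the `parses?` helper are names I made up for illustration, and `RubyVM::InstructionSequence.compile` is specific to MRI (the standard Ruby interpreter), so this assumes the crawler runs on MRI. It only tells me whether the openers and `end`s balance, not where the missing `end` was meant to go.

```ruby
# Structure of the crawler with the method bodies stripped out.
# The keywords appear in the same order as in the code I posted.
SKELETON = <<~RUBY
  module NoMoreRackCrawler
    class << self
      def noko_doc
      end
      def update
        [].each do |deal|
        end
      end
    module Home
      def self.update
      end
    end
  end
RUBY

# Hypothetical helper: true when Ruby can parse the source, false otherwise.
# SyntaxError is not a StandardError, so it has to be rescued by name.
def parses?(source)
  RubyVM::InstructionSequence.compile(source)
  true
rescue SyntaxError
  false
end

puts "as posted:          #{parses?(SKELETON)}"
puts "with one more end:  #{parses?(SKELETON + "end\n")}"
```

If the first check fails and the second passes, then one opener (`module`, `class << self`, `def`, or `do`) has no matching `end`, which would be consistent with the "unexpected end-of-input, expecting keyword_end" message.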