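I'm scraping some web pages and I get the following error when I run my Ruby task:
scrape.rb:27:in `block in <main>': undefined method `text' for nil:NilClass (NoMethodError)
It happens because the CSS selector matches nothing on the page.
Is there a way to check whether the CSS result is undefined so that the crawl doesn't stop? My code isn't working :(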
products.each do |product|
  web = Nokogiri::HTML(open(product))
  counter = products.index(product)
  if web.at_css('.entry-title').text != undefined
    puts "CSS content is not undefined"
  else
    puts "Error"
  end
end
Answer 0 (score: 3)
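You can check the result object with an if before calling text on it. at_css returns nil when nothing matches the selector, so a nil result means the element is missing:
result = web.at_css('.entry-title')
if result
  puts "CSS content is not undefined"
  puts result.text
else
  puts "Error"
end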
Answer 1 (score: 0)
I agree that at_css plus an if is the best way to test whether a class exists. Here's an example I whipped up:
require 'nokogiri'
require 'open-uri'

# Pool of User-Agent strings; one is picked at random for the crawl.
user_agents = [
  "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:43.0) Gecko/20100101 Firefox/43.0",
  "Mozilla/5.0 (compatible; Konqueror/3; Linux)",
  "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:43.0) Gecko/20100101 Firefox/43.0",
  "Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.4a) Gecko/20030401",
  "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2; de-at) AppleWebKit/531.21.8 (KHTML, like Gecko) Version/4.0.4 Safari/531.21.10",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9",
  "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.125 Safari/537.36",
  "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
  "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
  "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
  "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
  "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)",
  "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko",
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2486.0 Safari/537.36 Edge/13.10586",
  "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6",
  "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0",
  "Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B5110e Safari/601.1",
  "Mozilla/5.0 (iPad; CPU OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1",
  "Mozilla/5.0 (Linux; Android 5.1.1; Nexus 7 Build/LMY47V) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.76 Safari/537.36"
]
user_agent = user_agents.sample # random User-Agent for this run
good_2_go = "https://gomovies.to/genre/action/1"
my_bad = "https://gomovies.to/genre/action/100"
crawls = [good_2_go, my_bad]
crawls.each do |crawl|
  # URI.open is provided by open-uri; a bare open(url) no longer works on Ruby 3.0+.
  doc = Nokogiri::HTML(URI.open(crawl, 'User-Agent' => user_agent).read, nil, 'utf-8')
  # at_css returns the first matching node, or nil when no element has the class.
  entries = doc.at_css('.ml-item')
  if entries
    puts crawl
    puts "Found entries class, proceeding with scrape.."
  else
    puts crawl
    puts "Could not find base class for entries"
  end
end
This is what gets printed to STDOUT:
=> https://gomovies.to/genre/action/1
Found entries class, proceeding with scrape..
https://gomovies.to/genre/action/100
Could not find base class for entries