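I have a simple scraper written in Ruby that is supposed to scrape a particular website and save the data to a CSV file. When I try to run the script, I keep getting an undefined method error: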
boxers.rb:29:in `<main>': undefined method `text' for nil:NilClass (NoMethodError)
Here is the script I am trying to run:
#!/usr/bin/env ruby
require 'csv'
require 'mechanize'
agent = Mechanize.new{ |agent| agent.history.max_size=0 }
agent.user_agent = 'Mozilla/5.0'
base = "http://siteurl.com/"
division = ARGV[0]
search_url = "http://siteurl.com/ratings.php?sex=M&division=#{division}&pageID="
path='//*[@id="mainContent"]/table/tr[position()>2]'
boxers = CSV.open("csv/file.csv","w")
url = search_url+"1"
begin
  page = agent.get(url)
rescue
  print " -> error, retrying\n"
  retry
end
# probably the line that causes the error
a = page.parser.xpath('//a[@title="last page"]').first.text
a.gsub!("[","")
a.gsub!("]","")
last = a.to_i
(1..last).each do |page|
  url = search_url+page.to_s
  begin
    page = agent.get(url)
  rescue
    print " -> error, retrying\n"
    retry
  end
  page.parser.xpath(path).each do |tr|
    row = [division]
    tr.xpath("td").each_with_index do |td,j|
      case j
      when 0,11
        next
      when 2
        text = td.text.strip
        a = td.xpath("a").first
        href = base+a.attributes["href"].value.strip
        human_id = href.split("=")[1].split("&")[0]
        cat = href.split("=")[2]
        row += [human_id, cat, text, href]
      when 4
        text = td.text.strip
        record = text.split("-")
        wins = record[0]
        wko = wins.split("(")[1].split(")")[0] rescue 0
        wins = wins.split("(")[0]
        losses = record[1]
        lko = losses.split("(")[1].split(")")[0] rescue 0
        losses = losses.split("(")[0]
        draws = record[2]
        row += [wins, wko, losses, lko, draws, text]
      when 5
        last6 = []
        td.xpath("table/tr/td").each do |td2|
          outcome = td2.attributes["class"].value.strip rescue nil
          last6 += [outcome]
        end
        last6 = last6.to_s.gsub("[","{").gsub("]","}")
        row += [last6]
      when 9
        div = td.xpath("div").first
        flag = div.attributes["class"].value.strip rescue nil
        title = div.attributes["title"].value.strip rescue nil
        row += [flag,title]
      else
        text = td.text.strip
        row += [text]
      end
    end
    if (row.size>2)
      boxers << row
    end
  end
  boxers.flush
end
boxers.close
Answer (score: 1)
You are calling .text on something that has no value, that is, on nil.
According to the error message, it happens on line 29, which leads me to believe this line is the culprit:
a = page.parser.xpath('//a[@title="last page"]').first.text
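It looks like when xpath(...) does not match any element, it returns an empty collection (a Nokogiri::XML::NodeSet). Calling first on that empty collection finds nothing, so it returns nil, and nil has no text method.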
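To see the failure in isolation, here is a minimal sketch that uses a plain empty Array as a stand-in for the empty NodeSet. That substitution is an assumption made so the snippet runs without Mechanize or Nokogiri installed; #first behaves the same way on both when the collection is empty.

```ruby
# Stand-in (assumption): an empty Array plays the role of the empty
# Nokogiri::XML::NodeSet that xpath(...) returns when nothing matches.
matches = []

# #first on an empty collection returns nil instead of raising.
first_match = matches.first
puts first_match.inspect

# Calling .text on that nil is what produces the NoMethodError in the question.
begin
  first_match.text
rescue NoMethodError => e
  puts "NoMethodError: #{e.message}"
end
```

The exact wording of the error message differs slightly between Ruby versions, but the exception class is always NoMethodError.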
The solution is to check for nil before calling .text. There are plenty of guides and resources on checking for nil in Ruby, such as this question.
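One way to apply that nil check to the script is a small helper. This is a sketch under stated assumptions: last_page_number and FakeLink are hypothetical names, and treating a missing "last page" link as a single page of results is a guess about the site. A missing link could also mean the request failed or the XPath no longer matches, so it may be worth inspecting the fetched page as well.

```ruby
# Hypothetical helper: converts the "last page" link node (or nil) into a page count.
# Assumption: when the link is missing, all results fit on a single page.
def last_page_number(link)
  return 1 if link.nil?

  # The link text looks like "[12]" in the question; keep only the digits.
  digits = link.text.delete("^0-9")
  digits.empty? ? 1 : digits.to_i
end

# Hypothetical stand-in for a Nokogiri node: anything that responds to #text.
FakeLink = Struct.new(:text)

puts last_page_number(FakeLink.new("[12]"))
puts last_page_number(nil)
```

In the script, the line that fails and the three lines after it could then collapse into `last = last_page_number(page.parser.xpath('//a[@title="last page"]').first)`, which passes either the first matching node or nil to the helper.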