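Ruby n00b here. I'm scraping the same page twice, in a slightly different way each time, and exporting the results to separate CSV files. I'd like to combine the first column of CSV no. 1 and the second column of CSV no. 2 to create CSV no. 3.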
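The code that produces CSV no. 1 and CSV no. 2 works. But when I add my attempt to combine the two CSVs into a third one (commented out at the bottom), it returns the error below: the first two CSVs populate fine, but the third stays blank and the script sits in what looks like an infinite loop. I know these lines shouldn't be at the bottom, but I can't see where else they would go...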
alts.rb:45:in `block in <main>': undefined local variable or method `scrapedURLs1' for main:Object (NameError)
from /Users/JammyStressford/.rvm/rubies/ruby-2.0.0-p451/lib/ruby/2.0.0/csv.rb:1266:in `open'
from alts.rb:44:in `<main>'
The code itself:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'csv'

url = "http://www.example.com/page"
page = Nokogiri::HTML(open(url))

CSV.open("results1.csv", "wb") do |csv|
  page.css('img.product-card-image').each do |scrape|
    product1 = scrape['alt']
    page.css('a.product-card-image-link').each do |scrape|
      link1 = scrape['href']
      scrapedProducts1 = "#{product1}"[0..-7]
      scrapedURLs1 = "{link1}"
      csv << [scrapedProducts1, scrapedURLs1]
    end
  end
end

CSV.open("Results2.csv", "wb") do |csv|
  page.css('a.product-card-image-link').each do |scrape|
    link2 = scrape['href']
    page.css('img.product-card-image').each do |scrape|
      product2 = scrape['alt']
      scrapedProducts2 = "#{product2}"[0..-7]
      scrapedURLs2 = "http://www.lyst.com#{link2}"
      csv << [scrapedURLs2, scrapedProducts2]
    end
  end
end

## Here is where I am trying to combine the two columns into a new CSV. ##
## It doesn't work. I suspect that this part should be further up... ##
# CSV.open("productResults3.csv", "wb") do |csv|
#   csv << [scrapedURLs1, scrapedProducts2]
# end
puts "upload complete!"
Thanks for reading.
Answer (score: 0)
Thanks for sharing your code and your question. I hope my input helps!
Your scrapedURLs1 = "{link}" and scrapedProducts1 = "#{scrape['alt']}"[0..-7] have a 1 at the end, but you don't refer to them by that name in csv << [scrapedProducts, scrapedURLs]. That is the error you are getting.
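Separately from the naming mismatch, the NameError in the trace ("undefined local variable or method `scrapedURLs1'") is what Ruby raises when a local variable is first assigned inside a do ... end block and then used after the block has ended: the variable is local to that block. A minimal sketch, using hypothetical variable names rather than the question's:

```ruby
# A variable first assigned inside a block is local to that block.
[1].each do |n|
  inner_url = "/item-#{n}"
end

# Outside the block, inner_url is not a known local, so defined? returns nil
# (and referencing it directly would raise NameError).
visible = defined?(inner_url)

# Declaring the variable *before* the block lets the block add to it,
# and the value is still there after the block ends.
outer_urls = []
[1, 2].each do |n|
  outer_urls << "/item-#{n}"
end

puts visible.inspect    # prints nil
puts outer_urls.inspect # prints ["/item-1", "/item-2"]
```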
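I suggest combining the first two steps and, instead of writing to files straight away, collecting the rows into an array of arrays; you can then write them to whichever files you need.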
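A minimal sketch of that approach, assuming the product names and links have already been pulled out of the page into two arrays. The sample values are hypothetical, and the commented Nokogiri lines show only one possible way to extract them. It also assumes that the i-th image and the i-th link on the page belong to the same product card:

```ruby
require 'csv'

# Hypothetical sample data standing in for what the scraper extracts; with
# Nokogiri this might be, for example:
#   products = page.css('img.product-card-image').map { |img| img['alt'][0..-7] }
#   links    = page.css('a.product-card-image-link').map { |a| a['href'] }
products = ['Red Shoe', 'Blue Hat']
links    = ['/red-shoe', '/blue-hat']

# One [product, link] row per product, pairing items by position.
rows = products.zip(links)

# Every file is written from the same in-memory rows, so the third file
# no longer depends on variables trapped inside earlier CSV.open blocks.
CSV.open('results1.csv', 'wb') do |csv|
  rows.each { |product, link| csv << [product, link] }
end

CSV.open('Results2.csv', 'wb') do |csv|
  rows.each { |product, link| csv << ["http://www.lyst.com#{link}", product] }
end

CSV.open('productResults3.csv', 'wb') do |csv|
  rows.each { |product, link| csv << [link, product] }
end
```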
Do you realize that, in the example code you provided, scrapedURLs1, scrapedProducts2 would pair the wrong URLs with the wrong products? Is that what you meant?
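To see the mix-up concretely, here is a sketch with two hypothetical products reduced to plain arrays, comparing nested loops (the structure used in the question) with pairing by position via Array#zip:

```ruby
# Hypothetical two-product page, reduced to plain arrays.
products = ['Red Shoe', 'Blue Hat']
links    = ['/red-shoe', '/blue-hat']

# Nested loops, as in the question: every product meets every link,
# so 2 products x 2 links gives 4 rows, two of them mismatched.
nested = []
products.each do |product|
  links.each do |link|
    nested << [product, link]
  end
end

# Pairing by position keeps each product with its own link.
paired = products.zip(links)
# paired == [["Red Shoe", "/red-shoe"], ["Blue Hat", "/blue-hat"]]
```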
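In the commented-out code, scrapedURLs1 and scrapedProducts2 don't exist; they have not been declared in that scope. You would need to open one file for reading with .each do |scrapedURLs1|, then open the other with .each do |scrapedProducts2|; those variables would then exist, because each yields them to its block.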
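A sketch of reading the two files back and combining them row by row. Rather than nesting one each inside the other, it uses Array#zip to walk both files in step, which assumes both files list their rows in the same product order. It takes the URL column of results1.csv and the product column of Results2.csv, as in the commented-out [scrapedURLs1, scrapedProducts2]; change the indices to pick other columns (row1[0] would be the first column of file 1). The File.write lines only create hypothetical sample input so the sketch runs on its own:

```ruby
require 'csv'

# Hypothetical stand-ins for the two scraped files.
File.write('results1.csv', "Red Shoe,/red-shoe\nBlue Hat,/blue-hat\n")
File.write('Results2.csv',
           "http://www.lyst.com/red-shoe,Red Shoe\n" \
           "http://www.lyst.com/blue-hat,Blue Hat\n")

first  = CSV.read('results1.csv')  # array of rows, each an array of fields
second = CSV.read('Results2.csv')

CSV.open('productResults3.csv', 'wb') do |csv|
  # Row i of results1.csv is combined with row i of Results2.csv.
  first.zip(second).each do |row1, row2|
    row2 ||= []                    # zip pads with nil if Results2.csv is shorter
    csv << [row1[1], row2[1]]      # URL from file 1, product from file 2
  end
end
```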
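Reusing the same |scrape| variable in the inner iteration is not a good idea. Rename it to something else, such as |scrape2|. It "happens" to work because you have already captured what you need in product=scrape['alt'] before the second loop starts. If you rename the second loop variable, you can move the product=scrape['alt'] line into the inner loop and merge them. For example: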
# Like your original, this nested version writes one row for every
# product/link combination, so you may get many links per product.
# If that was your intent, that may be fine; to get one link per product,
# pair the two lists by index with zip instead of nesting the loops.
CSV.open("results1.csv", "wb") do |csv|
  page.css('img.product-card-image').each do |scrape|
    page.css('a.product-card-image-link').each do |scrape2|
      # [ product , link ]
      csv << [scrape['alt'][0..-7], scrape2['href']]
      # NOTE that scrape['alt'][0..-7] and scrape2['href'] are already strings
      # so you don't need to use "#{ }"
    end
  end
end
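The shadowing can be seen in a small sketch in which plain hashes stand in for Nokogiri nodes (an assumption made so it runs without fetching a page; both support node['attr'] lookups):

```ruby
# Hypothetical stand-ins for Nokogiri nodes.
images = [{ 'alt' => 'Red Shoe' }]
links  = [{ 'href' => '/red-shoe' }]

seen = []
images.each do |scrape|
  product = scrape['alt']          # outer `scrape` is the image here
  links.each do |scrape|           # this `scrape` shadows the outer one
    # Inside this loop `scrape` is the link, so scrape['alt'] is nil;
    # only `product`, captured before the loop, still holds the image's alt.
    seen << [product, scrape['alt'], scrape['href']]
  end
end
```

This is why the original only "happens" to work: any image attribute read inside the inner loop through scrape would come from the link instead.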
P.S. Ruby 2.0.0 doesn't need the line require "rubygems".
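If you are working with CSV, I highly recommend James Edward Gray II's FasterCSV. Since Ruby 1.9 it ships as the standard library's CSV, so the require 'csv' you already use gives you it without installing a separate gem. See usage examples here: https://github.com/JEG2/faster_csv/blob/master/examples/csv_writing.rb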