Structuring a CSV in rows and columns with Rails

Time: 2019-06-24 19:45:54

Tags: ruby-on-rails ruby csv export-to-csv

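I built a small web-scraping application in Ruby that scrapes data from a website and stores it in a CSV file. The scraping and storing both work, but I can't get the CSV file into a "table" layout with two columns and multiple rows. The file should have a name column and a price column, with one row per product holding its name and price. Here is my code: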

require 'open-uri'
require 'nokogiri'
require 'httparty'
require 'byebug'
require 'csv'

    def whey_scrapper
        company = 'Body+%26+fit'
        url = "https://www.bodyenfitshop.nl/eiwittenwhey/whey-proteine/?limit=81&manufacturer=#{company}"
        unparsed_page = open(url).read
        parsed_page = Nokogiri::HTML(unparsed_page)
        product_names = parsed_page.css('div.product-primary')
        name = Array.new
        product_names.each do |product_name| 
            name << product_name.css('h2.product-name').text
        end
        product_prices = parsed_page.css('div.price-box')
        price = Array.new
        product_prices.each do |product_price|
            price << product_price.css('span.price').text
        end
        headers = ["name", "price"]
        item = [name, price]
        CSV.open('data/wheyprotein.csv', 'w', :col_sep => "\t|", :headers => true) do |csv|
            csv << headers
            item.each {|row| csv << row }
        end
        byebug
    end   
    whey_scrapper

I am creating a row on each iteration, but the CSV file still comes out messy and unstructured.

This is what my CSV file looks like:

name	|price
-----------------
"
                            
                                Whey Perfection                                Body & fit
                            
                        "	|"
                            
                                Whey Perfection® bestseller box                                Body & fit
                            
                        "	|"
                            
                                Whey Perfection - Special Series                                Body & fit
                            
                        "	|"
                            
                                Isolaat Perfection                                Body & fit
                            
                        "	|"
                            
                                Perfect Protein                                Body & fit
                            
                        "	|"
                            
                                Whey Isolaat XP                                Body & fit
                            
                        "	|"
                            
                                Micellar Casein Perfection                                Body & fit
                            
                        "	|"
                            
                                Low Calorie Meal                                Body & fit
                            
                        "	|"
                            
                                Whey Breakfast                                Body & fit
                            
                        "	|"
                            
                                Whey Perfection - Flavour Box                                 Body & fit
                            
                        "	|"
                            
                                Protein Breakfast                                Body & fit
                            
                        "	|"
                            
                                Whey Perfection Summer Box                                Body & fit
                            
                        "	|"
                            
                                Puur Whey                                Body & fit
                            
                        "	|"
                            
                                Whey Isolaat Crispy                                Body & fit
                            
                        "	|"
                            
                                Vegan Protein voordeel                                Body & fit vegan
                            
                        "	|"
                            
                                Whey Perfection Winter Box                                Body & fit
                            
                        "	|"
                            
                                Sports Breakfast                                Body & fit
                            
                        "
€ 7,90	|€ 9,90	|€ 11,90	|€ 17,90	|€ 31,90	|€ 18,90	|€ 12,90	|€ 6,90	|€ 6,90	|€ 10,90	|€ 15,90	|€ 9,90	|€ 26,90	|€ 6,90	|€ 24,90	|€ 9,90	|€ 20,90

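1 Answer:

Answer 0 (score: 1)

First, the product names. You are pulling too much out of the HTML: the h2 element contains whitespace and a span element, both of which should probably be ignored. You can do it like this: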

product_names.each do |product_name| 
  name << product_name.css('h2.product-name a').children[0].text.gsub(/\s{2,}/, '')
end
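
To see what the regular expression in that snippet does, here is a minimal sketch on a plain string, with no network or Nokogiri involved. The sample string imitates the padded text shown in the question's output, and `clean_name` is a hypothetical helper name, not part of the original answer.

```ruby
# Hypothetical helper: remove every run of two or more whitespace
# characters (newlines included), as the gsub in the answer does.
def clean_name(raw_text)
  raw_text.gsub(/\s{2,}/, '')
end

# Sample input imitating the padded link text from the scraped page.
raw = "\n                                Whey Perfection - Special Series                                "

p clean_name(raw) # => "Whey Perfection - Special Series"
p raw.strip       # => "Whey Perfection - Special Series"
```

`String#strip` gives the same result here because the padding sits only at the two ends. The `gsub` version would additionally delete any run of two or more whitespace characters inside the name, so `strip` may be the safer choice if names can contain double spaces.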

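Next, CSV needs each row passed as an array with one item per column. Your `item = [name, price]` is an array of only two rows (all the names, then all the prices), which is why the file ends up with one long row of each. What you want instead is many arrays of two items each (product name and price). To get that, you can simply zip the two arrays: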

items = name.zip(price)
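
As a small illustration of what `zip` returns, the sketch below pairs two short arrays. The values are copied from the output in the question and are assumed to match by position; `pair_rows` is a hypothetical wrapper added only to make the behavior easy to try.

```ruby
# Hypothetical wrapper around Array#zip: element i of names is paired
# with element i of prices, giving one [name, price] array per product.
def pair_rows(names, prices)
  names.zip(prices)
end

names  = ["Whey Perfection", "Isolaat Perfection"]
prices = ["€ 7,90", "€ 17,90"]

pair_rows(names, prices).each do |product, cost|
  puts "#{product} costs #{cost}"
end

# If the second array is shorter, zip pads the missing values with nil.
p pair_rows(["a", "b"], ["1"]) # => [["a", "1"], ["b", nil]]
```

Because of the `nil` padding, it may be worth checking that `name.size == price.size` before zipping, in case one of the CSS selectors matched more elements than the other.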

Then create the CSV file:

CSV.open('data/wheyprotein.csv', 'w') do |csv|
  csv << headers
  items.each {|row| csv << row }
end
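
To preview the resulting layout without writing a file, the same rows can be built into a string with `CSV.generate`. This is a sketch with two sample rows taken from the question's output; `rows_to_csv` is a hypothetical helper name.

```ruby
require 'csv'

# Hypothetical helper: build CSV text from a header row and data rows,
# using the default comma separator as in the answer.
def rows_to_csv(headers, rows)
  CSV.generate do |csv|
    csv << headers
    rows.each { |row| csv << row }
  end
end

items = [["Whey Perfection", "€ 7,90"], ["Isolaat Perfection", "€ 17,90"]]
puts rows_to_csv(["name", "price"], items)
# name,price
# Whey Perfection,"€ 7,90"
# Isolaat Perfection,"€ 17,90"
```

The prices are wrapped in quotes because they contain a decimal comma, which is also the column separator. Spreadsheet programs and CSV parsers read such a quoted field as a single value, so the table still has exactly two columns.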

The complete method looks like this:

require 'open-uri'
require 'nokogiri'
require 'csv'

def whey_scrapper
    company = 'Body+%26+fit'
    url = "https://www.bodyenfitshop.nl/eiwittenwhey/whey-proteine/?limit=81&manufacturer=#{company}"
    # URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs
    unparsed_page = URI.open(url).read
    parsed_page = Nokogiri::HTML(unparsed_page)
    product_names = parsed_page.css('div.product-primary')
    name = Array.new
    product_names.each do |product_name| 
        # take only the link's first text node and drop the padding whitespace
        name << product_name.css('h2.product-name a').children[0].text.gsub(/\s{2,}/, '')
    end
    product_prices = parsed_page.css('div.price-box')
    price = Array.new
    product_prices.each do |product_price|
        price << product_price.css('span.price').text
    end
    headers = ["name", "price"]
    # one [name, price] pair per product, i.e. one CSV row each
    items = name.zip(price)
    CSV.open('data/wheyprotein.csv', 'w') do |csv|
        csv << headers
        items.each {|row| csv << row }
    end
end