How can I make this loop write to the CSV after each iteration rather than after each file?

Asked: 2016-10-23 08:51:58

Tags: ruby csv

I have the following method:

  csvs = Dir["#{@dir_name}/#{@state}/*.csv"]

  csvs.each do |csv|
    city = csv.split(/[\/]|.csv-updated|.csv/).last
    new_csv = "#{@dir_name}/#{@state}/emails/#{city}-with-emails.csv"
    CSV.open(new_csv, "a+", write_headers: true, headers: ["Company_Name","Website","Street_Address", "City", "State", "Zip", "Phone","Email1", "Email2", "Email3", "Email4", "Email5"]) do |new_csv_row|
      CSV.foreach(csv, headers: true) do |row|
        website = row['Website']
        begin
          page = YPCrawler::PageParser.new website
          links = page.compile_all_links(website)
          emails = page.compile_all_emails(links)
          new_csv_row << (row << emails.join(","))
        rescue
          next
        end
      end
    end
  end

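However, it does not write to the new CSV after each row is processed. The output only appears once the entire source CSV file has been processed. I assume it processes the old CSV file, holds the results in memory, and then dumps everything to the new file once that CSV file is finished. I don't particularly like this, because the CSV files vary in length and I don't want to run out of memory while processing so many files.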

I originally had CSV.open(new_csv) inside CSV.foreach(csv), but the problem I ran into was that it wrote a header row after every row, which is not what I want.

I just want to write the header row once at the top of the file and then append the rows as they are processed.

What is the best way to do this?

1 Answer:

Answer 0: (score: 1)

I think you can write the header explicitly. This is based on my understanding of our comments so far:

headers = ["Company_Name","Website","Street_Address", "City", "State", "Zip", "Phone","Email1", "Email2", "Email3", "Email4", "Email5"]

csvs.each do |csv|
  # Dots are escaped so that only a literal ".csv" suffix is stripped.
  city = csv.split(/[\/]|\.csv-updated|\.csv/).last
  new_csv = "#{@dir_name}/#{@state}/emails/#{city}-with-emails.csv"
  CSV.open(new_csv, "a+") do |new_csv_row|
    # Each city has its own output file, so decide per file:
    # write the header only while that file is still empty.
    new_csv_row << headers if File.zero?(new_csv)
    CSV.foreach(csv, headers: true) do |row|
      website = row['Website']
      begin
        page = YPCrawler::PageParser.new website
        links = page.compile_all_links(website)
        emails = page.compile_all_emails(links)
        new_csv_row << (row << emails.join(","))
      rescue
        # Skip rows whose site cannot be crawled.
        next
      end
    end
  end
end
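
As for rows only showing up once a file is finished: as far as I know this is ordinary IO buffering, not the whole file being held in memory. Ruby collects writes in a small buffer and pushes them to disk when the buffer fills or the file is closed, so short files appear all at once at the end. If you want each row on disk as soon as it is processed, a minimal sketch is to flush after every write. Note that `process_file`, the shortened `HEADERS` list, and the block standing in for the crawler are hypothetical names used only for illustration:

```ruby
require "csv"

# Shortened header list, for illustration only.
HEADERS = ["Company_Name", "Website", "Email1"].freeze

# Hypothetical helper: copies each row of src into dest and appends whatever
# the block returns (a stand-in for the crawler's list of e-mails).
def process_file(src, dest)
  CSV.open(dest, "a+") do |out|
    # Write the header only when the output file is still empty.
    out << HEADERS if File.zero?(dest)
    CSV.foreach(src, headers: true) do |row|
      begin
        emails = yield(row)
      rescue StandardError
        next # skip rows whose lookup fails
      end
      out << (row.fields + emails)
      out.flush # push this row to the file now instead of waiting for close
    end
  end
end
```

Calling it as `process_file(csv, new_csv) { |row| ... }` with the crawler calls inside the block should make each row visible in the output file right after it is written. Setting `out.sync = true` once after opening is an alternative to calling `flush` on every row.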