I have the following method:
csvs = Dir["#{@dir_name}/#{@state}/*.csv"]
csvs.each do |csv|
  city = csv.split(/[\/]|.csv-updated|.csv/).last
  new_csv = "#{@dir_name}/#{@state}/emails/#{city}-with-emails.csv"
  CSV.open(new_csv, "a+", write_headers: true, headers: ["Company_Name","Website","Street_Address", "City", "State", "Zip", "Phone","Email1", "Email2", "Email3", "Email4", "Email5"]) do |new_csv_row|
    CSV.foreach(csv, headers: true) do |row|
      website = row['Website']
      begin
        page = YPCrawler::PageParser.new website
        links = page.compile_all_links(website)
        emails = page.compile_all_emails(links)
        new_csv_row << (row << emails.join(","))
      rescue
        next
      end
    end
  end
end
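However, it does not write to the new CSV after each row it processes; it only writes once the whole input CSV file has been processed. I assume it processes the old CSV file, holds the results in memory, and then dumps everything from memory to the file when that CSV file is finished. I'm not keen on this, because the CSV files vary in length and I don't want to run out of memory while processing so many files.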
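One way to check that assumption is a small sketch like the one below. Ruby buffers writes to files by default, so a row passed to `<<` may sit in the write buffer until the file is flushed or closed. The helper name `sizes_around_flush` and the scratch file are hypothetical and used only for illustration; the sample row is made up.

```ruby
require "csv"
require "tmpdir"

# Hypothetical helper: appends one row and returns the file size on disk
# before and after an explicit flush.
def sizes_around_flush(path)
  CSV.open(path, "a+") do |out|
    out << ["Acme", "http://acme.example"]
    before = File.size(path) # the row may still be in Ruby's write buffer
    out.flush                # push any buffered rows out to the file
    [before, File.size(path)]
  end
end

Dir.mktmpdir do |dir|
  before, after = sizes_around_flush(File.join(dir, "buffer-demo.csv"))
  puts "before flush: #{before} bytes"
  puts "after flush:  #{after} bytes"
end
```

If the first number is smaller than the second, the rows were being held in the buffer rather than written one at a time. Calling `flush` after each `<<` in the inner loop would then be one way to get every row onto disk as it is produced, at the cost of more frequent writes.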
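I originally had the CSV.open(new_csv) and CSV.foreach(csv) calls arranged differently, but the problem I ran into was that a header row was written after every row, which is not what I want.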
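I only want the header row written once, at the top of the file, with the data rows appended after it.
What is the best way to do this?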
Answer 0: (score: 1)
I think you can write the headers explicitly. This is based on my understanding of our discussion in the comments so far.
require 'csv'

headers = ["Company_Name","Website","Street_Address", "City", "State", "Zip", "Phone","Email1", "Email2", "Email3", "Email4", "Email5"]
csvs.each do |csv|
  # Escape the dots so they only match a literal ".csv" suffix.
  city = csv.split(/\/|\.csv-updated|\.csv/).last
  new_csv = "#{@dir_name}/#{@state}/emails/#{city}-with-emails.csv"
  # "a+" appends, so write the header row only if this output file is new or empty.
  set_headers = !File.exist?(new_csv) || File.zero?(new_csv)
  CSV.open(new_csv, "a+") do |new_csv_row|
    new_csv_row << headers if set_headers
    CSV.foreach(csv, headers: true) do |row|
      website = row['Website']
      begin
        page = YPCrawler::PageParser.new website
        links = page.compile_all_links(website)
        emails = page.compile_all_emails(links)
        # Note: join(",") stores all emails in a single field.
        new_csv_row << (row << emails.join(","))
      rescue
        # Skip rows whose site cannot be crawled.
        next
      end
    end
  end
end
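As an alternative to tracking a flag, here is a minimal sketch that reopens the output file in append mode for every row and passes `write_headers` only when the file is missing or empty. Because the file is closed after each row, every row reaches disk immediately and the header still appears only once. The helper name `append_row`, the shortened `HEADERS` list, and the sample data are hypothetical; this is a sketch under those assumptions, not a drop-in replacement for the crawler code.

```ruby
require "csv"
require "tmpdir"

# Shortened, hypothetical header list for the demo.
HEADERS = ["Company_Name", "Website", "Email1"].freeze

# Appends one row to path. The header row is written only when the file
# does not exist yet or is empty; closing the file at the end of the block
# forces the row out to disk.
def append_row(path, fields, headers: HEADERS)
  needs_header = !File.exist?(path) || File.zero?(path)
  CSV.open(path, "a", write_headers: needs_header, headers: headers) do |out|
    out << fields
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "city-with-emails.csv")
  append_row(path, ["Acme", "http://acme.example", "info@acme.example"])
  append_row(path, ["Bolt", "http://bolt.example", "hi@bolt.example"])
  puts File.read(path)
end
```

The trade-off is one open and close per row, which is probably negligible next to crawling a website for each row, but may matter for very large inputs.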