Question

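I'm currently working on a small application that lets someone enter a string and splits a larger file into smaller files.

I'm having trouble splitting the file into the new files. Here is the code for my CSV method: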

require 'csv'

new_list = []
old_list = []

def import_csv(file)
  puts "Method begins"
  CSV.foreach("public_schools_list.csv", :encoding => 'windows-1251:utf-8', :headers => true) do |row|
    if row["company"].downcase.include?("academy" || "lutheran" || "jewish" || "private" || "christian")
      CSV.open("new_list.csv", "ab") do |n|
        n << row

        puts "First Row"

        new_list << n
      end
    else
      CSV.open("old_list.csv", "ab") do |o|
        o << row

        puts "Second Row"

        old_list << o
      end
    end
  end
end

puts "New Csv: #{new_list.count}"
puts "Old Csv: #{old_list.count}"

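I just want to check this code to see whether it is splitting the file. I'm not sure whether some of it is correct. At the moment the CSV list has only four items. I'm using the count method to check whether they end up in the right files.

What code am I missing to split the main file in two?

Here is my controller: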

include CSVUpload

def import
  csv_separate(params[:file].tempfile)
  redirect_to root_url
end

And here is the module I'm using:

require 'csv'
module CSVUpload

  def csv_separate(file)
    new_list_counter = 0
    old_list_counter = 0

    # puts "Method begins"
    CSV.open("new_list.csv", "ab") do |new_list|
      CSV.open("old_list.csv", "ab") do |old_list|
        CSV.foreach(file, :encoding => 'windows-1251:utf-8', :headers => true) do |row|
          if row["company"][/\b(?:academy|lutheran|jewish|private|christian)\b/i]
            new_list << row

            new_list_counter += 1
          else
            old_list << row

            old_list_counter += 1
          end
        end
      end
    end
  end
end

And then the form:

<div>
  <h3>Import a CSV File</h3>
   <%= form_tag({action: "import"}, multipart: true) do %>
   <%= file_field_tag("file") %>
   <%= submit_tag "Import CSV" %>
 <% end %>
</div>

I hope that helps. Thanks!

Answer 1

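Don't open and then close the output files for every line you read. That is very inefficient and wastes CPU time. Open them outside the CSV.foreach loop, then write to them conditionally.

Also, don't aggregate the file's lines in memory just so you can count them. Increment a counter instead.

In addition, include? doesn't work this way: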

include?("academy" || "lutheran" || "jewish" || "private" || "christian")

The documentation says:

  str.include? other_str   -> true or false

------------------------------------------------------------------------------

Returns true if str contains the given string or character.

  "hello".include? "lo"   #=> true
  "hello".include? "ol"   #=> false
  "hello".include? ?h     #=> true

Note that it takes a single character or string. Chaining a list of strings with || only ever yields the first string:

"academy" || "lutheran" || "jewish" || "private" || "christian" # => "academy"

So only "academy" will ever be tested.

This is untested but looks about right:
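
If you would rather stay with include? than move to a regular expression, one option is to test each keyword in turn with any?. This is only a minimal sketch: the KEYWORDS constant and the private_school? helper are names I made up for illustration, and it matches plain substrings, so it is a little looser than a pattern anchored on word boundaries.

```ruby
# Keywords taken from the condition in the question.
KEYWORDS = %w[academy lutheran jewish private christian].freeze

# Hypothetical helper: true when the name contains any keyword as a
# substring, ignoring case. to_s guards against a nil "company" field.
def private_school?(name)
  lowered = name.to_s.downcase
  KEYWORDS.any? { |keyword| lowered.include?(keyword) }
end

puts private_school?("St. Paul Lutheran School") # prints true
puts private_school?("Lincoln High School")      # prints false
```

Inside the loop that would read `if private_school?(row["company"])`.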

require 'csv'


def import_csv(file)
  new_list_counter = 0
  old_list_counter = 0

  puts "Method begins"

  CSV.open("new_list.csv", "ab") do |new_list|
    CSV.open("old_list.csv", "ab") do |old_list|

      CSV.foreach("public_schools_list.csv", :encoding => 'windows-1251:utf-8', :headers => true) do |row|

        if row["company"][/\b(?:academy|lutheran|jewish|private|christian)\b/i]
          new_list << row

          puts "First Row"

          new_list_counter += 1

        else
          old_list << row

          puts "Second Row"

          old_list_counter += 1
        end

      end
    end

  end

  puts "New CSV: #{ new_list_counter }"
  puts "Old CSV: #{ old_list_counter }"

end
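
To check that the rows really land in the right files, a self-contained variant along the following lines can be run against a throwaway directory. It is a sketch under a few assumptions: split_csv, its parameters, and the sample rows are made up for illustration, the paths are passed in instead of being hard-coded, and the encoding option is dropped because the sample data is plain UTF-8.

```ruby
require 'csv'
require 'tmpdir'

PATTERN = /\b(?:academy|lutheran|jewish|private|christian)\b/i

# Hypothetical variant of import_csv: paths are parameters and the
# counters are returned so the caller can inspect them.
def split_csv(input_path, new_path, old_path)
  counts = { new: 0, old: 0 }

  # Both output files are opened once, outside the read loop.
  CSV.open(new_path, "ab") do |new_list|
    CSV.open(old_path, "ab") do |old_list|
      CSV.foreach(input_path, headers: true) do |row|
        if row["company"].to_s.match?(PATTERN)
          new_list << row
          counts[:new] += 1
        else
          old_list << row
          counts[:old] += 1
        end
      end
    end
  end

  counts
end

# Made-up sample data written to a temporary directory.
Dir.mktmpdir do |dir|
  input = File.join(dir, "schools.csv")
  File.write(input, <<~ROWS)
    company,city
    Hope Christian School,Austin
    Lincoln High School,Dallas
    Riverside Academy,Houston
    Central Elementary,Plano
  ROWS

  counts = split_csv(input, File.join(dir, "new_list.csv"), File.join(dir, "old_list.csv"))
  puts "New CSV: #{counts[:new]}"
  puts "Old CSV: #{counts[:old]}"
end
```

Note that appending rows this way does not write a header line to the output files; add one yourself if the downstream code expects it.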

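Compared with a possible looping algorithm, row["company"][/\b(?:academy|lutheran|jewish|private|christian)\b/i] is very efficient. Regular expression patterns are not a universal panacea, though. A badly chosen pattern can actually slow your program down, sometimes drastically. Used correctly, patterns can reduce your code and, in the right circumstances and when well written, give a significant speed boost. So run benchmarks before blindly introducing them into your code.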
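
A comparison with Ruby's standard Benchmark module might look something like this. It is only a sketch: the sample names, the iteration count, and the two lambdas are made up for illustration, and the timings will depend on your data and Ruby version, so measure with rows that resemble your real file.

```ruby
require 'benchmark'

KEYWORDS = %w[academy lutheran jewish private christian].freeze
PATTERN  = /\b(?:academy|lutheran|jewish|private|christian)\b/i

# Two candidate checks: the regular expression, and a loop over keywords.
by_regex   = ->(name) { name.match?(PATTERN) }
by_include = ->(name) { lowered = name.downcase; KEYWORDS.any? { |k| lowered.include?(k) } }

# Made-up sample values standing in for the "company" column.
names = ["Hope Christian School", "Lincoln High School",
         "Riverside Academy", "Central Elementary"]
iterations = 20_000

Benchmark.bm(10) do |x|
  x.report("regex:")    { iterations.times { names.each { |n| by_regex.call(n) } } }
  x.report("include?:") { iterations.times { names.each { |n| by_include.call(n) } } }
end
```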
