My import is very slow: processing a 13,000-row CSV file takes about 2.5 hours.

I would also like to understand why my memory usage keeps climbing as the file gets larger. I am using CSV.foreach, so I expected memory usage to be bounded by the size of each import batch (500 rows at a time). My goal is to process large CSV files of around 20,000 rows quickly and with minimal memory usage.

Is there a better way to do this?
Ruby: 2.6.1
Rails: 5.2.2
Gem: activerecord-import
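
For context, my understanding is that CSV.foreach streams the file one row at a time rather than reading it all into memory up front, roughly as in this simplified comparison (just an illustration, not part of my import script; the file path is a placeholder):

require 'csv'

# CSV.read loads every row into one array, so memory grows with file size.
rows = CSV.read('sample.csv', headers: true)
rows.each { |row| puts row[0] }

# CSV.foreach yields one row at a time, so only the current row (plus whatever
# I keep myself, e.g. my 500-record batch) should stay in memory.
CSV.foreach('sample.csv', headers: true) do |row|
  puts row[0]
end

My full import script is below.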
require 'benchmark'
require 'csv'

# Print how much the process RSS changed (in MB) while running the block.
def print_memory_usage
  memory_before = `ps -o rss= -p #{Process.pid}`.to_i
  yield
  memory_after = `ps -o rss= -p #{Process.pid}`.to_i
  puts "Memory: #{((memory_after - memory_before) / 1024.0).round(2)} MB"
end

# Print the wall-clock time (in seconds) spent in the block.
def print_time_spent
  time = Benchmark.realtime do
    yield
  end
  puts "Time: #{time.round(2)}"
end

nol = 0             # total number of CSV rows processed
ac = Profile.count  # baseline record counts, for the per-batch stats below
lc = Course.count
fc = Book.count

print_memory_usage do
  print_time_spent do
    i = 0
    profiles = []

    CSV.foreach(File.join(Rails.root, 'lib', 'sample_data', 'sample.csv'), headers: true) do |row|
      nol += 1
      i += 1

      # Build a Profile with its associated Course and Book from the row.
      a = Profile.new(user_id: 1)
      l = a.build_course
      f = a.build_book

      l.name = 'TEST-small sample.csv'
      l.course_name = row[0]
      a.grade = row[1]
      f.book_title = row[3]

      profiles << a

      # Flush every 500 rows with activerecord-import.
      if i == 500
        puts "Import Stats"
        print_memory_usage do
          print_time_spent do
            Profile.import profiles, recursive: true
          end
        end
        i = 0
        profiles = []

        puts "--Record Count--"
        puts "Profile: #{ac}/#{Profile.count}"
        puts "Course: #{lc}/#{Course.count}"
        puts "Book: #{fc}/#{Book.count}"
      end
    end

    # Import whatever is left over from the last (partial) batch.
    Profile.import profiles, recursive: true
    puts "-- TOTAL Memory and Speed --"
  end
end

puts "--Imported #{nol} Records--"
Console output:

Import Stats
Time: 79.3
Memory: 42.51 MB
--Record Count--
Profile: 1823/2323
Course: 1723/2223
Book: 1723/2223
Import Stats
Time: 79.65
Memory: 27.88 MB
--Record Count--
Profile: 1823/2823
Course: 1723/2723
Book: 1723/2723
Import Stats
Time: 80.16
Memory: 9.91 MB
--Record Count--
Profile: 1823/3323
Course: 1723/3223
Book: 1723/3223
Import Stats
Time: 74.83
Memory: 4.19 MB
--Record Count--
Profile: 1823/3823
Course: 1723/3723
Book: 1723/3723
-- TOTAL Memory and Speed --
Time: 350.56
Memory: 102.43 MB
--Imported 2241 Records--
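
In case it is relevant, one thing I have been considering (but have not tried yet) is tracking Ruby's own heap statistics alongside the RSS numbers, to see whether the growth comes from retained Ruby objects or from the process itself; a rough sketch using GC.stat (keys as in MRI 2.6):

# Rough sketch: report how many live heap slots and allocated objects a block
# adds, to compare against the RSS deltas from print_memory_usage above.
def print_gc_stats
  before = GC.stat
  yield
  after = GC.stat
  puts "Live slots delta:  #{after[:heap_live_slots] - before[:heap_live_slots]}"
  puts "Objects allocated: #{after[:total_allocated_objects] - before[:total_allocated_objects]}"
end

I would wrap it around each Profile.import call the same way as the existing helpers.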