My import is very slow: processing a 13,000-row CSV file takes about 2.5 hours.

I would also like to understand why my memory usage keeps climbing as the file gets larger. I am using CSV.foreach, so I expected memory usage to be bounded by the size of each import batch (500 rows at a time). My goal is to process large CSV files of around 20,000 rows quickly and with minimal memory usage.

Is there a better way to do this?
Ruby: 2.6.1
Rails: 5.2.2
Gem: activerecord-import
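
For context, my understanding is that CSV.foreach streams the file one row at a time rather than reading it all into memory up front, roughly as in this simplified comparison (just an illustration, not part of my import script; the file path is a placeholder):

require 'csv'

# CSV.read loads every row into one array, so memory grows with file size.
rows = CSV.read('sample.csv', headers: true)
rows.each { |row| puts row[0] }

# CSV.foreach yields one row at a time, so only the current row (plus whatever
# I keep myself, e.g. my 500-record batch) should stay in memory.
CSV.foreach('sample.csv', headers: true) do |row|
  puts row[0]
end

My full import script is below.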
require 'benchmark'
require 'csv'

# Print how much the process RSS changed (in MB) while running the block.
def print_memory_usage
  memory_before = `ps -o rss= -p #{Process.pid}`.to_i
  yield
  memory_after = `ps -o rss= -p #{Process.pid}`.to_i
  puts "Memory: #{((memory_after - memory_before) / 1024.0).round(2)} MB"
end

# Print the wall-clock time (in seconds) spent in the block.
def print_time_spent
  time = Benchmark.realtime do
    yield
  end
  puts "Time: #{time.round(2)}"
end

nol = 0             # total number of CSV rows processed
ac = Profile.count  # baseline record counts, for the per-batch stats below
lc = Course.count
fc = Book.count

print_memory_usage do
  print_time_spent do
    i = 0
    profiles = []

    CSV.foreach(File.join(Rails.root, 'lib', 'sample_data', 'sample.csv'), headers: true) do |row|
      nol += 1
      i += 1

      # Build a Profile with its associated Course and Book from the row.
      a = Profile.new(user_id: 1)
      l = a.build_course
      f = a.build_book

      l.name = 'TEST-small sample.csv'
      l.course_name = row[0]
      a.grade = row[1]
      f.book_title = row[3]

      profiles << a

      # Flush every 500 rows with activerecord-import.
      if i == 500
        puts "Import Stats"
        print_memory_usage do
          print_time_spent do
            Profile.import profiles, recursive: true
          end
        end
        i = 0
        profiles = []

        puts "--Record Count--"
        puts "Profile: #{ac}/#{Profile.count}"
        puts "Course: #{lc}/#{Course.count}"
        puts "Book: #{fc}/#{Book.count}"
      end
    end

    # Import whatever is left over from the last (partial) batch.
    Profile.import profiles, recursive: true
    puts "-- TOTAL Memory and Speed --"
  end
end

puts "--Imported #{nol} Records--"
Console output:

Import Stats
Time: 79.3
Memory: 42.51 MB
--Record Count--
Profile: 1823/2323
Course: 1723/2223
Book: 1723/2223
Import Stats
Time: 79.65
Memory: 27.88 MB
--Record Count--
Profile: 1823/2823
Course: 1723/2723
Book: 1723/2723
Import Stats
Time: 80.16
Memory: 9.91 MB
--Record Count--
Profile: 1823/3323
Course: 1723/3223
Book: 1723/3223
Import Stats
Time: 74.83
Memory: 4.19 MB
--Record Count--
Profile: 1823/3823
Course: 1723/3723
Book: 1723/3723
-- TOTAL Memory and Speed --
Time: 350.56
Memory: 102.43 MB
--Imported 2241 Records--
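
In case it is relevant, one thing I have been considering (but have not tried yet) is tracking Ruby's own heap statistics alongside the RSS numbers, to see whether the growth comes from retained Ruby objects or from the process itself; a rough sketch using GC.stat (keys as in MRI 2.6):

# Rough sketch: report how many live heap slots and allocated objects a block
# adds, to compare against the RSS deltas from print_memory_usage above.
def print_gc_stats
  before = GC.stat
  yield
  after = GC.stat
  puts "Live slots delta:  #{after[:heap_live_slots] - before[:heap_live_slots]}"
  puts "Objects allocated: #{after[:total_allocated_objects] - before[:total_allocated_objects]}"
end

I would wrap it around each Profile.import call the same way as the existing helpers.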