To load small amounts of data, I've been using a rake task to pull important data out of CSV and into Rails:
desc "Import users."
task :import_users => :environment do
File.open("users.txt", "r").each do |line|
name, age, profession = line.strip.split("\t")
u = User.new(:name => name, :age => age, :profession => profession)
u.save
end
end
For larger files (around 50,000 records) this is very slow. Is there a faster way to import the data?
Answer 0 (score: 4)
You may want to look at activerecord-import and check out this similar thread.
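Roughly, the import task could then look something like this (a minimal sketch, not from the original answer; it reuses the same tab-separated file and columns from the question, and the :validate => false option trades per-record validations for speed):

# Sketch using activerecord-import: collect all rows first, then issue a
# single multi-row INSERT instead of one INSERT per record.
desc "Import users with activerecord-import."
task :bulk_import_users => :environment do
  columns = [:name, :age, :profession]
  values = []
  File.open("users.txt", "r").each do |line|
    values << line.strip.split("\t")
  end
  # :validate => false skips model validations for extra speed
  User.import columns, values, :validate => false
end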
Answer 1 (score: 1)
Without any additional libraries (and I agree that a bulk import with an AR extension should be faster, although AR:Extension skips model validations), you can add a little concurrency and take advantage of a multi-core machine:
# Returns the number of processors for Linux, OS X or Windows.
def number_of_processors
  if RUBY_PLATFORM =~ /linux/
    return `cat /proc/cpuinfo | grep processor | wc -l`.to_i
  elsif RUBY_PLATFORM =~ /darwin/
    return `sysctl -n hw.logicalcpu`.to_i
  elsif RUBY_PLATFORM =~ /win32/
    # this works for Windows 2000 or greater
    require 'win32ole'
    wmi = WIN32OLE.connect("winmgmts://")
    wmi.ExecQuery("select * from Win32_ComputerSystem").each do |system|
      begin
        processors = system.NumberOfLogicalProcessors
      rescue
        processors = 0
      end
      return [system.NumberOfProcessors, processors].max
    end
  end
  raise "can't determine 'number_of_processors' for '#{RUBY_PLATFORM}'"
end
desc "Import users."
task :fork_import_users => :environment do
procs = number_of_processors
lines = IO.readlines('user.txt')
nb_lines = lines.size
slices = nb_lines / procs
procs.times do
subset = lines.slice!(0..slices)
fork do
subset.each do |line|
name, age, profession = line.strip.split("\t")
u = User.new(:name => name, :age => age, :profession => profession)
u.save
end
end
end
Process.waitall
end
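One caveat not shown above: each forked child inherits the parent's database connection, which can lead to connection errors with some adapters. A common workaround (a hedged sketch, not part of the original answer) is to re-establish the connection inside each fork:

fork do
  # re-open the DB connection in the child; children otherwise share the
  # parent's socket (assumes config is read from database.yml as usual)
  ActiveRecord::Base.establish_connection
  subset.each do |line|
    name, age, profession = line.strip.split("\t")
    User.new(:name => name, :age => age, :profession => profession).save
  end
end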
On my machine, which has 2 cores, the forked version gives me:
real 1m41.974s
user 1m32.629s
sys 0m7.318s
versus your version:
real 2m56.401s
user 1m21.953s
sys 0m7.529s
Answer 2 (score: 0)
You should try FasterCSV. It was very fast and easy to use for me.
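For instance, reading the same tab-separated file with FasterCSV might look roughly like this (a sketch, assuming FasterCSV's standard foreach API; it mainly speeds up and hardens the parsing, the database writes are unchanged):

require 'fastercsv'

desc "Import users with FasterCSV."
task :csv_import_users => :environment do
  # :col_sep => "\t" tells FasterCSV the file is tab-separated
  FasterCSV.foreach("users.txt", :col_sep => "\t") do |row|
    name, age, profession = row
    User.create(:name => name, :age => age, :profession => profession)
  end
end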