Saving a large text file without using too much memory

Time: 2013-07-01 12:32:07

Tags: ruby-on-rails ruby amazon-s3

I have a model that creates a KML file. I build the KML as a string and then hand it to a mailer, which sends it as an attachment:

def write_kml(coords3d, time)
  kml = String.new
  kml << header
  coords3d.each do |coords|
    coordinates = String.new
    coords.each do |coord|
      lat = coord[0].to_f
      lng = coord[1].to_f
      # KML expects longitude,latitude,altitude
      coordinates << "#{lng},#{lat},0 "
    end
    # emit one polygon per coordinate list, once all its points are collected
    kml << polygon(coordinates)
  end
  kml << footer
  kml
end

It is used here:

  CsvMailer.kml_send(kml,time, mode, email).deliver

The mailer:

  def kml_send(kml, time, mode, email)
    @time = (time / 60).to_i
    @mode = mode
    gen_time = Time.now
    file_name = "#{gen_time.strftime('%Y-%m-%d %H:%M:%S')} #{@mode} #{@time}(mins)"
    # application/vnd.google-earth.kml+xml is the registered MIME type for KML
    attachments[file_name + '(KML).kml'] = { mime_type: 'application/vnd.google-earth.kml+xml', content: kml }
    mail to: email, subject: 'KML File'
  end

This uses a lot of memory. Some of these files are very large (200MB), so on Heroku they take up too much memory.

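I have thought about using S3, but I would need to create the file first, so it would still use memory. Can I write directly to S3 without holding the whole file in memory?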

1 answer:

Answer 0: (score: 1)

You can do this with S3 multipart uploads, since they don't require you to know the file size in advance.

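Each part (except the last) must be at least 5MB, so the simplest approach is to write the data to an in-memory buffer and upload a part to S3 every time the buffer exceeds 5MB. An upload is limited to 10,000 parts, so if your file is larger than about 50GB you need to know that in advance so that you can make the parts bigger.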
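
The part-size arithmetic above can be sketched as a small helper. This is only an illustration: `part_size_for` and the two constants are hypothetical names, not part of fog or the S3 API, and the helper encodes nothing beyond the two limits just mentioned (5MB minimum part size, 10,000 parts per upload).

```ruby
MIN_PART_SIZE = 5 * 1024 * 1024 # minimum size of every part except the last
MAX_PARTS     = 10_000          # maximum number of parts in one upload

# Hypothetical helper: smallest part size (in bytes) that lets
# expected_bytes fit into at most MAX_PARTS parts.
def part_size_for(expected_bytes)
  needed = (expected_bytes + MAX_PARTS - 1) / MAX_PARTS # integer ceiling
  [needed, MIN_PART_SIZE].max
end

small = part_size_for(200 * 1024 * 1024) # 200MB file: the 5MB minimum is enough
large = part_size_for(100 * 1024**3)     # 100GB file: parts must be bigger than 5MB
```

For the 200MB files in the question the minimum part size is already sufficient, so the size only needs computing up front when the output could plausibly exceed the 10,000-part ceiling.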

Using the fog library, it looks something like this:

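require 'base64'
require 'digest/md5'
require 'fog/aws' # fog-aws gem; older setups use require 'fog'

# Uploads one part and returns its ETag, which S3 needs to complete the upload.
def upload_chunk(connection, upload_id, chunk, index)
  md5 = Base64.strict_encode64(Digest::MD5.digest(chunk))
  response = connection.upload_part('bucket', 'a_key', upload_id, index, chunk, 'Content-MD5' => md5)
  response.headers['ETag']
end

connection = Fog::Storage::AWS.new(:aws_access_key_id => '...', :region => '...', :aws_secret_access_key => '...')
upload_id = connection.initiate_multipart_upload('bucket', 'a_key').body['UploadId']
chunk_index = 1 # part numbers start at 1
etags = []

kml = String.new
kml << header
coords3d.each do |coords|
  # append this polygon's KML to kml here
  if kml.bytesize > 5 * 1024 * 1024
    etags << upload_chunk(connection, upload_id, kml, chunk_index)
    chunk_index += 1
    kml = String.new
  end
end
kml << footer
# upload whatever is left; the last part may be smaller than 5MB
etags << upload_chunk(connection, upload_id, kml, chunk_index)
# when you've uploaded all the chunks
connection.complete_multipart_upload('bucket', 'a_key', upload_id, etags)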

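If you create an uploader class that wraps the buffer and keeps all the S3 logic in one place, you could probably come up with something tidier. Then your KML code doesn't have to know whether it is appending to an actual string or to something that periodically flushes to S3.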
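
One way such a wrapper might look is sketched below. `ChunkedBuffer` and its methods are hypothetical names chosen for illustration, not part of fog. To keep the sketch runnable without S3 credentials, the wrapper hands each full chunk to a block; in a real upload that block would call something like `upload_chunk` from the code above.

```ruby
# Hypothetical wrapper: buffers appended strings and passes each full
# chunk, with its 1-based part number, to the block given to `new`.
class ChunkedBuffer
  def initialize(chunk_size = 5 * 1024 * 1024, &on_chunk)
    @chunk_size = chunk_size
    @on_chunk = on_chunk
    @buffer = String.new
    @part_number = 0
  end

  # Same interface as String#<<, so the KML-building code can use either.
  def <<(data)
    @buffer << data
    flush if @buffer.bytesize >= @chunk_size
    self
  end

  # Sends whatever is left; call once after the last append.
  def close
    flush unless @buffer.empty?
  end

  private

  def flush
    @part_number += 1
    @on_chunk.call(@buffer, @part_number)
    @buffer = String.new # start a fresh buffer so the sent chunk is not mutated
  end
end

# Usage with a stand-in block that records parts instead of uploading them.
parts = []
out = ChunkedBuffer.new(10) { |chunk, number| parts << [number, chunk] }
out << "<kml>" << "0123456789" << "</kml>"
out.close
```

Because the wrapper responds to `<<` and returns itself, a method like `write_kml` could take the output object as an argument and work unchanged with either a plain `String` or a `ChunkedBuffer`.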