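Streaming a CSV upload from Ruby to S3

Time: 2016-02-11 20:33:47

Tags: ruby-on-rails ruby csv heroku amazon-s3

I am dealing with potentially huge CSV files that I want to export from my Rails application. Since the app runs on Heroku, my idea is to stream these CSV files directly to S3 as they are generated.

The problem is that Aws::S3 expects a file in order to perform the upload, whereas in my Rails app I would like to do something like this: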

S3.bucket('my-bucket').object('my-csv') << %w(this is one line)

How can I do this?

3 answers:

Answer 0: (score: 3)

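require 'csv'

s3 = Aws::S3::Resource.new(region: 'us-west-2')
# BUCKET_NAME is a placeholder: the original snippet left the bucket name out.
obj = s3.bucket(BUCKET_NAME).object("#{FOLDER_NAME}/#{file_name}.csv")

# Build the whole CSV as a single string in memory.
file_csv = CSV.generate do |csv|
  csv << ActionLog.column_names
  # find_each loads records in batches instead of all at once.
  ActionLog.find_each do |action_log|
    csv << action_log.attributes.values
  end
end

# Upload the string as the object body.
obj.put(body: file_csv)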

file_csv = CSV.generate builds a string of CSV data in Ruby. Once the string is built, we put it to S3 through the bucket, at the path #{FOLDER_NAME}/#{file_name}.csv.

In my code, I export all the data from the ActionLog model.
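To make the CSV.generate step concrete without touching S3, here is a minimal sketch. The helper name build_csv and the sample rows are assumptions standing in for ActionLog.column_names and the record attributes. Note that the complete CSV lives in memory as one string before anything is uploaded, so this approach does not stream.

```ruby
require "csv"

# Build a complete CSV document as one String, header row first.
# `build_csv`, `headers` and `rows` are hypothetical stand-ins for the
# ActionLog column names and attribute values used in the answer.
def build_csv(headers, rows)
  CSV.generate do |csv|
    csv << headers
    rows.each { |row| csv << row }
  end
end

csv_string = build_csv(%w(id action), [[1, "login"], [2, "logout"]])
puts csv_string
```

The resulting string is what would be passed as body: to obj.put.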

Answer 1: (score: 3)

You can use S3 multipart upload, which uploads a large object by splitting it into multiple chunks. https://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html
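The splitting idea can be sketched in plain Ruby, with no S3 calls at all. The helper each_part below is hypothetical and only illustrates how generated CSV rows could be grouped into parts. S3 requires every part except the last to be at least 5 MiB; the threshold of 30 bytes here is purely for demonstration.

```ruby
require "csv"

# Group CSV rows into parts of at least `min_part_size` bytes, similar to
# how a multipart upload sends a large object as a sequence of parts.
# Illustration only: `each_part` is a hypothetical helper and uploads nothing.
def each_part(rows, min_part_size)
  return enum_for(:each_part, rows, min_part_size) unless block_given?

  buffer = +""
  rows.each do |row|
    buffer << row.to_csv
    if buffer.bytesize >= min_part_size
      yield buffer
      buffer = +"" # start a fresh part
    end
  end
  yield buffer unless buffer.empty? # the last part may be smaller
end

rows = [
  %w(this is first line),
  %w(this is second line),
  %w(this is third line),
]
parts = each_part(rows, 30).to_a
parts.each_with_index { |part, i| puts "part #{i + 1}: #{part.bytesize} bytes" }
```

Only one part needs to be held in memory at a time, which is what makes this pattern suitable for very large exports.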

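Multipart upload takes more involved code, but aws-sdk-ruby V3 supports an upload_stream method, which appears to perform a multipart upload internally and is very easy to use. It may be exactly the solution for this use case. https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/S3/Object.html#upload_stream-instance_method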

client = Aws::S3::Client.new(
  region: 'ap-northeast-1',
  credentials: your_credential
)

obj = Aws::S3::Object.new('your-bucket-here', 'path-to-output', client: client)

require "csv"
obj.upload_stream do |write_stream|
  [
    %w(this is first line),
    %w(this is second line),
    %w(this is third line),
  ].each do |line|
    write_stream << line.to_csv
  end
end

The object uploaded to S3 then contains:

this,is,first,line
this,is,second,line
this,is,third,line

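The block argument of upload_stream can generally be used like an IO object, which lets you chain and wrap the CSV generation just as you would for a file or any other IO object: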

obj.upload_stream do |write_stream|
  CSV(write_stream) do |csv|
    [
      %w(this is first line),
      %w(this is second line),
      %w(this is third line),
    ].each do |line|
      csv << line
    end
  end
end
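
Because the CSV writer only needs an IO-like object, the generation code can be written once and exercised locally against a StringIO. This is a minimal sketch: write_rows is a hypothetical helper, not part of aws-sdk-ruby, and the StringIO stands in for the write_stream argument.

```ruby
require "csv"
require "stringio"

# Write rows as CSV to any IO-like object.
# `write_rows` is a hypothetical helper, not an aws-sdk-ruby method.
def write_rows(io, rows)
  csv = CSV.new(io)
  rows.each { |row| csv << row }
  io
end

buffer = StringIO.new
write_rows(buffer, [%w(this is first line), %w(this is second line)])
puts buffer.string
```

Inside an upload_stream block, the same helper could presumably be called as write_rows(write_stream, rows), on the assumption stated above that write_stream behaves like an IO.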

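For example, you can compress the CSV while it is being generated and uploaded, using a temporary file to reduce the memory footprint: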

require "zlib"

obj.upload_stream(tempfile: true) do |write_stream|
  Zlib::GzipWriter.wrap(write_stream) do |gzw|
    CSV(gzw) do |csv|
      [
        %w(this is first line),
        %w(this is second line),
        %w(this is third line),
      ].each do |line|
        csv << line
      end
    end
  end
end
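
The gzip wrapping can be checked locally by writing into a StringIO and decompressing the result. This is a sketch under assumptions: the StringIO named sink stands in for write_stream, and rows are written with Array#to_csv rather than through a CSV object to keep the example small.

```ruby
require "csv"
require "stringio"
require "zlib"

rows = [%w(this is first line), %w(this is second line)]

# `sink` stands in for the write_stream passed to the upload_stream block.
sink = StringIO.new
Zlib::GzipWriter.wrap(sink) do |gzw|
  rows.each { |row| gzw << row.to_csv }
end

# Decompress the bytes again to confirm the round trip.
compressed = sink.string
restored = Zlib::GzipReader.new(StringIO.new(compressed)).read
puts restored
```

Keeping the compression step as a wrapper around the sink means the row-producing code does not need to know whether its output is compressed.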

Answer 2: (score: 0)

I would take a look at http://docs.aws.amazon.com/AWSRubySDK/latest/AWS/S3/S3Object.html#write-instance_method, as it may be what you are looking for.

Edit: http://docs.aws.amazon.com/AmazonS3/latest/dev/UploadObjSingleOpRuby.html may be more relevant, since the first link points to v1 of the Ruby aws-sdk.

require 'aws-sdk'

s3 = Aws::S3::Resource.new(region:'us-west-2')
obj = s3.bucket('bucket-name').object('key')

# string data
obj.put(body: 'Hello World!')

# IO object
File.open('source', 'rb') do |file|
  obj.put(body: file)
end