When an aws-sdk S3 download calls #rewind on an IO.pipe writer

Time: 2016-10-22 21:59:03

Tags: ruby pipe aws-sdk

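Consider the following function. It processes a large CSV file downloaded from AWS S3 without consuming unnecessary memory or disk space. Ruby is 2.3.1 and aws-sdk is 2.6.12.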

require 'aws-sdk'
require 'csv'

def s3_reader(region, bucket, file)
  bucket = Aws::S3::Resource.new(region: region).bucket(bucket)
  reader, writer = IO.pipe
  t = Thread.new do
    bucket.object(file).get(response_target: writer)
    writer.close
  end
  t.abort_on_exception = true

  CSV.new(reader, headers: true).each_with_index do |_, n|
    print '.' if (n % 1000).zero?
  end
end

The problem is that when the S3 Ruby API finishes, it tries to rewind the IO object it was given. A pipe cannot seek, so it blows up:

./vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.6.12/lib/seahorse/client/http/response.rb:103:in `rewind': Illegal seek (Errno::ESPIPE)
from ./vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.6.12/lib/seahorse/client/http/response.rb:103:in `signal_done'
from ./vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.6.12/lib/seahorse/client/http/response.rb:116:in `signal_error'
from ./vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.6.12/lib/seahorse/client/net_http/handler.rb:90:in `rescue in transmit'
from ./vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.6.12/lib/seahorse/client/net_http/handler.rb:68:in `transmit'
from ./vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.6.12/lib/seahorse/client/net_http/handler.rb:42:in `call'

The only way I can see to work around this is:

  writer.instance_eval('undef :rewind')

This is one of the ugliest workarounds I have ever written.
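A possibly less invasive variant of the same idea is to hand the SDK a small object that exposes only #write, instead of removing a method from the pipe itself. The sketch below is self-contained and does not touch S3. `WriteOnlySink` is a hypothetical helper, not part of aws-sdk, and it rests on two assumptions not verified here: that the SDK only needs #write on the response target, and that it checks `respond_to?(:rewind)` before rewinding (which the `undef` workaround suggests).

```ruby
# Hypothetical wrapper (not part of aws-sdk): forwards #write and nothing else.
class WriteOnlySink
  def initialize(io)
    @io = io
  end

  # Pass each chunk through to the wrapped IO; returns the bytes written.
  def write(chunk)
    @io.write(chunk)
  end
end

reader, writer = IO.pipe
sink = WriteOnlySink.new(writer)

# The raw pipe end advertises #rewind even though a pipe cannot seek;
# the wrapper does not advertise it at all.
raw_has_rewind  = writer.respond_to?(:rewind)
sink_has_rewind = sink.respond_to?(:rewind)

# Calling #rewind on the raw writer raises a system call error
# (Errno::ESPIPE in the stack trace above); capture its class if raised.
rewind_error = begin
  writer.rewind
  nil
rescue SystemCallError => e
  e.class
end

# Data written through the wrapper still arrives at the read end.
sink.write("a,b\n1,2\n")
writer.close
data = reader.read
reader.close
```

Under those assumptions, the call in the function above would become `bucket.object(file).get(response_target: WriteOnlySink.new(writer))`, with `writer.close` still performed on the real pipe end.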

Am I missing something here? Is there a reasonable way to avoid this problem? (I did not see anything in the aws-sdk source.)

Or, alternatively, shouldn't a) Ruby's IO.pipe not expose #rewind, and/or b) aws-sdk not assume that it can call #rewind just because the method exists?

0 answers