Consider the following function. It processes a large CSV file downloaded from AWS S3 without consuming unnecessary memory or disk. Ruby is 2.3.1, aws-sdk is 2.6.12.
require 'aws-sdk'
require 'csv'

def s3_reader(region, bucket, file)
  bucket = Aws::S3::Resource.new(region: region).bucket(bucket)
  reader, writer = IO.pipe

  # Download in a background thread, streaming the object into the pipe.
  t = Thread.new do
    bucket.object(file).get(response_target: writer)
    writer.close # signal EOF to the CSV reader
  end
  t.abort_on_exception = true

  # Parse rows as they arrive; print a dot every 1000 rows.
  CSV.new(reader, headers: true).each_with_index do |_, n|
    print '.' if (n % 1000).zero?
  end
end
The problem is that when the S3 Ruby API is done, it wants to rewind the IO object it received. A pipe cannot be rewound, so it blows up:
./vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.6.12/lib/seahorse/client/http/response.rb:103:in `rewind': Illegal seek (Errno::ESPIPE)
from ./vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.6.12/lib/seahorse/client/http/response.rb:103:in `signal_done'
from ./vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.6.12/lib/seahorse/client/http/response.rb:116:in `signal_error'
from ./vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.6.12/lib/seahorse/client/net_http/handler.rb:90:in `rescue in transmit'
from ./vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.6.12/lib/seahorse/client/net_http/handler.rb:68:in `transmit'
from ./vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.6.12/lib/seahorse/client/net_http/handler.rb:42:in `call'
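The failure can be reproduced without S3 at all. The following is a minimal sketch, assuming a POSIX system such as Linux: both ends of a pipe inherit #rewind from IO, so a respond_to? check (which is presumably what the SDK relies on) passes, yet the call itself raises because a pipe has no file position to seek to. The variable name rewind_error is only for illustration.

```ruby
reader, writer = IO.pipe

# The pipe's write end advertises #rewind because it is inherited from IO...
puts writer.respond_to?(:rewind)

# ...but a pipe is not seekable, so actually calling it raises.
rewind_error =
  begin
    writer.rewind
    nil
  rescue SystemCallError => e
    e
  end

puts rewind_error.class
```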
The only way I can see around this problem is to do:
writer.instance_eval('undef :rewind')
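Which is one of the ugliest solutions I have ever written.
Am I missing something here? Is there a sane way to avoid this problem? (I don't see anything in the aws-sdk source.)
Alternatively, wouldn't it make sense if a) Ruby's IO.pipe did not expose #rewind, and/or b) aws-sdk did not assume that it can call #rewind just because the method exists?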
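The same workaround can be written without evaluating a string. This is only a sketch of an equivalent form, not a nicer fix: it undefines #rewind on the singleton class of that one pipe end, so other IO objects keep the method.

```ruby
reader, writer = IO.pipe

# Same effect as writer.instance_eval('undef :rewind'), minus the string eval.
# send is used because undef_method is private on Ruby versions before 2.5.
writer.singleton_class.send(:undef_method, :rewind)

puts writer.respond_to?(:rewind) # this object no longer advertises #rewind
puts reader.respond_to?(:rewind) # other IO objects are unaffected
```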
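One direction that may avoid the rewind entirely is the block form of get, which the SDK documents as yielding the body chunk by chunk instead of writing to a response target. The sketch below has not been tested against aws-sdk 2.6.12. To stay runnable without S3, it takes a hypothetical `download` callable and uses a fake one, and the helper name count_csv_rows is also made up for illustration. Note that the SDK documentation warns that streaming to a block disables its automatic retries.

```ruby
require 'csv'

# Hypothetical helper: `download` is any callable that yields the body to a
# block chunk by chunk. With aws-sdk v2 that could be (assumption, untested):
#   download = ->(&on_chunk) { bucket.object(file).get(&on_chunk) }
def count_csv_rows(download)
  reader, writer = IO.pipe

  producer = Thread.new do
    begin
      # We write the chunks ourselves, so nothing tries to rewind the pipe.
      download.call { |chunk| writer.write(chunk) }
    ensure
      writer.close # always signal EOF so the CSV reader cannot hang
    end
  end
  producer.abort_on_exception = true

  rows = 0
  CSV.new(reader, headers: true).each { rows += 1 }
  producer.join
  rows
ensure
  reader.close if reader && !reader.closed?
end

# Stand-in for S3: yields a small CSV in two chunks.
fake_download = ->(&on_chunk) { ["id,name\n1,a\n", "2,b\n"].each(&on_chunk) }
puts count_csv_rows(fake_download)
```

Closing the writer in an ensure clause means the reading side still sees EOF if the download fails partway through.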