Question

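I am trying to download a large file and then post it to a REST endpoint using Ruby. The file could be very large, i.e. larger than can be held in memory or even in a temporary file on disk. I have been trying this with Net::HTTP, but I am open to any other library (rest-client, etc.) as long as it does what I need.

Here is what I tried: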


What I would like is for source_response.read_body to return a stream that I can then pass to target_request in chunks.

Answer 1

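Answering my own question: here is my solution. Note that to make this work I had to monkey-patch Net::HTTP so that I could access the socket and read chunks from the response object manually. If you have a better solution, I would still like to see it.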

require 'net/http'
require 'excon'

# provide access to the actual socket
class Net::HTTPResponse
  attr_reader :socket
end

source_uri = URI("https://example.org/very_large_file")
target_uri = URI("https://example2.org/rest/resource")

Net::HTTP.start(source_uri.host, source_uri.port, use_ssl: source_uri.scheme == 'https') do |http|
  request = Net::HTTP::Get.new source_uri

  http.request request do |response|
    len = response.content_length
    p "reading #{len} bytes..."
    read_bytes = 0
    chunk = ''

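    # Assumes the source sends a Content-Length header and a non-chunked body.
    chunker = lambda do
      begin
        # Never read past Content-Length. read(0) returns '', which tells Excon the body is done.
        to_read = [Excon::CHUNK_SIZE, len - read_bytes].min
        chunk = response.socket.read(to_read).to_s
        read_bytes += chunk.size
      rescue EOFError
        # Connection closed early: end the upload instead of resending the last chunk.
        chunk = ''
      end
      p "read #{read_bytes} bytes"
      chunk
    end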

    # Disables TLS certificate verification for the upload; only appropriate for testing.
    Excon.ssl_verify_peer = false
    Excon.post(target_uri.to_s, :request_block => chunker)

  end
end

Answer 2

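By using the excon and rest-client gems, you should be able to stream the data and upload it in multiple parts.

Unfortunately, I could not find a way to stream data with rest-client, or to send multipart/form-data with excon, so you have to combine the two.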

require 'excon'
require 'rest-client'

streamer = lambda do |chunk, remaining_bytes, total_bytes|
  puts "Remaining: #{(remaining_bytes.to_f / total_bytes * 100).round(1)}%"
  puts RestClient.post('http://posttestserver.com/post.php', :param1 => chunk)
end

Excon.get('http://textfiles.com/computers/ami-chts.txt', :response_block => streamer)

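After messing around, I got the following code to work somewhat. It is not consistent: sometimes it sends everything, sometimes it does not. I believe this is because it ends the HTTP POST request before the download has finished.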

require 'excon'
require 'uri'
require 'net/http'

class Producer
  def initialize
   @mutex = Mutex.new
   @body = ''
  end

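  # Called by Net::HTTP to pull the next piece of the request body.
  def read(size, out=nil)
    data = nil

    @mutex.synchronize {
      data = @body.slice!(0,size)
    }

    # Returning nil signals end of stream, even if the download is still running.
    return nil if data.nil? || data.empty?
    out << data if out

    data
  end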

  def produce(str)
    @mutex.synchronize {
      @body << str
    }
  end
end

@stream = Producer.new

uri = URI("yourpostaddresshere")
conn = Net::HTTP.new(uri.host, uri.port)
request = Net::HTTP::Post.new uri.request_uri, {'Transfer-Encoding' => 'chunked', 'content-type' => 'text/plain'}
request.body_stream = @stream

Thread.new {
  streamer = lambda do |chunk, remaining_bytes, total_bytes|
    @stream.produce(chunk) 
  end

  Excon.get('http://textfiles.com/computers/ami-chts.txt', :response_block => streamer)
}

conn.start do |http|
  http.request(request)
end

Credit to Roman; I modified it slightly because HTTP.start needs two arguments (a change in Ruby's Net::HTTP).
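One likely cause of the inconsistency, offered as an assumption rather than something verified against a real endpoint: Producer#read returns nil whenever the buffer happens to be empty, and nil is the usual end-of-stream signal, so the POST can finish while the download thread is still running. A minimal sketch of a blocking variant follows. BlockingProducer and its finish method are hypothetical names, not part of excon or Net::HTTP, and the demo at the bottom uses local strings instead of a real download.

```ruby
# Hypothetical replacement for Producer: read blocks until data arrives
# and only reports end of stream after finish has been called.
class BlockingProducer
  def initialize
    @queue  = Queue.new
    @buffer = String.new # binary buffer, so slicing is byte-based
    @done   = false
  end

  # Called by the download thread for every chunk.
  def produce(str)
    @queue << str.b
  end

  # Called by the download thread once the download is complete.
  def finish
    @queue << nil
  end

  # Same calling convention as IO#read(size, outbuf).
  def read(size = nil, out = nil)
    while @buffer.empty? && !@done
      item = @queue.pop # blocks until the downloader pushes something
      if item.nil?
        @done = true
      else
        @buffer << item
      end
    end
    return nil if @buffer.empty? # real end of stream

    data = @buffer.slice!(0, size || @buffer.bytesize)
    out ? out.replace(data) : data
  end
end

# Local demo: a slow "downloader" thread and a reader standing in for the uploader.
stream = BlockingProducer.new
downloader = Thread.new do
  %w[alpha beta gamma].each do |chunk|
    sleep 0.05
    stream.produce(chunk)
  end
  stream.finish
end

uploaded = +''
while (part = stream.read(1024))
  uploaded << part
end
downloader.join
puts uploaded
```

In the code above, this would mean passing a BlockingProducer as request.body_stream and calling finish after Excon.get returns inside the download thread. Only one thread reads and the Queue is thread-safe, so the sketch does not need the Mutex.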

Answer 3

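Without asynchronous I/O (which is hard in Ruby), the only way to do this is with two threads connected by a FIFO pipe: one to fetch, the other to upload.

A FIFO works like a ring buffer. You get back a reader and a writer. Whenever you write to the writer, the reader receives the data, and the reader always blocks until data is available. A FIFO is backed by real file handles, so the I/O behaves just like a file (unlike a "fake" stream such as StringIO).

Something like this: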
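The blocking behaviour described above can be seen in isolation with a small local sketch. It uses only IO.pipe and a thread, with made-up chunk strings and a sleep standing in for a slow download; no network is involved.

```ruby
rd, wr = IO.pipe

writer = Thread.new do
  begin
    %w[first second third].each do |chunk|
      sleep 0.05 # simulate a slow download
      wr.write(chunk)
    end
  ensure
    wr.close # closing the writer is what lets the reader see EOF
  end
end

received = +''
# read blocks until data is available and returns nil at EOF.
while (piece = rd.read(4))
  received << piece
end

writer.join
rd.close
puts received
```

The reader never sees a premature end of stream: it waits while the pipe is empty and stops only once the writer has been closed.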

require 'net/http'

def download_and_upload(source_url, dest_url)
  rd, wr = IO.pipe
  begin
    source_uri = URI.parse(source_url)

    Thread.start do
      begin
        Net::HTTP.start(source_uri.host, source_uri.port, use_ssl: source_uri.scheme == 'https') do |http|
          req = Net::HTTP::Get.new(source_uri.request_uri)
          http.request(req) do |resp|
            resp.read_body do |chunk|
              wr.write(chunk)
              wr.flush
            end
          end
        end
      rescue IOError, Errno::EPIPE
        # Usually because the pipe was closed (for example, the upload stopped)
      ensure
        wr.close rescue nil
      end
    end

    dest_uri = URI.parse(dest_url)

    Net::HTTP.start(dest_uri.host, dest_uri.port, use_ssl: dest_uri.scheme == 'https') do |http|
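      req = Net::HTTP::Post.new(dest_uri.request_uri)
      # body_stream requires either Content-Length or chunked transfer encoding;
      # the length is unknown here, so send the body chunked.
      req['Transfer-Encoding'] = 'chunked'
      req.body_stream = rd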
      http.request(req)
    end
  ensure
    rd.close rescue nil
    wr.close rescue nil
  end
end

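I have not tested this, since I do not have an endpoint at the moment, but this is the principle.

Note that I have left out error handling. If the downloader thread fails, you will need to catch the error and signal it to the uploader thread. (If the uploader fails, the download will stop because the pipe being written to will be closed.)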
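One possible way to surface a downloader failure, sketched locally under the assumption that the uploader side is allowed to join the downloader thread once the pipe reaches EOF. The raised IOError and the 'partial data' string are simulated; nothing here talks to a real server.

```ruby
rd, wr = IO.pipe

downloader = Thread.new do
  Thread.current.report_on_exception = false # keep the demo output quiet
  begin
    wr.write('partial data')
    raise IOError, 'download failed' # simulated failure mid-transfer
  ensure
    wr.close # the reader still gets EOF, so it does not hang
  end
end

body = rd.read # stands in for the upload consuming the pipe until EOF
rd.close

error = nil
begin
  downloader.join # re-raises the exception that ended the thread
rescue IOError => e
  error = e
end

puts "uploader saw #{body.bytesize} bytes, then: #{error.message}"
```

With this approach the uploader only learns about the failure after the truncated body has already been sent, so what to do next (retry, delete the remote resource, and so on) remains an open design decision.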

Streaming the response body of an HTTP GET to an HTTP POST with Ruby

3 answers: