Checking headers before downloading with Net::HTTP::Pipeline

Time: 2013-07-26 20:36:57

Tags: ruby net-http pipelining

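I'm trying to parse a list of image URLs and get some basic information about each image before I actually commit to downloading it:

  1. Is the image there (solved with response.code?)
  2. Do I already have the image (I'd want to look at the type and size?)

My script will check a large list every day (about 1300 rows), and each row has 30-40 image URLs. My @photo_urls variable lets me keep track of what I have already downloaded. I'd really like to be able to use it as a hash (instead of the array in my example code) so I can iterate through it later and do the actual downloading.

Right now my problem (besides being a Ruby newbie) is that Net::HTTP::Pipeline only accepts an array of Net::HTTPRequest objects. The documentation for net-http-pipeline indicates that the response objects come back in the same order as the corresponding request objects that went in. The problem is that I have no way to correlate a request with its response other than that order, and I don't know how to get the relative ordinal position inside a block. I assume I could just keep a counter variable, but how would I access a hash by ordinal position?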

              Net::HTTP.start uri.host do |http|
                # Init HTTP requests hash
                requests = {}
                photo_urls.each do |photo_url|          
                  # make sure we don't process the same image again.
                  hashed = Digest::SHA1.hexdigest(photo_url)         
                  next if @photo_urls.include? hashed
                  @photo_urls << hashed
                  # change user agent and store in hash
                  my_uri = URI.parse(photo_url)
                  request = Net::HTTP::Head.new(my_uri.path)
                  request.initialize_http_header({"User-Agent" => "My Downloader"})
                  requests[hashed] = request
                end
                # process requests (send array of values - ie. requests) in a pipeline.
                http.pipeline requests.values do |response|
                  if response.code=="200"
                      # anyway to reference the hash here so I can decide whether
                      # I want to do anything later?
                  end
                end                
              end
    

Finally, if there is an easier way to do this, feel free to offer any suggestions.

Thanks!

1 Answer:

Answer 0: (score: 1)

Make requests an array instead of a hash, and shift a request off the front of it as each response comes in:

    Net::HTTP.start uri.host do |http|
      # Init HTTP requests array
      requests = []
      photo_urls.each do |photo_url|
        # make sure we don't process the same image again.
        hashed = Digest::SHA1.hexdigest(photo_url)
        next if @photo_urls.include? hashed
        @photo_urls << hashed

        # change user agent and store in array
        my_uri = URI.parse(photo_url)
        request = Net::HTTP::Head.new(my_uri.path)
        request.initialize_http_header({"User-Agent" => "My Downloader"})
        requests << request
      end

      # process requests in a pipeline; pass a copy so the original array
      # can be shifted in step with the responses.
      http.pipeline requests.dup do |response|
        # responses arrive in request order, so the front of the array
        # is the request that produced this response
        request = requests.shift

        if response.code=="200"
          # Do whatever checking with request
        end
      end
    end
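
If you would rather keep the hash from the question, the same order-based trick should still work, because a Ruby hash (1.9 and later) preserves insertion order, so `requests.keys` and `requests.values` line up index for index. The following is only a minimal sketch of that idea, not tested against the real gem: `fake_pipeline`, `FakeResponse`, and the example URLs are hypothetical stand-ins so the snippet runs without a network, and the requests are plain hashes rather than Net::HTTP::Head objects.

```ruby
require 'digest/sha1'

# Hypothetical stand-in for a response; only #code is modelled.
FakeResponse = Struct.new(:code)

# Hypothetical stand-in for http.pipeline: yields one response per
# request, in the same order the requests were given.
def fake_pipeline(requests)
  requests.each do |request|
    code = request[:url].end_with?('.jpg') ? '200' : '404'
    yield FakeResponse.new(code)
  end
end

# Example URLs (assumptions, for illustration only).
photo_urls = %w[
  http://example.com/a.jpg
  http://example.com/missing.txt
  http://example.com/b.jpg
]

# Build the hash keyed by SHA1 digest, as in the question.
requests = {}
photo_urls.each do |photo_url|
  hashed = Digest::SHA1.hexdigest(photo_url)
  requests[hashed] = { url: photo_url }
end

# Keys in insertion order; shifting one per response keeps them in step
# with requests.values.
pending = requests.keys
available = {}

fake_pipeline(requests.values) do |response|
  hashed = pending.shift
  available[hashed] = requests[hashed][:url] if response.code == '200'
end
```

With the real gem you would replace `fake_pipeline(requests.values)` with `http.pipeline requests.values`; `available` then holds only the digests whose HEAD request came back with a 200, ready to be iterated over for the actual download.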