Question

I need a way (or a library) to fetch web pages like this:

result = http_client.get('/some_page.html') do |response|
  if response.content_type == 'text/html' && response.code == 200
    response.read_body # the headers are returned together with the page body
  else
    # the body is not read, so only the headers are returned (body is nil)
  end
end

Now, if a "text/html" page responds successfully:

p result.code #>200
p result.content_type #>text/html
p result.body #><!DOCTYPE html...

For a page that is not "text/html", or that does not return 200:

p result.code #>404
p result.content_type #>text/html
p result.body #>nil

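All of this must happen in a single request to the web server. Sending an HTTP HEAD request to check the headers and then an HTTP GET request to fetch the body is not acceptable, because that makes two requests.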
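A single request is enough in principle, because an HTTP/1.1 response sends the status line and headers before the body: a client can inspect them first and then either read the body or simply close the connection. The sketch below shows this with nothing but Ruby's `socket` library. It is an illustration under stated assumptions, not a library: plain HTTP only, a `Content-Length` header on the response (chunked bodies are not handled), and `get_html_only` and `start_demo_server` are hypothetical helpers, the latter being a throwaway local server that only exists so the example runs offline.

```ruby
require 'socket'

# Hypothetical throwaway server so the sketch runs offline:
# /page.html answers 200 text/html, any other path answers 404 text/html.
def start_demo_server
  server = TCPServer.new('127.0.0.1', 0)
  Thread.new do
    loop do
      client = server.accept
      begin
        verb, path = client.gets.to_s.split
        # Skip the request headers (everything up to the blank line).
        while (line = client.gets) && line != "\r\n"
        end
        status, body = if path == '/page.html'
                         ['200 OK', '<!DOCTYPE html><p>hi</p>']
                       else
                         ['404 Not Found', '<!DOCTYPE html><p>missing</p>']
                       end
        head = "HTTP/1.1 #{status}\r\nContent-Type: text/html\r\n" \
               "Content-Length: #{body.bytesize}\r\nConnection: close\r\n\r\n"
        client.write(verb == 'HEAD' ? head : head + body) # no body for HEAD
      rescue IOError, SystemCallError
        # Ignore clients that hang up early.
      ensure
        client.close
      end
    end
  end
  server.addr[1] # the port the OS picked
end

# Sends ONE GET request and reads the body only if the headers say 200 text/html.
# Assumes plain HTTP and a Content-Length header (chunked bodies are not handled).
def get_html_only(host, port, path)
  TCPSocket.open(host, port) do |sock|
    sock.write("GET #{path} HTTP/1.1\r\nHost: #{host}\r\nConnection: close\r\n\r\n")
    code = sock.gets("\r\n").split(' ', 3)[1].to_i # "HTTP/1.1 200 OK" -> 200
    headers = {}
    while (line = sock.gets("\r\n")) && line != "\r\n" # headers end at a blank line
      name, value = line.split(':', 2)
      headers[name.strip.downcase] = value.strip
    end
    content_type = headers['content-type'].to_s.split(';').first.to_s.strip
    body = nil
    if code == 200 && content_type == 'text/html'
      body = sock.read(Integer(headers.fetch('content-length')))
    end
    # Leaving the block closes the socket; an unread body is simply discarded.
    { code: code, content_type: content_type, body: body }
  end
end

port = start_demo_server
p get_html_only('127.0.0.1', port, '/page.html')
p get_html_only('127.0.0.1', port, '/missing.html')
```

For `/missing.html` only the 404 status line and headers are read before the connection is closed, so `:body` is `nil`.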

Which gem or library makes this possible?

Update

I found a solution by digging into the net/http library:

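# assumes `client` is a started Net::HTTP instance and `uri` is a URI object
client.request_get(uri.request_uri) do |res|
  if res.content_type == 'text/html'
    res.read_body # read the page body now
  else
    # Undocumented Net::HTTP internal: mark the response as having no body,
    # so Net::HTTP never reads it and res.body is nil afterwards
    res.instance_eval { @body_exist = false }
  end
end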

Answer 1

I found a solution by digging into the net/http library:

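# assumes `client` is a started Net::HTTP instance and `uri` is a URI object
client.request_get(uri.request_uri) do |res|
  if res.content_type == 'text/html'
    res.read_body # read the page body now
  else
    # Undocumented Net::HTTP internal: mark the response as having no body,
    # so Net::HTTP never reads it and res.body is nil afterwards
    res.instance_eval { @body_exist = false }
  end
end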
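A runnable version of this net/http workaround might look like the sketch below. It keeps the same trick (`@body_exist` is a private instance variable of `Net::HTTPResponse`, not public API, so it may change between Ruby versions), adds the 200 check the question asks for, and sends `Connection: close` so that body bytes left unread on the socket cannot be misread as the next response on a reused connection. `fetch_html_only` and `start_demo_server` are hypothetical names; the server is a throwaway local one that only makes the example runnable offline.

```ruby
require 'net/http'
require 'socket'

# Hypothetical throwaway server so the sketch runs offline:
# /page.html answers 200 text/html, any other path answers 404 text/html.
def start_demo_server
  server = TCPServer.new('127.0.0.1', 0)
  Thread.new do
    loop do
      client = server.accept
      begin
        verb, path = client.gets.to_s.split
        # Skip the request headers (everything up to the blank line).
        while (line = client.gets) && line != "\r\n"
        end
        status, body = if path == '/page.html'
                         ['200 OK', '<!DOCTYPE html><p>hi</p>']
                       else
                         ['404 Not Found', '<!DOCTYPE html><p>missing</p>']
                       end
        head = "HTTP/1.1 #{status}\r\nContent-Type: text/html\r\n" \
               "Content-Length: #{body.bytesize}\r\nConnection: close\r\n\r\n"
        client.write(verb == 'HEAD' ? head : head + body) # no body for HEAD
      rescue IOError, SystemCallError
        # Ignore clients that hang up early.
      ensure
        client.close
      end
    end
  end
  server.addr[1] # the port the OS picked
end

# GET `path` in a single request; read the body only for 200 text/html.
def fetch_html_only(host, port, path)
  Net::HTTP.start(host, port) do |http|
    req = Net::HTTP::Get.new(path)
    # Close the connection after this response, so body bytes we leave
    # unread cannot be mistaken for the start of another response.
    req['Connection'] = 'close'
    http.request(req) do |res|
      if res.code == '200' && res.content_type == 'text/html'
        res.read_body
      else
        # Same hack as `instance_eval { @body_exist = false }`: an undocumented
        # Net::HTTPResponse internal that makes Net::HTTP skip the body.
        res.instance_variable_set(:@body_exist, false)
      end
    end
  end
end

port = start_demo_server
ok = fetch_html_only('127.0.0.1', port, '/page.html')
missing = fetch_html_only('127.0.0.1', port, '/missing.html')
p [ok.code, ok.content_type, ok.body]
p [missing.code, missing.content_type, missing.body]
```

Each call sends exactly one GET; for the 404 page `missing.body` is `nil` because the body was never read. Note that Net::HTTP reports `code` as a String (`'200'`), not an Integer.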

Answer 2

Perhaps an HTTP HEAD request would return what you want.

HEAD should be supported, as this link suggests:

http://rubydoc.info/gems/httpclient/2.1.5.2/HTTPClient
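For comparison, here is a sketch of the HEAD-then-GET approach using Ruby's standard Net::HTTP (`Net::HTTP#head` and `Net::HTTP#get`) rather than the httpclient gem linked above. `Result`, `fetch_with_head_first` and `start_demo_server` are illustrative names, not part of any library; the server is a throwaway local one so the example runs offline.

```ruby
require 'net/http'
require 'socket'

# Hypothetical throwaway server so the sketch runs offline:
# /page.html answers 200 text/html, any other path answers 404 text/html.
def start_demo_server
  server = TCPServer.new('127.0.0.1', 0)
  Thread.new do
    loop do
      client = server.accept
      begin
        verb, path = client.gets.to_s.split
        # Skip the request headers (everything up to the blank line).
        while (line = client.gets) && line != "\r\n"
        end
        status, body = if path == '/page.html'
                         ['200 OK', '<!DOCTYPE html><p>hi</p>']
                       else
                         ['404 Not Found', '<!DOCTYPE html><p>missing</p>']
                       end
        head = "HTTP/1.1 #{status}\r\nContent-Type: text/html\r\n" \
               "Content-Length: #{body.bytesize}\r\nConnection: close\r\n\r\n"
        client.write(verb == 'HEAD' ? head : head + body) # no body for HEAD
      rescue IOError, SystemCallError
        # Ignore clients that hang up early.
      ensure
        client.close
      end
    end
  end
  server.addr[1] # the port the OS picked
end

# Hypothetical result type mirroring the question's code/content_type/body.
Result = Struct.new(:code, :content_type, :body)

# HEAD first, then GET only for 200 text/html: TWO requests for HTML pages.
def fetch_with_head_first(host, port, path)
  Net::HTTP.start(host, port) do |http|
    head = http.head(path) # request 1: status line and headers, never a body
    unless head.code == '200' && head.content_type == 'text/html'
      return Result.new(head.code, head.content_type, nil)
    end
    page = http.get(path) # request 2: the same URL again, this time with a body
    Result.new(page.code, page.content_type, page.body)
  end
end

port = start_demo_server
p fetch_with_head_first('127.0.0.1', port, '/page.html')
p fetch_with_head_first('127.0.0.1', port, '/missing.html')
```

For `/page.html` this sends a HEAD and then a GET, which is exactly the two-request pattern the question rules out; only non-HTML or non-200 pages get away with a single request.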

How do I check the Content-Type header before fetching the body with Ruby?
