Question

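I'm using Mechanize to scrape a website, and it works well. However, you can't tell from a link which kind of file it points to, e.g. http://somesite.com/images.php?get=123. Is it possible to download only the headers?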

I ask because I want to decide whether to download a file based on its type. It would also help with choosing a filename when downloading.

It doesn't have to use Mechanize, but is there a Rails way of doing this?

Answer 1

This? http://ruby-doc.org/stdlib/libdoc/net/http/rdoc/classes/Net/HTTP.html#M000682

require 'net/http'

response = nil
# Send a HEAD request so only the headers come back, not the body.
Net::HTTP.start('some.www.server', 80) {|http|
    response = http.head('/index.html')
}
p response['content-type']
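
To tie this back to the question (deciding by file type and picking a filename), here is a minimal sketch built on the same Net::HTTP HEAD call. The helper names head_request, wanted_type? and filename_from are made up for illustration, and treating only image/* as "wanted" is an assumption. Note that Net::HTTP does not follow redirects on its own, and some servers may not answer HEAD requests properly.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: send a HEAD request for a full URL and return the
# response object (headers only, no body is downloaded).
def head_request(url)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.head(uri.request_uri)
  end
end

# Hypothetical helper: true when the Content-Type is one we want.
# Assumption: only images are of interest here.
def wanted_type?(content_type)
  media_type = content_type.to_s.split(';').first.to_s.strip.downcase
  media_type.start_with?('image/')
end

# Hypothetical helper: take the filename from a Content-Disposition header
# if it has one, otherwise use the caller-supplied fallback.
def filename_from(content_disposition, fallback)
  match = content_disposition.to_s.match(/filename="?([^";]+)"?/i)
  match ? File.basename(match[1].strip) : fallback
end

# Usage (not run here, since it needs network access):
#   response = head_request('http://somesite.com/images.php?get=123')
#   if wanted_type?(response['content-type'])
#     name = filename_from(response['content-disposition'], 'download.bin')
#   end
```

Whether a Content-Disposition header is present at all depends on the server, so the fallback name matters in practice.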

Answer 2

You can use Curb:

ruby-1.8.7-p174 > require 'rubygems'
 => true 
ruby-1.8.7-p174 > require 'curb'
 => true  
ruby-1.8.7-p174 > c = Curl::Easy.http_head('https://encrypted.google.com/images/logos/ssl_logo_lg.gif'){|easy| easy.follow_location = true}
 => #<Curl::Easy https://encrypted.google.com/images/logos/ssl_logo>
ruby-1.8.7-p174 > c.perform
 => true
ruby-1.8.7-p174 > c.content_type
 => "image/gif"
