Question

I'm using Mechanize to scrape image URLs, and I was looking at http://mechanize.rubyforge.org/Mechanize/Page/Image.html to find out how to get the width and height of the images.

I wrote this in the console:

url = "http://www.bbc.co.uk/"
page = Mechanize.new.get(url)
images_url = page.images.map{|img| img.width}.compact

I got this result:

["1", "84", "432", "432", "432", "432", "432", "432", "432", "304", "144", "144", "144", "144", "144", "144", "432", "432", "432", "432", "432", "432", "432", "336", "62", "62", "62", "62", "84", "1", "0"]

This result is fine for me: I get the widths of the images.

However, for other web pages I don't get them. For example, try this page:

url = "http://www.glamourum.com" #check also with https://www.birchbox.com/
page = Mechanize.new.get(url)
images_url = page.images.map{|img| img.width}.compact

I got this result:

=> []

The array is empty :O. And for https://www.birchbox.com/ I get this array:

=> ["1", "1", "1", "1", "1"]
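Part of the reason the first array is empty is the call to compact: img.width returns the raw width attribute string from the img tag, or nil when the tag has no width attribute, and compact discards those nils. A minimal sketch with made-up attribute values (not taken from either site) shows the effect:

```ruby
# Simulated img.width values (made up for illustration): the raw attribute
# string when the <img> tag declares a width, nil when it does not.
page_with_attrs    = ["1", "84", "432"]
page_without_attrs = [nil, nil, nil]

# compact removes the nils, so a page whose <img> tags carry no width
# attribute at all yields an empty array.
p page_with_attrs.compact    # prints ["1", "84", "432"]
p page_without_attrs.compact # prints []
```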

Why does this happen on some websites but not on others?

What is the solution to this problem?

Answer 1

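Mechanize doesn't download the images, so it can't measure them. It can only return the size declared in the width and height attributes of each img tag in the HTML, and many websites don't include those attributes. To get the real dimensions you have to fetch each image and read its size from the image data itself.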
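One way to read the real size once you have the image bytes is to parse the file header, which stores the dimensions near the start of the file. The sketch below is a minimal, stdlib-only illustration covering PNG, GIF and JPEG; image_dimensions and jpeg_dimensions are hypothetical helper names, not part of Mechanize, and a maintained gem such as fastimage handles more formats and edge cases.

```ruby
# Hypothetical helper (not part of Mechanize): read [width, height] from the
# header bytes of a PNG, GIF or JPEG file. Returns nil for unknown formats.
def image_dimensions(bytes)
  data = bytes.b # treat the input as raw bytes (ASCII-8BIT)

  if data.start_with?("\x89PNG\r\n\x1A\n".b)
    # PNG: after the 8-byte signature comes the IHDR chunk; width and height
    # are big-endian 32-bit integers at byte offsets 16 and 20.
    return nil if data.bytesize < 24
    data[16, 8].unpack("NN")
  elsif data.start_with?("GIF87a", "GIF89a")
    # GIF: logical screen width and height are little-endian 16-bit
    # integers at byte offset 6.
    return nil if data.bytesize < 10
    data[6, 4].unpack("vv")
  elsif data.start_with?("\xFF\xD8".b)
    jpeg_dimensions(data)
  end
end

# JPEG: walk the marker segments until a start-of-frame (SOFn) marker, whose
# payload is precision (1 byte), height (2 bytes), width (2 bytes).
def jpeg_dimensions(data)
  pos = 2 # skip the SOI marker (FF D8)
  while pos + 4 <= data.bytesize
    return nil unless data.getbyte(pos) == 0xFF
    marker = data.getbyte(pos + 1)
    length = data[pos + 2, 2].unpack1("n") # segment length incl. these 2 bytes
    # C0..CF are SOF markers, except C4 (DHT), C8 (JPG) and CC (DAC).
    if (0xC0..0xCF).cover?(marker) && ![0xC4, 0xC8, 0xCC].include?(marker)
      return nil if pos + 9 > data.bytesize
      height, width = data[pos + 5, 4].unpack("nn")
      return [width, height]
    end
    pos += 2 + length
  end
  nil
end
```

To combine this with Mechanize you still need one extra request per image, for example image_dimensions(agent.get(page.uri.merge(img.src)).body), assuming agent is your Mechanize instance and img.src is a relative or absolute HTTP URL.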
