Mechanize: width and height methods

Posted: 2012-02-25 20:02:24

Tags: ruby-on-rails ruby screen-scraping mechanize

I'm using Mechanize to scrape image URLs, and I'm looking at http://mechanize.rubyforge.org/Mechanize/Page/Image.html to get the width and height of the images.

I wrote this in the console:

require 'mechanize'
url = "http://www.bbc.co.uk/"
page = Mechanize.new.get(url)
# width of every <img>; compact drops images that have no width attribute
image_widths = page.images.map { |img| img.width }.compact

I got this result:

["1", "84", "432", "432", "432", "432", "432", "432", "432", "304", "144", "144", "144", "144", "144", "144", "432", "432", "432", "432", "432", "432", "432", "336", "62", "62", "62", "62", "84", "1", "0"]

This result is fine for me: I get the widths of the images.

However, on other pages I don't get them. For example, try this page:

url = "http://www.glamourum.com" # check also with https://www.birchbox.com/
page = Mechanize.new.get(url)
image_widths = page.images.map { |img| img.width }.compact

I got this result:

=> []

The array is empty :O. And with https://www.birchbox.com/ I get this array:

=> ["1", "1", "1", "1", "1"]
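Keeping each image's src next to its declared size, instead of calling .compact, makes the difference between the sites visible. Mechanize::Page::Image#width and #height return the raw width/height attribute strings from the img tag, or nil when the tag has none, so on a page whose img tags omit the attributes every entry is nil and .compact leaves an empty array. The sketch below is a minimal illustration under that assumption; declared_sizes, to_pixels and content_images are hypothetical helper names, not part of Mechanize, and they accept any objects that respond to src, width and height. The 1-pixel filter assumes the common practice of declaring tracking pixels as width="1" height="1", which may be what the birchbox.com values are.

```ruby
# Hypothetical helpers (not part of Mechanize). They accept any objects that
# respond to #src, #width and #height, such as Mechanize::Page::Image, whose
# #width/#height return the raw attribute string from the <img> tag, or nil
# when the tag has no such attribute.

# One hash per image, keeping images that declare no size (width/height nil).
def declared_sizes(images)
  images.map do |img|
    {
      src: img.src,
      width: to_pixels(img.width),
      height: to_pixels(img.height)
    }
  end
end

# "432" -> 432; nil or non-numeric values such as "100%" -> nil.
def to_pixels(value)
  value && Integer(value, 10, exception: false)
end

# Only images that declare both sides and are at least min_side pixels on
# each side; this drops 1x1 tracking pixels and images with no declared size.
def content_images(images, min_side: 2)
  declared_sizes(images).select do |info|
    info[:width] && info[:height] &&
      info[:width] >= min_side && info[:height] >= min_side
  end
end
```

In the console this could be called as pp declared_sizes(Mechanize.new.get(url).images); on pages like glamourum.com the width and height columns should then show up as nil rather than disappearing.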

Why does this happen on some sites but not on others?

What is the solution to this problem?

1 Answer:

Answer 0 (score: 1)

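Mechanize does not download the images themselves. It can only return the size declared in the width and height attributes of the img tag in the HTML, and many sites do not include those attributes.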
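If the real pixel size is needed, each image has to be downloaded (or at least its first bytes) and the size read from the file header. Below is a minimal, stdlib-only sketch that assumes only PNG, GIF and JPEG matter; image_dimensions and jpeg_dimensions are hypothetical helper names, and other formats (WebP, SVG, ...) simply return nil. PNG stores width and height as big-endian 32-bit integers in its IHDR chunk, GIF stores them as little-endian 16-bit integers right after the signature, and JPEG stores them in its start-of-frame segment.

```ruby
# Minimal sketch: read the pixel size of a PNG, GIF or JPEG from its bytes.
# Returns [width, height], or nil when the format is not recognised or the
# data is too short.

PNG_SIGNATURE = "\x89PNG\r\n\x1A\n".b

# JPEG start-of-frame markers (SOF0-SOF15, excluding DHT C4, JPG C8, DAC CC),
# whose segments carry the frame height and width.
JPEG_SOF_MARKERS = [0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
                    0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF].freeze

def image_dimensions(bytes)
  data = bytes.b # treat the input as raw bytes whatever its encoding

  if data.start_with?(PNG_SIGNATURE)
    # PNG: 8-byte signature, then the IHDR chunk (4-byte length, "IHDR"),
    # then width and height as big-endian 32-bit integers at offset 16.
    return nil if data.bytesize < 24
    data[16, 8].unpack("NN")
  elsif data.start_with?("GIF87a", "GIF89a")
    # GIF: logical screen width and height, little-endian 16-bit, at offset 6.
    return nil if data.bytesize < 10
    data[6, 4].unpack("vv")
  elsif data.start_with?("\xFF\xD8".b)
    jpeg_dimensions(data)
  end
end

# JPEG: walk the marker segments after SOI until a start-of-frame marker.
def jpeg_dimensions(data)
  pos = 2 # skip the SOI marker (FF D8)
  while pos + 4 <= data.bytesize
    return nil unless data.getbyte(pos) == 0xFF
    marker = data.getbyte(pos + 1)
    if marker == 0xFF || marker == 0x01 || (0xD0..0xD7).cover?(marker)
      # fill byte or standalone marker: there is no length field to skip
      pos += marker == 0xFF ? 1 : 2
      next
    end
    return nil if marker == 0xD9 # EOI reached without a frame header
    length = data[pos + 2, 2].unpack1("n") # includes the 2 length bytes
    if JPEG_SOF_MARKERS.include?(marker)
      return nil if pos + 9 > data.bytesize
      # Segment layout: FF Cn, length(2), precision(1), height(2), width(2)
      height, width = data[pos + 5, 4].unpack("nn")
      return [width, height]
    end
    pos += 2 + length
  end
  nil
end
```

With Mechanize this might be wired up as page.images.each { |img| p [img.src, image_dimensions(img.fetch.body)] }, assuming Mechanize::Page::Image#fetch downloads the image and the returned object's body holds the raw bytes, as in Mechanize 2.x (check this against your version). It costs one request per image, and the result is the file's intrinsic size, which can differ from the size the page displays if CSS scales the image. The FastImage gem (FastImage.size(url)) is a ready-made alternative that fetches only as many bytes as it needs.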