I'm using Mechanize to scrape image URLs, and I've been reading http://mechanize.rubyforge.org/Mechanize/Page/Image.html to find out how to get each image's width and height.
In the console I wrote:
url = "http://www.bbc.co.uk/"
page = Mechanize.new.get(url)
images_url = page.images.map{|img| img.width}.compact
I got this result:
["1", "84", "432", "432", "432", "432", "432", "432", "432", "304", "144", "144", "144", "144", "144", "144", "432", "432", "432", "432", "432", "432", "432", "336", "62", "62", "62", "62", "84", "1", "0"]
This result is fine for me: I get the width of each image.
For other pages, however, I don't. For example, try this page:
url = "http://www.glamourum.com" #check also with https://www.birchbox.com/
page = Mechanize.new.get(url)
images_url = page.images.map{|img| img.width}.compact
I got this result:
=> []
The array is empty :O and for https://www.birchbox.com/ I get this array:
=> ["1", "1", "1", "1", "1"]
Why does this happen on some sites but not on others?
What is the solution to this problem?
Answer 0 (score: 1)
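Mechanize does not download the images themselves when it loads a page. It can only return the size declared in the attributes of the HTML img
tag, and many sites do not include it. For those images img.width is nil, and compact drops the nils, which is why the array comes back empty or nearly empty.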
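
If the real pixel size is needed, one option is to download each image and read the dimensions from the file itself. Below is a minimal sketch, not something Mechanize provides: png_dimensions is a hypothetical helper that uses only the Ruby standard library and handles PNG data only. JPEG, GIF and other formats store their size differently and would need their own parsing or an image library.

```ruby
# Hypothetical helper (not part of Mechanize): reads the pixel size from
# the start of a PNG file. Returns [width, height], or nil if the data
# does not look like a PNG.
PNG_SIGNATURE = "\x89PNG\r\n\x1a\n".b

def png_dimensions(bytes)
  data = bytes.b
  # A PNG starts with an 8-byte signature, then the IHDR chunk:
  # 4-byte length, 4-byte type "IHDR", 4-byte width, 4-byte height.
  return nil unless data.bytesize >= 24 && data.start_with?(PNG_SIGNATURE)
  return nil unless data[12, 4] == "IHDR".b

  # Width and height are big-endian 32-bit integers at byte offset 16.
  data[16, 8].unpack("N2")
end

# Possible usage with Mechanize (assumes img.fetch downloads the image
# and that the returned object exposes the raw bytes via #body):
#   sizes = page.images.map { |img| png_dimensions(img.fetch.body) }
```

This costs one extra HTTP request per image, so it is considerably slower than reading the width attribute from the HTML.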