I am using Nokogiri to fetch images from a Chinese website (Taobao):
require 'nokogiri'
require 'open-uri'

url = "http://item.taobao.com/item.htm?spm=a1z10.1.w137644-1960500098.43.d7Uwpx&id=36246359192"
doc = Nokogiri::HTML(URI.open(url))
puts doc.css("title").text
puts doc.css("img")[0]['src']
puts doc.css("img#J_ImgBooth")[0]['src']
I can get the title and doc.css("img")[0]['src'], but I cannot get the src of img#J_ImgBooth. What is the problem? Is the request being blocked somehow?
Answer 0: (score: 1)
Look at the HTML source: img#J_ImgBooth has no src attribute, only a data-src attribute:
<img id="J_ImgBooth" data-src="http://img03.taobaocdn.com/bao/uploaded/i3/18513032853503639/T1z1ojXdNhXXXXXXXX_!!2-item_pic.png_310x310.jpg" data-hasZoom="700" />
Using
doc.css("img#J_ImgBooth")[0]['data-src']
will work.
Answer 1: (score: 1)
This works for me:
doc.at_css("#J_ImgBooth")["data-src"]
You can inspect the element to confirm that the attribute name is data-src:
#(Element:0x3ffb5d3d9df0 {
name = "img",
attributes = [
#(Attr:0x3ffb5d3d9b84 { name = "id", value = "J_ImgBooth" }),
#(Attr:0x3ffb5d3d9b70 {
name = "data-src",
value = "http://img03.taobaocdn.com/bao/uploaded/i3/18513032853503639/T1z1ojXdNhXXXXXXXX_!!2-item_pic.png_310x310.jpg"
}),
#(Attr:0x3ffb5d3d9b5c { name = "data-haszoom", value = "700" })]
})