I'm trying to scrape data from http://expo.getbootstrap.com/
The HTML looks like this:
<div class="col-span-4">
<p>
<a class="thumbnail" target="_blank" href="https://www.getsentry.com/">
<img src="/screenshots/sentry.jpg">
</a>
</p>
</div>
My Nokogiri code is:
url = "http://expo.getbootstrap.com/"
doc = Nokogiri::HTML(open(url))
puts doc.css("title").text
doc.css(".col-span-4").each do |site|
  title = site.css("h4 a").text
  href = site.css("a.thumbnail")[0]['href']
end
The goal is simple: get the link's href, the <img> tag's src, and the site's <title>, but it keeps reporting:
undefined method [] for nil:NilClass
on this line:
href = site.css("a.thumbnail")[0]['href']
This is driving me crazy, because the code I wrote here actually works in another situation.
Answer 0 (score: 2)
I'd do something like:
require 'nokogiri'
require 'open-uri'
require 'pp'
# URI.open is used because Kernel#open no longer handles URLs as of Ruby 3.0
doc = Nokogiri::HTML(URI.open('http://expo.getbootstrap.com/'))
thumbnails = doc.search('a.thumbnail').map { |thumbnail|
  {
    href: thumbnail['href'],
    src: thumbnail.at('img')['src'],
    title: thumbnail.parent.parent.at('h4 a').text
  }
}
pp thumbnails
Which, after running, gives:
# => [
  {
    :href => "https://www.getsentry.com/",
    :src => "/screenshots/sentry.jpg",
    :title => "Sentry"
  },
  {
    :href => "http://laravel.com",
    :src => "/screenshots/laravel.jpg",
    :title => "Laravel"
  },
  {
    :href => "http://gruntjs.com",
    :src => "/screenshots/gruntjs.jpg",
    :title => "Grunt"
  },
  {
    :href => "http://labs.bittorrent.com",
    :src => "/screenshots/bittorrent-labs.jpg",
    :title => "BitTorrent Labs"
  },
  {
    :href => "https://www.easybring.com/en",
    :src => "/screenshots/easybring.jpg",
    :title => "Easybring"
  },
  {
    :href => "http://developers.kippt.com/",
    :src => "/screenshots/kippt-developers.jpg",
    :title => "Kippt Developers"
  },
  {
    :href => "http://www.learndot.com/",
    :src => "/screenshots/learndot.jpg",
    :title => "Learndot"
  },
  {
    :href => "http://getflywheel.com/",
    :src => "/screenshots/flywheel.jpg",
    :title => "Flywheel"
  }
]
Answer 1 (score: 1)
You aren't accounting for the fact that not all .col-span-4 divs contain a thumbnail. This should work:
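require 'nokogiri'
require 'open-uri'
url = "http://expo.getbootstrap.com/"
# URI.open is used because Kernel#open no longer handles URLs as of Ruby 3.0
doc = Nokogiri::HTML(URI.open(url))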
puts doc.css("title").text
doc.css(".col-span-4").each do |site|
  title = site.css("h4 a").text
  thumbnail = site.css("a.thumbnail")
  next if thumbnail.empty?
  href = thumbnail[0]['href']
end