If I have the following HTML structure:
<section class="main-gallery homeowner-rating content-block">
<!--content-->
</section>
<section class="homeowner-rating content-block">
<!--content-->
</section>
<section class="homeowner-rating content-block">
<!--content-->
</section>
<section class="homeowner-rating content-block">
<!--content-->
</section>
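how do I select every homeowner-rating.content-block section except the first one?
For context: I have set up a simple screen scraper with Nokogiri, but it also tries to pull information from the first section, which returns blank results.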
require 'nokogiri'
require 'open-uri'

def get_testimonials
  url = 'http://www.ratedpeople.com/profile/lcc-building-and-construction'
  doc = Nokogiri::HTML.parse(URI.open(url))
  # This selector also matches the first (main-gallery) section,
  # which is where the blank results come from.
  doc.css('.homeowner-rating.content-block').each do |t|
    title = t.css('h4').text.strip
    comments = t.css('q').text.strip
    author = t.css('cite').text.strip
  end
end
Any help is appreciated.
Answer 0: (score: 4)
With your current markup, there are several ways to do it:
.homeowner-rating+.homeowner-rating
{
color: red;
}
.homeowner-rating:not(.main-gallery)
{
color: red;
}
Demo: http://jsfiddle.net/PKEv5/1/
The following works only if the main gallery is the first child of its parent node:
.homeowner-rating:not(:first-child)
{
color: red;
}
Answer 1: (score: 1)
This is easy to do with Nokogiri:
require 'nokogiri'
doc = Nokogiri::HTML::DocumentFragment.parse(<<EOT)
<section class="main-gallery homeowner-rating content-block">
<p>1</p>
</section>
<section class="homeowner-rating content-block">
<p>2</p>
</section>
<section class="homeowner-rating content-block">
<p>3</p>
</section>
<section class="homeowner-rating content-block">
<p>4</p>
</section>
EOT
doc.css('.homeowner-rating')[1..-1].map(&:to_html)
# => ["<section class=\"homeowner-rating content-block\">\n <p>2</p>\n</section>",
# "<section class=\"homeowner-rating content-block\">\n <p>3</p>\n</section>",
# "<section class=\"homeowner-rating content-block\">\n <p>4</p>\n</section>"]
Nokogiri's search, css, and xpath methods return NodeSets, which behave like Arrays, so you can slice the result to grab the chunk you want.