Selecting specific div elements with XPath and Nokogiri?

Time: 2013-12-15 18:27:46

Tags: xpath sinatra nokogiri

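I'm relatively new to parsing and would like to get more practice. I want to parse the following URL: http://www.goodreads.com/quotes/tag/hard-work

I want to grab all the quotes tagged "hard-work". This is how the site's markup breaks down: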

<div class="content">
<div id="siteheader" class="uitext">
<div class="mainContentContainer ">
<div class="mainContent">
<div id="premiumAdTop">
<div class="mainContentFloat">
<div id="flashContainer"> </div>
<div id="connectPrompt" style="">
<img style="float: left; margin: -3px 5px 0px 0px" src="http://s.gr-assets.com/assets/quote/quote_tiny-566b7de5e1ac5becd0dd8b2856f59228.jpg" alt="quote">
<h1>Quotes About Hard Work</h1>
<div class="leftContainer">
<div class="mediumText">
<div class="quote mediumText ">
<div class="quoteDetails ">
<a class="leftAlignedImage" href="/author/show/3916262.Babe_Ruth">
<div class="quoteText">
“It's hard to beat a person who never gives up.”
<br>
―
<a href="/author/show/3916262.Babe_Ruth">Babe Ruth</a>
</div>

Right now my code is:

require "rubygems"
require "open-uri"
require "nokogiri" 

@page = Nokogiri::HTML(open("http://goodreads.com/quotes"))
@div = @page.xpath("html/body/div[1]")

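But the result doesn't give me the output I want.

I think I should call a method such as each or collect, but I just don't know how to reach the node I want, which I believe is contained here: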

<div id="connectPrompt" style="">
<img style="float: left; margin: -3px 5px 0px 0px" src="http://s.gr-assets.com/assets/quote/quote_tiny-566b7de5e1ac5becd0dd8b2856f59228.jpg" alt="quote">
<h1>Quotes About Hard Work</h1>
<div class="leftContainer">
<div class="mediumText">
<div class="quote mediumText ">
<div class="quoteDetails ">
<a class="leftAlignedImage" href="/author/show/3916262.Babe_Ruth">
<div class="quoteText">
“It's hard to beat a person who never gives up.”
<br>
―
<a href="/author/show/3916262.Babe_Ruth">Babe Ruth</a>
</div>

Can someone point me in the right direction? How deep into the nested div classes do I need to go to get what I want?

1 Answer:

Answer 0 (score: 0)

You can use this XPath:

//div[@class = 'quoteText' and following-sibling::div[1][@class = 'quoteFooter' and .//a[@href and normalize-space() =  'hard-work']]]

This selects every div element whose class is quoteText and whose first following sibling div has the class quoteFooter and contains a link whose text is hard-work.