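I'm relatively new to parsing and would like more practice. I want to parse the following URL: http://www.goodreads.com/quotes/tag/hard-work.
I want to grab all the quotes tagged "hard-work". This is how the site's markup breaks down: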
<div class="content">
<div id="siteheader" class="uitext">
<div class="mainContentContainer ">
<div class="mainContent">
<div id="premiumAdTop">
<div class="mainContentFloat">
<div id="flashContainer"> </div>
<div id="connectPrompt" style="">
<img style="float: left; margin: -3px 5px 0px 0px" src="http://s.gr-assets.com/assets/quote/quote_tiny-566b7de5e1ac5becd0dd8b2856f59228.jpg" alt="quote">
<h1>Quotes About Hard Work</h1>
<div class="leftContainer">
<div class="mediumText">
<div class="quote mediumText ">
<div class="quoteDetails ">
<a class="leftAlignedImage" href="/author/show/3916262.Babe_Ruth">
<div class="quoteText">
“It's hard to beat a person who never gives up.”
<br>
―
<a href="/author/show/3916262.Babe_Ruth">Babe Ruth</a>
</div>
Right now my code is:
require "rubygems"
require "open-uri"
require "nokogiri"
@page = Nokogiri::HTML(open("http://goodreads.com/quotes"))
@div = @page.xpath("html/body/div[1]")
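But the result doesn't give me the output I want.
I think I should be calling the methods each and collect, but I just don't know how to reach the node I want, which I believe is contained here: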
<div id="connectPrompt" style="">
<img style="float: left; margin: -3px 5px 0px 0px" src="http://s.gr-assets.com/assets/quote/quote_tiny-566b7de5e1ac5becd0dd8b2856f59228.jpg" alt="quote">
<h1>Quotes About Hard Work</h1>
<div class="leftContainer">
<div class="mediumText">
<div class="quote mediumText ">
<div class="quoteDetails ">
<a class="leftAlignedImage" href="/author/show/3916262.Babe_Ruth">
<div class="quoteText">
“It's hard to beat a person who never gives up.”
<br>
―
<a href="/author/show/3916262.Babe_Ruth">Babe Ruth</a>
</div>
Can someone point me in the right direction? How deep into the div classes do I need to go to get what I want?
Answer 0 (score: 0)
You can use XPath:
//div[@class = 'quoteText' and following-sibling::div[1][@class = 'quoteFooter' and .//a[@href and normalize-space() = 'hard-work']]]
This selects every div element whose class is quoteText and whose first following sibling div has the class quoteFooter and contains a link with the text hard-work.