Finding an element's neighbor

Time: 2014-01-16 19:41:16

Tags: html ruby html-parsing nokogiri

My documents come in the following two formats:

<p><b>Referral Description:</b></p>
<p>
 This is the body of the referral's detailed description. 
 I want to get this text out of the document.
</p>

<table>
  <tr>
    <td><b>FieldName:</b></td>
    <td>field value</td>
  </tr>
  <tr>
    <td><b>Field2Name:</b></td>
    <td>field value</td>
  </tr>
  <tr>
    <td><b>Field3Name:</b></td>
    <td>field value</td>
  </tr>
</table>

In both cases, as you can see, the value I need sits in an unnamed element whose adjacent neighbor is a matching tag with a body like <b>FieldName:</b>.

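My question is: how can I use the neighboring tag to get the value I need? I can locate the neighbor with

doc.xpath('//p/b[contains(text(), "Referral Description:")]')

but how do I take that node and say "give me your neighbor"?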

2 Answers:

Answer 0: (score: 2)

I would use the following-sibling:: axis, like this:

require 'nokogiri'

doc = Nokogiri::HTML.parse <<-html
<p><b>Referral Description:</b></p>
<p>
 This is the body of the referral's detailed description. 
 I want to get this text out of the document.
</p>
html

node = doc.xpath('//p[./b[contains(text(), "Referral Description:")]]/following-sibling::p')
puts node.text
# >> 
# >>  This is the body of the referral's detailed description. 
# >>  I want to get this text out of the document.

Or, using the wildcard character *:

require 'nokogiri'

doc = Nokogiri::HTML.parse <<-html
<p><b>Referral Description:</b></p>
<p>
 This is the body of the referral's detailed description. 
 I want to get this text out of the document.
</p>
html

["Referral Description:", "FieldName:", "Field2Name:"].map do |header|
  doc.xpath("//*[./b[contains(text(), '#{header}')]]/following-sibling::*").text.strip
end
# With the document above, only "Referral Description:" finds a sibling; the
# two table headers yield empty strings because no <table> was parsed.

For the second part, the HTML table:

require 'nokogiri'

doc = Nokogiri::HTML.parse <<-html
<table>
  <tr>
    <td><b>FieldName:</b></td>
    <td>field value</td>
  </tr>
  <tr>
    <td><b>Field2Name:</b></td>
    <td>field value</td>
  </tr>
  <tr>
    <td><b>Field3Name:</b></td>
    <td>field value</td>
  </tr>
</table>
html

field_ary = %w(FieldName Field2Name Field3Name)
nodeset = field_ary.map{|n| doc.xpath("//td[./b[contains(.,'#{n}')]]/following-sibling::*")}
nodeset.map{|n| n.text }
# => ["field value", "field value", "field value"]

Or (another approach):

nodeset = field_ary.map{|n| doc.xpath("//*[./b[contains(.,'#{n}')]]/following-sibling::*")}
nodeset.map{|n| n.text }
# => ["field value", "field value", "field value"]

Answer 1: (score: 1)

In CSS, the next adjacent sibling selector is +:

doc.at('p:has(b[text()="Referral Description:"]) + p').text