My document has the following two formats:
<p><b>Referral Description:</b></p>
<p>
This is the body of the referral's detailed description.
I want to get this text out of the document.
</p>
and
<table>
<tr>
<td><b>FieldName:</b></td>
<td>field value</td>
</tr>
<tr>
<td><b>Field2Name:</b></td>
<td>field value</td>
</tr>
<tr>
<td><b>Field3Name:</b></td>
<td>field value</td>
</tr>
</table>
In both cases, you can see that I need a value that sits in an unnamed element whose adjacent neighbor is a tag matching the text <b>FieldName:</b>.
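My question is: how do I use the neighbor tag to get the value I need? I can locate the neighbor with
doc.xpath('//p/b[contains(text(), "Referral Description:")]')
but how do I then take that node and say "give me your neighbor"?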
Answer 0: (score: 2)
I would use the following-sibling:: axis, like this:
require 'nokogiri'
doc = Nokogiri::HTML.parse <<-html
<p><b>Referral Description:</b></p>
<p>
This is the body of the referral's detailed description.
I want to get this text out of the document.
</p>
html
node = doc.xpath('//p[./b[contains(text(), "Referral Description:")]]/following-sibling::p')
puts node.text
# >>
# >> This is the body of the referral's detailed description.
# >> I want to get this text out of the document.
Or, using the wildcard character *:
require 'nokogiri'
doc = Nokogiri::HTML.parse <<-html
<p><b>Referral Description:</b></p>
<p>
This is the body of the referral's detailed description.
I want to get this text out of the document.
</p>
<table>
<tr><td><b>FieldName:</b></td><td>field value</td></tr>
<tr><td><b>Field2Name:</b></td><td>field value</td></tr>
</table>
html
# [1] keeps only the nearest following sibling of each labelled element
values = ["Referral Description:", "FieldName:", "Field2Name:"].map do |header|
  doc.xpath("//*[./b[contains(text(), '#{header}')]]/following-sibling::*[1]").text.strip
end
p values
# >> ["This is the body of the referral's detailed description.\nI want to get this text out of the document.", "field value", "field value"]
For the second part, the HTML table:
require 'nokogiri'
doc = Nokogiri::HTML.parse <<-html
<table>
<tr>
<td><b>FieldName:</b></td>
<td>field value</td>
</tr>
<tr>
<td><b>Field2Name:</b></td>
<td>field value</td>
</tr>
<tr>
<td><b>Field3Name:</b></td>
<td>field value</td>
</tr>
</table>
html
field_ary = %w(FieldName Field2Name Field3Name)
nodeset = field_ary.map{|n| doc.xpath("//td[./b[contains(.,'#{n}')]]/following-sibling::*")}
nodeset.map{|n| n.text }
# => ["field value", "field value", "field value"]
Or (another approach):
nodeset = field_ary.map{|n| doc.xpath("//*[./b[contains(.,'#{n}')]]/following-sibling::*")}
nodeset.map{|n| n.text }
# => ["field value", "field value", "field value"]
Answer 1: (score: 1)
In CSS, the next-adjacent-sibling selector is +:
doc.at('p:has(b[text()="Referral Description:"]) + p').text