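I'm trying to get Nokogiri to find the text between two tags. In this case I want the text between <strong> and <ul>, which may be in different parent nodes.
The HTML is dynamic and can vary a lot.
Here are three cases.
Case 1: the expected output is "I'm not strong"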
<p>
I'm not interesting
</p>
<p>
<strong>I'm strong</strong>
<span>I'm not strong</span>
</p>
<ul>
<li> I'm a list item </li>
<li> Me too </li>
</ul>
Case 2: the expected output is nil or an empty string
<p>
I'm not interesting
</p>
<div>
<strong>I'm strong</strong></br>
</div>
<ul>
<li> I'm a list item </li>
<li> Me too </li>
</ul>
Case 3: the expected output is "I'm not strong"
<p>
I'm not interesting
</p>
<strong>I'm strong</strong>
<p>I'm not strong</p>
<ul>
<li> I'm a list item </li>
<li> Me too </li>
</ul>
Thanks
Answer 0 (score: 1)
Based on the examples you provided, here are the specs:
In the test.rb file:
require 'nokogiri'
def get_text_of_a_node(doc, xpath)
doc.at_xpath(xpath).to_s
end
Then in the test_spec.rb file:
require_relative '../test.rb'
require 'rspec'
describe "#get_text_of_a_node" do
let(:xpath) { ".//strong[text()=\"I'm strong\"]/following-sibling::span/text()" }
context "when <span> tag is present after <strong> with text" do
let(:xml) do
"<p>
I'm not interesting
</p>
<p>
<strong>I'm strong</strong>
<span>I'm not strong</span>
</p>
<ul>
<li> I'm a list item </li>
<li> Me too </li>
</ul>"
end
let(:doc) { Nokogiri::HTML::DocumentFragment.parse xml.strip }
it "returns text" do
expect(get_text_of_a_node(doc, xpath)).to eq("I'm not strong")
end
end
context "when <span> tag is absent after <strong>" do
let(:xml) do
"<p>
I'm not interesting
</p>
<div>
<strong>I'm strong</strong>
</br>
</div>
<ul>
<li> I'm a list item </li>
<li> Me too </li>
</ul>"
end
let(:doc) { Nokogiri::HTML::DocumentFragment.parse xml.strip }
it "returns empty string" do
expect(get_text_of_a_node(doc, xpath)).to be_empty
end
end
end
All the tests pass:
[shreyas@arup_ruby (master)]$ rspec spec/test_spec.rb
..
Finished in 0.04067 seconds (files took 0.12591 seconds to load)
2 examples, 0 failures
[shreyas@arup_ruby (master)]$
Answer 1 (score: 0)
Using Nokogiri's xpath with the selector
//strong/following::*[not(self::ul|self::li)]
does the trick. To get the text between the tags, you can use
n.xpath("*//strong/following::*[not(self::ul|self::li)]").text
> "I'm not strong"