How to get the text between two elements with different parents using Nokogiri

Asked: 2015-03-10 10:59:50

Tags: ruby nokogiri

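I'm trying to get Nokogiri to find the text between two tags. In this case, I want the text between <strong> and <ul>, which may sit in different parent nodes.

The HTML is dynamic and can vary considerably.

Here are three cases: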

Case 1: the expected output is "I'm not strong"

<p>
  I'm not interesting
</p>
<p>
  <strong>I'm strong</strong>
  <span>I'm not strong</span>
</p>
<ul>
  <li> I'm a list item </li>
  <li> Me too </li>
</ul>

Case 2: the expected output is nil or an empty string

<p>
  I'm not interesting
</p>
<div>
  <strong>I'm strong</strong></br>
</div>
<ul>
  <li> I'm a list item </li>
  <li> Me too </li>
</ul>

Case 3: the expected output is "I'm not strong"

<p>
  I'm not interesting
</p>
<strong>I'm strong</strong>
<p>I'm not strong</p>
<ul>
  <li> I'm a list item </li>
  <li> Me too </li>
</ul>

Thanks.

2 answers:

Answer 0: (score: 1)

Based on the examples you provided, here are the specs.

In the test.rb file:

require 'nokogiri'

def get_text_of_a_node(doc, xpath)
  doc.at_xpath(xpath).to_s
end

Then, in the test_spec.rb file:

require_relative '../test.rb'
require 'rspec'

describe "#get_text_of_a_node" do
  let(:xpath) { ".//strong[text()=\"I'm strong\"]/following-sibling::span/text()" }

  context "when <span> tag is present after <strong> with text" do
    let(:xml) do
        "<p>
          I'm not interesting
        </p>
        <p>
            <strong>I'm strong</strong>
            <span>I'm not strong</span>
        </p>
        <ul>
            <li> I'm a list item </li>
            <li> Me too </li>
        </ul>"
    end
    let(:doc) { Nokogiri::HTML::DocumentFragment.parse xml.strip }

    it "returns text" do
      expect(get_text_of_a_node(doc, xpath)).to eq("I'm not strong")
    end
  end 

  context "when <span> tag is absent after <strong>" do
    let(:xml) do
        "<p>
          I'm not interesting
        </p>
        <div>
            <strong>I'm strong</strong>
        </br>
        </div>
        <ul>
            <li> I'm a list item </li>
            <li> Me too </li>
        </ul>"
    end
    let(:doc) { Nokogiri::HTML::DocumentFragment.parse xml.strip }

    it "returns empty string" do
      expect(get_text_of_a_node(doc, xpath)).to be_empty
    end
  end
end

All tests pass:

[shreyas@arup_ruby (master)]$ rspec spec/test_spec.rb
..

Finished in 0.04067 seconds (files took 0.12591 seconds to load)
2 examples, 0 failures

[shreyas@arup_ruby (master)]$

Answer 1: (score: 0)

Using Nokogiri's xpath with the selector

//strong/following::*[not(self::ul|self::li)]

does the trick. To get the text between the tags, you can use:

n.xpath("*//strong/following::*[not(self::ul|self::li)]").text
> "I'm not strong"