Nokogiri: get the text that is not inside the <a> tag

Posted: 2018-01-28 09:28:48

Tags: nokogiri

Take a look at this example:

<li><a href="http://website.com/">This is a website</a>, it belongs to John Sullivan</li>

I can get the content of the <li> tag by using:

nodeset = doc.css('li')

I can also get the text inside the <a> tag by using:

nodeset.each do |element|
  ahref = element.css('a')  # <a href="http://website.com/">This is a website</a>
  name = ahref.text.strip   # "This is a website"
end

But how do I get the rest of the text within the <li> tag but without the text from the <a> tag?

From this example, I would like to get

", it belongs to John Sullivan"

How can I do this?

2 answers:

Answer 0 (score: 1)

This is easy with XPath and the text() node test. If you have already extracted the li elements into nodeset, you can get the text with:

nodeset.xpath('./text()')

Or you can get it directly from the whole document:

doc.xpath('//li/text()')

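This uses the text() node test as part of the XPath expression, not the Ruby text method. It selects only the text nodes that are direct children of the li node, so the contents of the a element are not included.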

Answer 1 (score: 0)

I found a cheap way to get the rest of the text:

  ahref = element.css('a')
  name = ahref.text.strip
  # Remove the link text from the full text of the <li>
  suppl = element.text.strip.gsub(name, '')