Take a look at this example:
<li><a href="http://website.com/">This is a website</a>, it belongs to John Sullivan</li>
I can get the content of the <li> tag by using:
nodeset = doc.css('li')
I can also get the text inside the <a> tag by using:
nodeset.each do |element|
  ahref = element.css('a')   # <-- <a href="http://website.com/">This is a website</a>
  name = ahref.text.strip    # <-- This is a website
end
But how do I get the rest of the text within the <li> tag, without the text from the <a> tag?
From this example, I would like to get:
", it belongs to John Sullivan"
How can I do this?
Answer 0 (score: 1)
This is simple with XPath and the text() node test. If you have already extracted the li elements into nodeset, you can get the text with:
nodeset.xpath('./text()')
Or you can get it directly from the whole document:
doc.xpath('//li/text()')
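This uses the text() node test as part of the XPath expression, rather than the text Ruby method. It extracts only the text nodes that are direct children of the li node, so it does not include the content of the a element.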
Answer 1 (score: 0)
I found a cheap way to get the rest of the text:
ahref = element.css('a')
name = ahref.text.strip
suppl = element.text.strip.gsub(name, '')
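One caveat with string removal of this kind: gsub deletes every occurrence of the link text, so it can also eat matching text outside the link, whereas sub deletes only the first occurrence. The sketch below uses plain strings in place of the Nokogiri values; full_text and link_text are hypothetical stand-ins for element.text.strip and ahref.text.strip.

```ruby
# Hypothetical values standing in for element.text.strip and ahref.text.strip.
full_text = 'John, it belongs to John Sullivan'
link_text = 'John'

# gsub with a String pattern removes every literal occurrence.
with_gsub = full_text.gsub(link_text, '')

# sub removes only the first literal occurrence.
with_sub = full_text.sub(link_text, '')

puts with_gsub.inspect
puts with_sub.inspect
```

Even sub assumes the link text comes first in the li, so selecting the text nodes directly with XPath is likely the more robust choice.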