How do I retrieve the inner text using Capybara and Poltergeist?

Time: 2014-03-26 12:57:06

Tags: ruby xpath capybara nokogiri poltergeist

I have this simple HTML:

<div> Test <span> someting </span></div>

How do I retrieve only the div's own inner text, without the text of the nested span?

Calling text returns all of the text inside the div, including the span:

[1] pry(#<SandBox>)> first(:xpath, '//div').text
=> "Test someting"

Using text() in my XPath query results in the following error:

[2] pry(#<SandBox>)> first(:xpath, '//div/text()')
Capybara::Poltergeist::BrowserError: There was an error inside the PhantomJS portion of Poltergeist. This is probably a bug, so please report it. 
TypeError: 'null' is not an object (evaluating 'window.getComputedStyle(element).display')

However, the same XPath works with Nokogiri:

[3] pry(#<SandBox>)> Nokogiri::HTML(page.html).xpath('//div/text()').text
=> " Test "

Is there a way to do this using only Capybara, without resorting to Nokogiri?

1 answer:

Answer 0: (score: 0)

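You can always use Nokogiri and open-uri (open-uri is only needed if you fetch the page from a URL; the example below parses a string directly).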

require 'nokogiri'
require 'open-uri'

2.2.0 :021 > html = Nokogiri::HTML::DocumentFragment.parse('<div> Test <span> someting     </span></div>').child

 => #<Nokogiri::XML::Element:0x44a7082 name="div" children=[#<Nokogiri::XML::Text:0x44a63ee " Test ">, #<Nokogiri::XML::Element:0x44a62e0 name="span" children=[#<Nokogiri::XML::Text:0x44a3f04 " someting ">]>]> 

Then you can operate on it depending on what you want to extract. So, for the text directly inside the tag:

2.2.0 :072 > html.children.first

 => #<Nokogiri::XML::Text:0x45ea37c " Test "> 

2.2.0 :073 > html.children.first.text

=> " Test " 

2.2.0 :215 > html.children.first.content

 => " Test "
Good luck!