I have this simple HTML:
<div> Test <span> someting </span></div>
How can I retrieve only the div's own inner text, without the text of the nested span?
Using text retrieves all of the text in the div, including the span's:
[1] pry(#<SandBox>)> first(:xpath, '//div').text
=> "Test someting"
Using text() in my XPath query raises the following error:
[2] pry(#<SandBox>)> first(:xpath, '//div/text()')
Capybara::Poltergeist::BrowserError: There was an error inside the PhantomJS portion of Poltergeist. This is probably a bug, so please report it.
TypeError: 'null' is not an object (evaluating 'window.getComputedStyle(element).display')
However, the same XPath works with Nokogiri:
[3] pry(#<SandBox>)> Nokogiri::HTML(page.html).xpath('//div/text()').text
=> " Test "
Is there a way to do this with Capybara alone, without resorting to Nokogiri?
Answer (score: 0):
You can always use Nokogiri (plus open-uri if you need to fetch the page yourself):
require 'nokogiri'
require 'open-uri'
2.2.0 :021 > html = Nokogiri::HTML::DocumentFragment.parse('<div> Test <span> someting </span></div>').child
=> #<Nokogiri::XML::Element:0x44a7082 name="div" children=[#<Nokogiri::XML::Text:0x44a63ee " Test ">, #<Nokogiri::XML::Element:0x44a62e0 name="span" children=[#<Nokogiri::XML::Text:0x44a3f04 " someting ">]>]>
You can then work with the parsed node depending on what you want to scrape. For the text directly inside the div tag:
2.2.0 :072 > html.children.first
=> #<Nokogiri::XML::Text:0x45ea37c " Test ">
2.2.0 :073 > html.children.first.text
=> " Test "
Or:
2.2.0 :215 > html.children.first.content
=> " Test "
Good luck!