I have HTML like this:
<div id = "foo">
I want to parse this!
<ul class = "contact-data">
<li> Don't need</li>
<li> heyyay </li>
</ul>
</div>
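My Ruby code:
require 'nokogiri'
require 'open-uri'
page = Nokogiri::HTML(URI.open(url))
page.css('div#foo')
Calling page.css('div#foo').text returns all of the text in this div, including the text inside the ul tag.
What is the best way to get only the text I need?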
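Answer 0 (score: 3)
If you want the first match of a CSS rule, use Nokogiri::XML::Node#at_css. The "> text()" part of the selector picks only the text nodes that are direct children of the div, so the text inside the ul is skipped.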
require 'nokogiri'
@doc = Nokogiri::HTML.parse <<-HTML
<div id = "foo">
I want to parse this!
<ul class = "contact-data">
<li> Don't need</li>
<li> heyyay </li>
</ul>
</div>
HTML
@doc.at_css("div#foo > text()").text.strip # => "I want to parse this!"
Update
If you want all matches of a CSS rule, use Nokogiri::XML::Node#css
require 'nokogiri'
@doc = Nokogiri::HTML.parse <<-HTML
<div id = "foo">
I want to parse this! - I
<ul class = "contact-data">
<li> not need</li>
<li> heyyay </li>
</ul>
</div>
<div id = "foo">
I want to parse this! - II
<ul class = "contact-data">
<li> not need</li>
<li> heyyay </li>
</ul>
</div>
HTML
@doc.css("div#foo > text()").each do |elm|
  puts elm.text.strip
end
# >> I want to parse this! - I
# >> I want to parse this! - II