Given markup in an HTML document that looks like this:
<h3>test</h3>
<p>test</p>
<hr/>
<h3>test2</h3>
<p>test2</p>
<hr/>
I want to produce this:
<div>
<h3>test</h3>
<p>test</p>
</div>
<div>
<h3>test2</h3>
<p>test2</p>
</div>
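What is the most elegant way to do this with Nokogiri?
Answer 0 (score: 1)
Edit: Reworked the answer to be a bit cleaner. Edit 2: Small rewrite to shorten it by two lines.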
require 'nokogiri'
doc = Nokogiri::HTML <<ENDHTML
<h3>test</h3>
<p>test</p>
<hr/>
<h3>test2</h3>
<p>test2</p>
<hr/>
ENDHTML
body = doc.at_css('body') # Created by parsing as HTML
kids = body.xpath('./*') # Every child of the body
body.inner_html = "" # Empty the body now that we have our nodes
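# Note: use add_child rather than <<. In current Nokogiri releases, << returns
# the receiver (for chaining), while add_child returns the nodes it added.
div = body.add_child('<div>').first   # Create our first container in the body
kids.each do |node|                   # For every child that was in the body...
  if node.name == 'hr'
    div = body.add_child('<div>').first # Create a new container for stuff
  else
    div << node                       # Move this into the last container
  end
end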
div.remove unless div.child # Get rid of a trailing, empty div
puts body.inner_html
#=> <div>
#=> <h3>test</h3>
#=> <p>test</p>
#=> </div>
#=> <div>
#=> <h3>test2</h3>
#=> <p>test2</p>
#=> </div>
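The control flow of the loop above can be looked at separately from Nokogiri. The following is only a sketch: plain strings stand in for the nodes, and the names `names` and `groups` are made up for illustration. It starts a new group at every `hr`, appends everything else to the most recent group, and drops a trailing empty group, which mirrors the `div.remove unless div.child` step.

```ruby
# Tag names stand in for the body's child nodes (an assumption for illustration).
names = %w[h3 p hr h3 p hr]

groups = [[]]                    # start with one empty container
names.each do |name|
  if name == 'hr'
    groups << []                 # a separator opens a new container
  else
    groups.last << name          # anything else goes into the current one
  end
end
groups.pop if groups.last.empty? # drop the trailing, empty container

p groups
```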
Answer 1 (score: 0)
Here is an answer that uses Ruby 1.9.2's Enumerable#chunk to split the children into sections, and that also exercises Nokogiri's NodeSet class:
require 'nokogiri'
doc = Nokogiri::HTML <<ENDHTML
<h3>test</h3>
<p>test</p>
<hr/>
<h3>test2</h3>
<p>test2</p>
<hr/>
ENDHTML
result = Nokogiri::XML::NodeSet.new( doc,
  doc.xpath('//body/*').chunk do |n|
    n.name=='hr'
  end.reject do |matched,nodes|
    matched
  end.map do |matched,nodes|
    doc.create_element('div').tap do |div|
      div << Nokogiri::XML::NodeSet.new( doc, nodes )
    end
  end )
puts result
#=> <div>
#=> <h3>test</h3>
#=> <p>test</p>
#=> </div>
#=> <div>
#=> <h3>test2</h3>
#=> <p>test2</p>
#=> </div>
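To see what `Enumerable#chunk` hands to the `reject` and `map` steps, it may help to run it on a plain array first. This is a minimal sketch, assuming tag names as stand-ins for the nodes; the variable names are hypothetical. `chunk` groups consecutive elements for which the block returns the same value and yields `[block_value, elements]` pairs.

```ruby
# Tag names stand in for the Nokogiri nodes (an assumption for illustration).
names = %w[h3 p hr h3 p hr]

# Consecutive elements with the same block result end up in one chunk.
chunks = names.chunk { |name| name == 'hr' }.to_a

# Drop the separator chunks and keep only the members of the content chunks.
groups = chunks.reject { |is_separator, _members| is_separator }
               .map    { |_is_separator, members| members }

p chunks
p groups
```

Each `true` chunk holds a run of separators, so rejecting those leaves exactly the groups that become `div` elements in the answer.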
Answer 2 (score: 0)
Here's how I'd do it:
require 'nokogiri'
html = '
<h3>test</h3>
<p>test</p>
<hr/>
<h3>test2</h3>
<p>test2</p>
<hr/>
'
doc = Nokogiri::HTML(html)
doc2 = Nokogiri::HTML('<body />')
doc2_body = doc2.at('body')
doc.search('//h3 | //p').each_slice(2) do |ns|
  nodeset = Nokogiri::XML::NodeSet.new(doc2, ns)
  div = Nokogiri::XML::Node.new('div', doc2)
  div.add_child(nodeset)
  doc2_body.add_child(div)
end
puts doc2_body.inner_html
# >> <div>
# >> <h3>test</h3>
# >> <p>test</p>
# >> </div>
# >> <div>
# >> <h3>test2</h3>
# >> <p>test2</p>
# >> </div>
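One design point worth noting: `each_slice(2)` groups by count, so it relies on every section being exactly one `h3` followed by one `p`. The sketch below uses plain strings in place of nodes, with a made-up input that has an extra `p`, to compare count-based grouping with separator-based grouping. The names `by_count` and `by_separator` are hypothetical.

```ruby
# Hypothetical input where the first section has two paragraphs.
names = %w[h3 p p hr h3 p hr]

# Mirrors selecting only h3 and p elements, then pairing them up.
content  = names.reject { |name| name == 'hr' }
by_count = content.each_slice(2).to_a

# Splitting on the hr separators instead keeps each section intact.
by_separator = names.chunk { |name| name == 'hr' }
                    .reject { |is_separator, _members| is_separator }
                    .map    { |_is_separator, members| members }

p by_count
p by_separator
```

For the markup in the question the two approaches give the same result; they only diverge once a section has more or fewer than two elements.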