I have HTML like this:
<div id="printready">
<div class="box-single"></div>
<div class="marker"></div>
<h2>sometext</h2>
<div id="news-single-img"></div>
<p>...</p>
<p>...</p>
<p>...</p>
<p>...</p>
<p>...</p>
<span class="cl"></span>
... (remove everything since the last paragraph)
</div>
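What is the best way to remove .box-single, .marker, h2, and #news-single-img from this markup? After that, I want to keep all the paragraphs and delete everything that follows the last paragraph.
I tried Nokogiri but could not find a good solution. The framework I am using is Ruby on Rails.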
Answer 0 (score: 4)
Remove the tags:
doc.search('.box-single', '.marker', 'h2', '#news-single-img').remove
Remove the nodes after the last p:
while node = doc.at('p:last').next
  node.remove
end
Answer 1 (score: 2)
What you want to do is somewhat ambiguous, so here is a first pass:
require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<div id="printready">
<div class="box-single"></div>
<div class="marker"></div>
<h2>sometext</h2>
<div id="news-single-img"></div>
<p>...</p>
<p>...</p>
<p>...</p>
<p>...</p>
<p>...</p>
<span class="cl"></span>
... (remove everything since the last paragraph)
</div>
EOT
%w[.box-single .marker].each do |klass|
  doc.search(klass).each do |tag|
    tag['class'] = nil
  end
end
doc.at('h2').remove
%w[#news-single-img].each do |tag_id|
  doc.at(tag_id)['id'] = nil
end
loop do
  next_tag = doc.at('span.cl').next_sibling
  break unless next_tag
  next_tag.remove
end
puts doc.to_html
Running it gives me:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><div id="printready">
<div class=""></div>
<div class=""></div>
<div id=""></div>
<p>...</p>
<p>...</p>
<p>...</p>
<p>...</p>
<p>...</p>
<span class="cl"></span>
</div></body></html>
If you want to remove the class and id attributes entirely:
require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<div id="printready">
<div class="box-single"></div>
<div class="marker"></div>
<h2>sometext</h2>
<div id="news-single-img"></div>
<p>...</p>
<p>...</p>
<p>...</p>
<p>...</p>
<p>...</p>
<span class="cl"></span>
... (remove everything since the last paragraph)
</div>
EOT
%w[.box-single .marker].each do |klass|
  doc.search(klass).remove_attr('class')
end
doc.at('h2').remove
%w[#news-single-img].each do |tag_id|
  doc.search(tag_id).remove_attr('id')
end
loop do
  next_tag = doc.at('span.cl').next_sibling
  break unless next_tag
  next_tag.remove
end
puts doc.to_html
After running it, the attributes are gone:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><div id="printready">
<div></div>
<div></div>
<div></div>
<p>...</p>
<p>...</p>
<p>...</p>
<p>...</p>
<p>...</p>
<span class="cl"></span>
</div></body></html>
Answer 2 (score: -2)
Using JavaScript (jQuery), you can do it like this:
<script type="text/javascript">
  $(function () {
    $("button").click(function () {
      $(".box-single").remove();
    });
  });
</script>