I need to get rid of the part between <!-- custom ads -->
and <!-- /custom ads -->
in this code snippet.
<!-- custom ads -->
<div style="float:left">
<!-- custom_Forum_Postbit_336x280 -->
<div id='div-gpt-ad-1526374586789-2' style='width:336px; height:280px;'>
<script type='text/javascript'>
googletag.display('div-gpt-ad-1526374586789-2');
</script>
</div>
</div>
<div style="float:left; padding-left:20px">
<!-- custom_Forum_Postbit_336x280_r -->
<div id='div-gpt-ad-1526374586789-3' style='width:336px; height:280px;'>
<script type='text/javascript'>
googletag.display('div-gpt-ad-1526374586789-3');
</script>
</div>
</div>
<div class="clear"></div>
<br>
<!-- /custom ads -->
<!-- google_ad_section_start -->Some Text,<br>
Some More Text...<br>
<!-- google_ad_section_end -->
I can already find the two comments with this XPath: //comment()[contains(., 'custom')]
but I'm stuck on how to remove everything that sits between these "markers".
foreach (var comment in htmlDoc.DocumentNode.SelectNodes("//comment()[contains(., 'custom')]"))
{
    MessageBox.Show(comment.OuterHtml);
}
Any suggestions?
Answer 0 (score: 3)
using System;
using System.Linq;
using HtmlAgilityPack;

//doc is the loaded HtmlDocument (htmlDoc in the question)
//find all comment nodes that contain "custom ads"
//(this matches both the opening and the closing "/custom ads" marker)
var nodes = doc.DocumentNode
    .Descendants()
    .OfType<HtmlCommentNode>()
    .Where(c => c.Comment.Contains("custom ads"))
    .ToList();
//create a sequence of pairs of nodes: (0,1), (2,3), ...
var nodePairs = nodes
    .Select((node, index) => new {node, index})
    .GroupBy(x => x.index / 2)
    .Select(g => g.ToArray())
    .Select(a => new { startComment = a[0].node, endComment = a[1].node});
foreach (var pair in nodePairs)
{
    var startNode = pair.startComment;
    var endNode = pair.endComment;
    //check they share the same parent or the wheels will fall off
    if(startNode.ParentNode != endNode.ParentNode)
        throw new Exception("Start and end comments do not share the same parent node.");
    //iterate all nodes in between
    var currentNode = startNode.NextSibling;
    while(currentNode != endNode)
    {
        //currentNode won't have siblings when we trim it from the doc
        //so grab the nextSibling while it's still attached
        var n = currentNode.NextSibling;
        //and cut out currentNode
        currentNode.Remove();
        currentNode = n;
    }
}
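The pairing step in the answer groups consecutive items by index / 2, so items 0 and 1 form the first pair, items 2 and 3 the second, and so on. If the document happens to contain an odd number of matching comments, indexing a[1] on the last group would throw. Below is a minimal sketch of the same pairing idea on plain values, with a guard that drops a trailing unpaired item. The names PairingSketch and PairUp are hypothetical and not part of HtmlAgilityPack; whether silently skipping an unmatched marker is the right behavior depends on your input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairingSketch
{
    // Hypothetical helper: pairs items (0,1), (2,3), ... in order.
    // A trailing item without a partner is dropped instead of causing
    // an out-of-range access.
    public static List<(T Start, T End)> PairUp<T>(IEnumerable<T> items)
    {
        return items
            .Select((item, index) => new { item, index })
            .GroupBy(x => x.index / 2)                  // 0,1 -> group 0; 2,3 -> group 1; ...
            .Select(g => g.Select(x => x.item).ToArray())
            .Where(a => a.Length == 2)                  // skip an unmatched last item
            .Select(a => (Start: a[0], End: a[1]))
            .ToList();
    }
}
```

With the comment nodes from the answer, something like PairingSketch.PairUp(nodes) could stand in for the nodePairs query, using pair.Start and pair.End in the loop.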