HTMLAgilityPack and "Marshal clicked signal"

Date: 2013-02-23 19:07:04

Tags: c# linq parsing html-agility-pack

I'm trying to scrape an HTML document.

    HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
    htmlDoc.OptionFixNestedTags=true;

    htmlDoc.Load(@"C:\..\page.html");

    var hptexts = htmlDoc.DocumentNode.Descendants("a");

    foreach (var hptext in hptexts)
    {
        string desc = hptext.Attributes["href"].Value;
        Console.WriteLine(desc);
    }

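This is the code I'm using. It prints the first 5-6 links on the page and then shows "Marshal clicked signal". What am I missing here?

Edit 02/27/13: this is the HTML where the scraping stops: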

</a> </li> </ul> <span class="login-area"> <div id="p_p_id_bonLoginPortlet_WAR_bonportlet_" class="portlet-boundary portlet-boundary_bonLoginPortlet_WAR_bonportlet_  portlet-static portlet-static-end  " > <a id="p_bonLoginPortlet_WAR_bonportlet"></a> <div class="portlet-borderless-container" style=""> <div class="portlet-body"> <!--p:ajaxStatus onstart="statusDialog.show();" onsuccess="statusDialog.hide();"/--><div id="A4925:j_idt5" style="display:none" title="text"><img id="A4925:j_idt6" src="/bon-portlet/images/ajaxloadingbar.gif" alt="" /></div><script type="text/javascript">/*<![CDATA[*/$(function(){statusDialog=new PrimeFaces.widget.Dialog("A4925:j_idt5",{autoOpen:false,minHeight:0,draggable:false,zIndex:9999,resizable:false,appendToBody:true,closable:false})});/*]]>*/</script><div id="A4925:j_idt7" style="display:none" title="text"> <form id="A4925:frmSendPassword" name="A4925:frmSendPassword" method="post" action="http://www.website.com/web/guest/test-draw-results?_bonLoginPortlet_WAR_bonportlet__facesViewId=%2fxhtml%2flogin.xhtml&amp;p_auth=giN14rWX&amp;p_p_col_id=&amp;p_p_id=bonLoginPortlet_WAR_bonportlet&amp;p_p_lifecycle=1&amp;p_p_mode=view&amp;p_p_state=normal" enctype="application/x-www-form-urlencoded"> <input type="hidden" name="A4925:frmSendPassword" value="A4925:frmSendPassword" /> <input type="hidden" name="javax.faces.encodedURL" value="http://www.website.com/web/guest/test-draw-results?_bonLoginPortlet_WAR_bonportlet__facesViewId=%2fxhtml%2flogin.xhtml&amp;p_p_col_id=&amp;p_p_id=bonLoginPortlet_WAR_bonportlet&amp;p_p_lifecycle=2&amp;p_p_mode=view&amp;p_p_state=normal" /> <span id="A4925:myPanel2"> <p>text</p><table class="forgot"> <tbody> <tr> <td> <div></td> </tr> <tr> <td><div id="A4925:j_idt14" class="ui-messages ui-widget"></div></td> </tr> <tr> <td></div> <div> </div></td> </tr> <tr> <td> <div><label for="A4925:txtEmail2" class=""> Email</label><input id="A4925:txtEmail2" type="text" name="A4925:txtEmail2" maxlength="40" size="40" /> </div> 
<div><div id="A4925:msgFortxtEmail2"></div> </div></td> </tr> </tbody> </table> <div class="sendButton"><button id="A4925:j_idt24" name="A4925:j_idt24" onclick="PrimeFaces.ab({formId:'A4925:frmSendPassword',source:'A4925:j_idt24',process:'A4925:myPanel2',update:'A4925:myPanel2'});return false;" type="submit">text</button><script type="text/javascript">/*<![CDATA[*/widget_A4925_j_idt24=new PrimeFaces.widget.CommandButton("A4925:j_idt24",{});/*]]>*/</script> </div></span><input type="hidden" name="javax.faces.ViewState" id="javax.faces.ViewState" value="H4sIAAAAAAAAAFvzloG1oLiIQTArsSxRr7QkM0fPI7E4wzexgJX91sHDYgkXmRmY3Bi4cvITU9wSk0vyizwZOEsyilKLM/JzUioK7B0YQICnnANICgAxSwkDv6OJpZGpVUlFiWtuYmaOUWkRg3C0D9iCnMS8dD3/pKzU5BLrCeci5gsUa+YwMTBUFIB0FgBBCQMrWHdpIUMdAzNQlAnOYgXJw3nMdUUMuiAzK/TSEpNTi/WS83ML8vNS80r0Qj3DMlPLg/LzS1QCivILUotKKr1TK4sZoEAIaF8RAx/CPa55pbnIkkBHsOUkFpd4psADBqzOM68kNT21SOjRgiXfG9stmBgYPRlYyxJzSlOB5gkg1PmV5ialFrWtmSrLPeVBN8x7XCDP8cKDJiCxuBjuGajX+VADDlW6AgDM3QSPqwEAAA==" autocomplete="off" /> </form></div><script type="text/javascript">/*<![CDATA[*/$(function(){loginDialog=new PrimeFaces.widget.Dialog("A4925:j_idt7",{autoOpen:false,minHeight:0,dialogClass:"modal",modal:true,zIndex:9900,resizable:false,appendToBody:true})});/*]]>*/</script> <script></script> <script>/*<![CDATA[*/PlayerClub.userName="";PlayerClub.token="";/*]]>*/</script> <form id="A4925:frmBonLogin" name="A4925:frmBonLogin" method="post" action="http://www.website.com/web/guest/test-draw-results?_bonLoginPortlet_WAR_bonportlet__facesViewId=%2fxhtml%2flogin.xhtml&amp;p_auth=giN14rWX&amp;p_p_col_id=&amp;p_p_id=bonLoginPortlet_WAR_bonportlet&amp;p_p_lifecycle=1&amp;p_p_mode=view&amp;p_p_state=normal" class="login" enctype="application/x-www-form-urlencoded"> <input type="hidden" name="A4925:frmBonLogin" value="A4925:frmBonLogin" /> <input type="hidden" name="javax.faces.encodedURL" 
value="http://www.website.com/web/guest/test-draw-results?_bonLoginPortlet_WAR_bonportlet__facesViewId=%2fxhtml%2flogin.xhtml&amp;p_p_col_id=&amp;p_p_id=bonLoginPortlet_WAR_bonportlet&amp;p_p_lifecycle=2&amp;p_p_mode=view&amp;p_p_state=normal" /> <script type="text/javascript">/*<![CDATA[*/refreshLogin=function(){PrimeFaces.ab({formId:"A4925:frmBonLogin",source:"A4925:j_idt27",process:"@all",update:"A4925:pnlLogin",params:arguments[0]})};/*]]>*/</script><span id="A4925:pageMessages"></span><script type="text/javascript">/*<![CDATA[*/$(function(){widget_A4925_pageMessages=new PrimeFaces.widget.Growl("A4925:pageMessages",[])});/*]]>*/</script><span id="A4925:pnlNotLoggedIn"><table> <tbody> <tr> <td><input id="A4925:txtEmail" type="text" name="A4925:txtEmail" /></td> <td><input id="A4925:txtPass" type="password" name="A4925:txtPass" /></td> <td><button id="A4925:j_idt29" name="A4925:j_idt29" class="loginBtn" onclick="PrimeFaces.ab({formId:'A4925:frmBonLogin',source:'A4925:j_idt29',process:'A4925:pnlNotLoggedIn',update:'A4925:pnlLogin A4925:pnlNotLoggedIn A4925:pageMessages',oncomplete:function(xhr, status, args){if(args.redirect != null) {                     PlayerClub.Events.notify({name : 'loggedIn'});                    }              ;}});return false;" type="submit">text</button><script type="text/javascript">/*<![CDATA[*/widget_A4925_j_idt29=new PrimeFaces.widget.CommandButton("A4925:j_idt29",{});/*]]>*/</script></td> <td> <a class="forgotpwd" onclick="loginDialog.show()">text</a></td> <td>Β Β </td> </tr> </tbody> </table> </span><span id="A4925:pnlLogin"></span><input type="hidden" name="javax.faces.ViewState" id="javax.faces.ViewState" 
value="H4sIAAAAAAAAAFvzloG1oLiIQTArsSxRr7QkM0fPI7E4wzexgJX91sHDYgkXmRmY3Bi4cvITU9wSk0vyizwZOEsyilKLM/JzUioK7B0YQICnnANICgAxSwkDv6OJpZGpVUlFiWtuYmaOUWkRg3C0D9iCnMS8dD3/pKzU5BLrCeci5gsUa+YwMTBUFIB0FgBBCQMrWHdpIUMdAzNQlAnOYgXJw3nMdUUMuiAzK/TSEpNTi/WS83ML8vNS80r0Qj3DMlPLg/LzS1QCivILUotKKr1TK4sZoEAIaF8RAx/CPa55pbnIkkBHsOUkFpd4psADBqzOM68kNT21SOjRgiXfG9stmBgYPRlYyxJzSlOB5gkg1PmV5ialFrWtmSrLPeVBN8x7XCDP8cKDJiCxuBjuGajX+VADDlW6AgDM3QSPqwEAAA==" autocomplete="off" /> </form> <script>/*<![CDATA[*/jQuery(document).ready(function(){if(false){refreshLogin()}});var myLoginListener=new PlayerClub.EventListener("loginListener");myLoginListener.onEvent=function(a){if(a.name=="loggedIn"){}else{if(a.name=="loggedOut"){refreshLogin()}}};PlayerClub.Events.add(myLoginListener);/*]]>*/</script> </div> </div> </div> </span> </div> </div> </div> <div id="wrapper-header"> <header id="banner" role="banner"> <hgroup id="heading"> <h1 class="company-title"> <a class="logo" href="http://www.website.com/el/web/guest/corporate" title="text website.com"></a> <div id="header_right" style="float:right;"> <div id="BannerWebContent"> <div id="p_p_id_56_INSTANCE_w9W4_" class="portlet-boundary portlet-boundary_56_  portlet-static portlet-static-end portlet-journal-content " > <a id="p_56_INSTANCE_w9W4"></a> <div class="portlet-borderless-container" style=""> <div class="portlet-body"> <div class="journal-content-article" id="article_10132_10157_1794409_12.1"> <p style="text-align: right;"> <a href="http://www.youtube.com&amp;list=PLMiMYkQZ3itCvQgnTqD9sNyqKaBsNZOM6&amp;index=1" style="line-height: 1.4;" target="_blank"><img alt="text" src="/image/image_gallery?uuid=666cec6b-27b5-483a-9737-f760ddbca822&amp;groupId=10157&amp;t=1361443806379" style="width: 120px; height: 60px;" /></a><a href="http://www.youtube.com&amp;list=PLMiMYkQZ3itD0C6mk7FhFtyzGJ7lxSkFz" target="_blank"><img alt="text" src="/image/image_gallery?uuid=44e791df-7011-406e-ab58-04caa1d3b904&amp;groupId=10157&amp;t=1361373259040" 
style="width: 120px; height: 60px;" /></a><a href="http://www.youtube.com&amp;list=PLMiMYkQZ3itD0C6mk7FhFtyzGJ7lxSkFz&amp;index=1" target="_blank"><img alt="text" src="/image/image_gallery?uuid=edf1a342-48e3-4785-b845-4e3adae50c36&amp;groupId=10157&amp;t=1361282247908" style="width: 75px; height: 60px;" /></a><a href="http://www.youtube.com/&amp;list=PLMiMYkQZ3itBR11d2SBq52tBgIScoPIAd&amp;index=1" target="_blank"><img alt="text" src="/image/image_gallery?uuid=cd7445f3-3c3e-40aa-82f8-8101dea42892&amp;groupId=10157&amp;t=1361282247908" style="width: 60px; height: 60px;" /></a><a href="http://www.youtube.com" target="_blank"><img alt="" src="/image/image_gallery?uuid=35289fbe-9586-4fcf-841f-b7422ace31df&amp;groupId=10157&amp;t=1359117469851" style="width: 60px; height: 60px;" /></a><a href="http://www.facebook.com" target="_blank"><img alt="" src="/image/image_gallery?uuid=b8457859-354f-4a70-b03b-895e0d836f8c&amp;groupId=10157&amp;t=1359117469851" style="width: 60px; height: 60px;" /></a><a href="link" target=""><img alt="RSS" src="/image/image_gallery?uuid=56cc6300-0322-4845-bf88-0f6540b815ef&amp;groupId=10157&amp;t=1361282247893" style="width: 60px; height: 60px;" /></a><a href="link"><img alt="text" src="/image/image_gallery?uuid=33b4f346-1490-48d0-86d9-719bd7b059fb&amp;groupId=10157&amp;t=1361459169242" style="width: 60px; height: 60px;" /></a></p> </div> </div> </div> </div> </div> </div> </h1> <h2 class="community-title"> <a href="http://www.website.com/web/guest/test-draw-results?p_auth=giN14rWX&amp;p_p_auth=0xCQE4av&amp;p_p_id=49&amp;p_p_lifecycle=1&amp;p_p_state=normal&amp;p_p_mode=view&amp;p_p_col_count=1&amp;_49_struts_action=%2fmy_places%2fview&amp;_49_groupId=10157&amp;_49_privateLayout=false" title="text website.com"> <span>website.com</span> </a> </h2> <h3 class="page-title"> <span>text‚ &amp; text</span> </h3> </hgroup> <div id="navigation"> <div class="navigation-main-content"> <ul> <li class=""> <a href="link"  class="home"> <div><span>+</span></div> 
</a> </li> <li class="menu-seperator"><div></div></li> <li class="selected "> <a href="link" > text</a></a> </li> </ul>

Everything before this point is printed correctly.

2 answers:

Answer 0 (score: 1)

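    using System;
    using HtmlAgilityPack;

    HtmlDocument doc = new HtmlDocument();
    doc.Load(@"C:\..\page.html");

    // SelectNodes returns null (not an empty collection) when nothing matches
    HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//a");

    if (nodes != null)
    {
        foreach (HtmlNode node in nodes)
        {
            // Some <a> tags have no href attribute; Attributes["href"] is null for them
            HtmlAttribute href = node.Attributes["href"];
            if (href != null)
                Console.WriteLine(href.Value);
        }
    }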

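Edit: I've updated my code and it works now. Your problem is that some <a> tags have no href attribute (for example <a id="p_bonLoginPortlet_WAR_bonportlet"></a> and <a class="forgotpwd" onclick="loginDialog.show()"> in the HTML you posted). For those tags Attributes["href"] returns null, so reading .Value on it throws a NullReferenceException.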
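To reproduce the problem in isolation, here is a small sketch that assumes the HtmlAgilityPack NuGet package and C# 9 top-level statements. Instead of loading page.html it calls LoadHtml on a few <a> tags copied from the posted HTML, and it reads href with HtmlNode.GetAttributeValue, which returns the supplied default (null here) when the attribute is missing instead of throwing. The inline html string and the hrefs list are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// A few <a> tags from the question's HTML; the first two have no href attribute.
string html =
    "<span class=\"login-area\">" +
    "<a id=\"p_bonLoginPortlet_WAR_bonportlet\"></a>" +
    "<a class=\"forgotpwd\" onclick=\"loginDialog.show()\">text</a>" +
    "<a href=\"http://www.facebook.com\" target=\"_blank\">fb</a>" +
    "<a href=\"link\">text</a>" +
    "</span>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

var hrefs = new List<string>();
foreach (HtmlNode a in doc.DocumentNode.Descendants("a"))
{
    // Attributes["href"] would be null for the first two anchors, so .Value would throw.
    // GetAttributeValue returns the default passed in (null) instead.
    string href = a.GetAttributeValue("href", null);
    if (href == null)
        continue; // skip anchors without an href

    hrefs.Add(href);
    Console.WriteLine(href);
}
```

With this fragment it prints http://www.facebook.com and link; the two anchors without an href are skipped rather than stopping the loop.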

Answer 1 (score: 0)

Try this code:

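    using System;
    using System.Collections.Generic;
    using System.Linq;
    using HtmlAgilityPack;

    HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
    htmlDoc.OptionFixNestedTags = true;

    htmlDoc.Load(@"C:\..\page.html");

    // "//a[@href]" only matches <a> tags that have an href attribute.
    // SelectNodes returns null when nothing matches, so fall back to an empty sequence.
    // Where() returns IEnumerable<HtmlNode>, not HtmlNodeCollection.
    IEnumerable<HtmlNode> nodes = (htmlDoc.DocumentNode.SelectNodes("//a[@href]") ?? Enumerable.Empty<HtmlNode>())
        .Where(t => !t.Attributes["href"].Value.StartsWith("#"));
    // Remove the Where() call if you want to include "#" links too

    foreach (HtmlNode hptext in nodes)
    {
        string desc = hptext.Attributes["href"].Value;
        Console.WriteLine(desc);
    }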
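Both the href check and the "#" filter can also live entirely in the XPath expression, so no LINQ is needed. A minimal sketch, again assuming the HtmlAgilityPack NuGet package and top-level statements; the inline HTML fragment is made up for illustration. It relies on the XPath 1.0 functions not() and starts-with(), which HtmlAgilityPack supports because it evaluates XPath through System.Xml.XPath.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Illustrative fragment: one anchor without href, one in-page "#" link, two real links.
string html =
    "<div>" +
    "<a id=\"p_56_INSTANCE_w9W4\"></a>" +
    "<a href=\"#top\">top</a>" +
    "<a href=\"http://www.youtube.com\">yt</a>" +
    "<a href=\"link\">text</a>" +
    "</div>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// Keep only <a> elements that have an href which does not start with '#'.
HtmlNodeCollection matches =
    doc.DocumentNode.SelectNodes("//a[@href and not(starts-with(@href, '#'))]");

var hrefs = new List<string>();
if (matches != null) // SelectNodes returns null, not an empty collection, when nothing matches
{
    foreach (HtmlNode a in matches)
    {
        string href = a.Attributes["href"].Value; // safe: the XPath guarantees href exists
        hrefs.Add(href);
        Console.WriteLine(href);
    }
}
```

With this fragment it prints http://www.youtube.com and link. Keeping the condition in XPath means every node returned is guaranteed to have an href, so the Attributes["href"] lookup inside the loop cannot be null.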