Getting a div's content along with its complete HTML & CSS using HTML Agility Pack in ASP.NET

Asked: 2012-11-01 06:30:03

Tags: c# asp.net xpath html-agility-pack

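I need to scrape a website and get the tennis draw. I tried to do this with the HTML Agility Pack, but so far I have not had any success.

Sample link to scrape: http://www.itftennis.com/procircuit/tournaments/women%27s-tournament/info.aspx?tournamentid=1100027528

Below are the .cs code and the HTML that I need to scrape and display on my website.

I also need the page title and description, but the description always returns null.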

    var url = txtURL.Text;

    // HtmlWeb.Load downloads the page and returns a parsed HtmlDocument
    var webGet = new HtmlWeb();
    HtmlDocument doc = webGet.Load(url);

    // doc.LoadHtml(response);

    String title = (from x in doc.DocumentNode.Descendants()
                    where x.Name.ToLower() == "title"
                    select x.InnerText).FirstOrDefault();

    String desc = (from x in doc.DocumentNode.Descendants()
                   where x.Name.ToLower() == "description"
                   select x.InnerText).FirstOrDefault();

    List<String> imgs = (from x in doc.DocumentNode.Descendants()
                         where x.Name.ToLower() == "img"
                         select x.Attributes["src"].Value).ToList<String>();


    //string drawsheet = (from x in doc.DocumentNode.InnerHtml where x.


    lblTitle.Text = title;
    lblDescription.Text = desc;


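    System.Text.StringBuilder sb = new System.Text.StringBuilder();

    // SelectNodes returns null when nothing matches, so check before iterating
    var drawNodes = doc.DocumentNode.SelectNodes("//div[@id='divTourDrawsheets']");
    if (drawNodes != null)
    {
        foreach (HtmlNode node in drawNodes)
        {
            // Append inside the loop: a variable declared here is out of scope after it
            sb.Append(node.InnerText); //that's the text you are looking for
        }
    }

    ltrDrawSheet.Text = sb.ToString();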
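
One thing worth noting about the loop over `divTourDrawsheets`: `InnerText` returns only the text, with all markup stripped. If the goal is to display the draw sheet with its HTML and the inline `<style>` block, `OuterHtml` (or `InnerHtml`) is probably closer to what is needed. The following is a minimal sketch, not a definitive implementation. It assumes the page markup is already available as a string, and the names `DrawSheetReader` and `GetDrawSheetHtml` are hypothetical, introduced only for illustration.

```csharp
using HtmlAgilityPack;

public static class DrawSheetReader
{
    // Hypothetical helper: returns the markup of the div with the given id,
    // including the div tag itself and any nested <style> element,
    // or null when no such div exists.
    public static string GetDrawSheetHtml(string html, string divId)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when nothing matches
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@id='" + divId + "']");

        return node == null ? null : node.OuterHtml;
    }
}
```

The result could then be assigned to the literal, for example `ltrDrawSheet.Text = DrawSheetReader.GetDrawSheetHtml(doc.DocumentNode.OuterHtml, "divTourDrawsheets");`. Root-relative URLs in the copied markup, such as `/ITF/Images/pixel.gif`, will not resolve on another site, and any CSS that the source page loads from external stylesheets is not part of the div.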

* Partial HTML only; I had to remove most of it because it exceeds 30000 characters *

<div style="overflow:hidden;" id="divTourDrawsheets">
<title></title>

<style type="text/css">
        #divDrawsheet {font-size:0.9em; overflow:auto; margin-left:10px; cursor:move; /*width:1500px;*/ width:2000px;} /*Width set at 1000px for IE7*/
        .divWinner1S {margin-top:28px; border-bottom:1px solid #999; padding:0px 3px 2px 3px;}
        .divWinner1D {margin-top:50px; border-bottom:1px solid #999; padding:0px 3px 2px 3px;}
        .liQWRnd1S {margin-top:28px; margin-bottom:-11px;}
        .liQWRnd1D {margin-top:30px; height:47px;}

        .tDetail .liRnd2S {padding:22px 0px 0px 0px;}
        .divRnd2S {border-right:1px solid #999; text-align:center; padding:4px 0px 6px 0px; height:18px;}
        .divWinner2S {margin-top:63px; border-bottom:1px solid #999; padding:0px 3px 2px 3px;}
        .liQWRnd2S {margin-top:53px; margin-bottom:73px;}

        .tDetail .liRnd2D {padding:30px 0px 6px 0px;}
        .divRnd2D {border-right:1px solid #999; text-align:center; padding:12px 0px 10px 0px; height:18px;}
        .divWinner2D {margin-top:85px; border-bottom:1px solid #999; padding:0px 3px 2px 3px;}

        .tDetail .liRnd3S {padding:49px 0px 27px 0px;}
        .divRnd3S {border-right:1px solid #999; text-align:center; padding:39px 0px 29px 0px; height:18px;}
        .divWinner3S {margin-top:114px; border-bottom:1px solid #999; padding:0px 3px 2px 3px;}
        .liQWRnd3S {margin-top:112px; margin-bottom:185px;}

        .tDetail .liRnd3D {padding:72px 0px 46px 0px;}
        .divRnd3D {border-right:1px solid #999; text-align:center; padding:52px 0px 48px 0px; height:18px;}
        .divWinner3D {margin-top:165px; border-bottom:1px solid #999; padding:0px 3px 2px 3px;}

        .tDetail .liRnd4S {padding:107px 0px 83px 0px;}
        .divRnd4S {border-right:1px solid #999; text-align:center; padding:90px 0px 88px 0px; height:18px;}
        .divWinner4S {margin-top:225px; border-bottom:1px solid #999; padding:0px 3px 2px 3px;}
        .liQWRnd4S {margin-top:220px; margin-bottom:410px;}

        .tDetail .liRnd4D {padding:154px 0px 122px 0px;}
        .divRnd4D {border-right:1px solid #999; text-align:center; padding:134px 0px 130px 0px; height:18px;}
        .divWinner4D {margin-top:325px; border-bottom:1px solid #999; padding:0px 3px 2px 3px;}

        .tDetail .liRnd5S {padding:220px 0px 196px 0px;}
        .divRnd5S {border-right:1px solid #999; text-align:center; padding:200px 0px 200px 0px; height:18px;}
        .divWinner5S {margin-top:445px; border-bottom:1px solid #999; padding:0px 3px 2px 3px;}

        .tDetail .liRnd5D {padding:316px 0px 250px 0px;}
        .divRnd5D {border-right:1px solid #999; text-align:center; padding:295px 0px 295px 0px; height:18px;}
        .divWinner5D {margin-top:645px; border-bottom:1px solid #999; padding:0px 3px 2px 3px;}

        .tDetail .liRnd6S {padding:443px 0px 352px 0px;}
        .divRnd6S {border-right:1px solid #999; text-align:center; padding:426px 0px 425px 0px; height:18px;}
        .divWinner6S {margin-top:895px; border-bottom:1px solid #999; padding:0px 3px 2px 3px;}

        .tDetail .liRnd6D {padding:630px 0px 300px 0px;}
        .divRnd6D {border-right:1px solid #999; text-align:center; padding:620px 0px 610px 0px; height:18px;}
        .divWinner6D {margin-top:1285px; border-bottom:1px solid #999; padding:0px 3px 2px 3px;}

        .tDetail .liRnd7S {padding-top:890px;}
        .divRnd7S {border-right:1px solid #999; text-align:center; padding:880px 0px 800px 0px; height:18px;}
        .divWinner7S {margin-top:1795px; border-bottom:1px solid #999; padding:0px 3px 2px 3px;}
        .RRTable {background-color:#fff;}
        .RRTable td:empty{background-color:#ddd;}
    </style>



<span id="spnNote">(Use the cursor to move the drawsheet)</span><span style="float:right; padding:10px;" id="spnPrintDS"><a style="cursor:pointer;" onclick="printableDrawsheet();"><img align="absbottom" style="border-width:0px;" alt="Print Drawsheet" src="/itf/images/printDS-icon.png" id="imgPrint"></a></span>

<div id="divDrawsheet" style="position: relative;" class="ui-draggable">

    <div class="fl"><ul id="ulRounds">
        <li class="fl" id="liRound">
            <div style="padding:10px 3px 10px 3px; min-width:125px; text-align:center;font-size:1.2em;" id="divRound"><strong>Round 1</strong></div>
            <ul id="ulEntry">
                <li style="padding:4px 0px 0px 0px;" id="liEntry">
                    <div class="hlPlayer" style="border-bottom:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerTop">
                        <span style="display:block; height:12px;">



<span class="flagLeft">
<img class="flag14 f14CZE" title="Czech Republic" alt="Czech Republic" src="/ITF/Images/pixel.gif">
<a class="drsh100057386" href="/procircuit/players/player/profile.aspx?playerid=100057386">Katerina VANKOVA</a></span> (CZE)&nbsp;[1]
                        </span>

                    </div>
                    <div style="border-right:1px solid #999; text-align:center; padding:6px 0px 1px 0px; height:15px;"><i><a disabled="disabled" id="lnkHeadToHead"></a><span style="color:#fff;">|</span></i></div>
                    <div class="hlPlayer" style="border-bottom:1px solid #999; border-right:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerBottom">
                        <span style="display:block; height:12px;">
                            <span class="flagLeft" id="spnPlayerBottom1Bye"><img height="11px" width="14px" border="0" src="/itf/images/pixel.gif">BYE</span>

                        </span>

                    </div>
                </li>

                <li style="padding:4px 0px 0px 0px;" id="liEntry">
                    <div class="hlPlayer" style="border-bottom:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerTop">
                        <span style="display:block; height:12px;">



<span class="flagLeft">
<img class="flag14 f14GBR" title="Great Britain" alt="Great Britain" src="/ITF/Images/pixel.gif">
<a class="drsh100149370" href="/procircuit/players/player/profile.aspx?playerid=100149370">Kyria DUNFORD</a></span> (GBR)
                        </span>

                    </div>
                    <div style="border-right:1px solid #999; text-align:center; padding:6px 0px 1px 0px; height:15px;"><i><a href="/procircuit/players/head-to-head/result.aspx?player1=100149370&amp;player2=100073050" id="lnkHeadToHead">H2H</a><span style="color:#fff;">|</span></i></div>
                    <div class="hlPlayer" style="border-bottom:1px solid #999; border-right:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerBottom">
                        <span style="display:block; height:12px;">



<span class="flagLeft">
<img class="flag14 f14GBR" title="Great Britain" alt="Great Britain" src="/ITF/Images/pixel.gif">
<a class="drsh100073050" href="/procircuit/players/player/profile.aspx?playerid=100073050" style="background-color:transparent;">Hollie BEES</a></span> (GBR)
                        </span>

                    </div>
                </li>

                <li style="padding:4px 0px 0px 0px;" id="liEntry">
                    <div class="hlPlayer" style="border-bottom:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerTop">
                        <span style="display:block; height:12px;">



<span class="flagLeft">
<img class="flag14 f14GBR" title="Great Britain" alt="Great Britain" src="/ITF/Images/pixel.gif">
<a class="drsh100141485" href="/procircuit/players/player/profile.aspx?playerid=100141485">Sophie WATTS</a></span> (GBR)
                        </span>

                    </div>
                    <div style="border-right:1px solid #999; text-align:center; padding:6px 0px 1px 0px; height:15px;"><i><a href="/procircuit/players/head-to-head/result.aspx?player1=100141485&amp;player2=100084615" id="lnkHeadToHead">H2H</a><span style="color:#fff;">|</span></i></div>
                    <div class="hlPlayer" style="border-bottom:1px solid #999; border-right:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerBottom">
                        <span style="display:block; height:12px;">



<span class="flagLeft">
<img class="flag14 f14CZE" title="Czech Republic" alt="Czech Republic" src="/ITF/Images/pixel.gif">
<a class="drsh100084615" href="/procircuit/players/player/profile.aspx?playerid=100084615" style="background-color:transparent;">Martina PRADOVA</a></span> (CZE)
                        </span>

                    </div>
                </li>

                <li style="padding:4px 0px 0px 0px;" id="liEntry">
                    <div class="hlPlayer" style="border-bottom:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerTop">
                        <span style="display:block; height:12px;">
                            <span class="flagLeft" id="spnPlayerTop1Bye"><img height="11px" width="14px" border="0" src="/itf/images/pixel.gif">BYE</span>

                        </span>

                    </div>
                    <div style="border-right:1px solid #999; text-align:center; padding:6px 0px 1px 0px; height:15px;"><i><a disabled="disabled" id="lnkHeadToHead"></a><span style="color:#fff;">|</span></i></div>
                    <div class="hlPlayer" style="border-bottom:1px solid #999; border-right:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerBottom">
                        <span style="display:block; height:12px;">



<span class="flagLeft">
<img class="flag14 f14BLR" title="Belarus" alt="Belarus" src="/ITF/Images/pixel.gif">
<a class="drsh100128240" href="/procircuit/players/player/profile.aspx?playerid=100128240" style="background-color:transparent;">Aliaksandra SASNOVICH</a></span> (BLR)&nbsp;[7]
                        </span>

                    </div>
                </li>

                <li style="padding:4px 0px 0px 0px;" id="liEntry">
                    <div class="hlPlayer" style="border-bottom:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerTop">
                        <span style="display:block; height:12px;">



<span class="flagLeft">
<img class="flag14 f14RUS" title="Russia" alt="Russia" src="/ITF/Images/pixel.gif">
<a class="drsh100134744" href="/procircuit/players/player/profile.aspx?playerid=100134744" style="background-color:transparent;">Mayya KATSITADZE</a></span> (RUS)&nbsp;[2]
                        </span>

                    </div>
                    <div style="border-right:1px solid #999; text-align:center; padding:6px 0px 1px 0px; height:15px;"><i><a disabled="disabled" id="lnkHeadToHead"></a><span style="color:#fff;">|</span></i></div>
                    <div class="hlPlayer" style="border-bottom:1px solid #999; border-right:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerBottom">
                        <span style="display:block; height:12px;">
                            <span class="flagLeft" id="spnPlayerBottom1Bye"><img height="11px" width="14px" border="0" src="/itf/images/pixel.gif">BYE</span>

                        </span>

                    </div>
                </li>

                <li style="padding:4px 0px 0px 0px;" id="liEntry">
                    <div class="hlPlayer" style="border-bottom:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerTop">
                        <span style="display:block; height:12px;">



<span class="flagLeft">
<img class="flag14 f14FIN" title="Finland" alt="Finland" src="/ITF/Images/pixel.gif">
<a class="drsh100152949" href="/procircuit/players/player/profile.aspx?playerid=100152949">Mia Nicole EKLUND</a></span> (FIN)
                        </span>

                    </div>
                    <div style="border-right:1px solid #999; text-align:center; padding:6px 0px 1px 0px; height:15px;"><i><a href="/procircuit/players/head-to-head/result.aspx?player1=100152949&amp;player2=100151646" id="lnkHeadToHead">H2H</a><span style="color:#fff;">|</span></i></div>
                    <div class="hlPlayer" style="border-bottom:1px solid #999; border-right:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerBottom">
                        <span style="display:block; height:12px;">



<span class="flagLeft">
<img class="flag14 f14GBR" title="Great Britain" alt="Great Britain" src="/ITF/Images/pixel.gif">
<a class="drsh100151646" href="/procircuit/players/player/profile.aspx?playerid=100151646">Lauren MCMINN</a></span> (GBR)
                        </span>

                    </div>
                </li>

                <li style="padding:4px 0px 0px 0px;" id="liEntry">
                    <div class="hlPlayer" style="border-bottom:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerTop">
                        <span style="display:block; height:12px;">



<span class="flagLeft">
<img class="flag14 f14GBR" title="Great Britain" alt="Great Britain" src="/ITF/Images/pixel.gif">
<a class="drsh100139057" href="/procircuit/players/player/profile.aspx?playerid=100139057">Jazzamay DREW</a></span> (GBR)
                        </span>

                    </div>
                    <div style="border-right:1px solid #999; text-align:center; padding:6px 0px 1px 0px; height:15px;"><i><a href="/procircuit/players/head-to-head/result.aspx?player1=100139057&amp;player2=100143216" id="lnkHeadToHead">H2H</a><span style="color:#fff;">|</span></i></div>
                    <div class="hlPlayer" style="border-bottom:1px solid #999; border-right:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerBottom">
                        <span style="display:block; height:12px;">



<span class="flagLeft">
<img class="flag14 f14GBR" title="Great Britain" alt="Great Britain" src="/ITF/Images/pixel.gif">
<a class="drsh100143216" href="/procircuit/players/player/profile.aspx?playerid=100143216">Brigit FOLLAND</a></span> (GBR)
                        </span>

                    </div>
                </li>

                <li style="padding:4px 0px 0px 0px;" id="liEntry">
                    <div class="hlPlayer" style="border-bottom:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerTop">
                        <span style="display:block; height:12px;">
                            <span class="flagLeft" id="spnPlayerTop1Bye"><img height="11px" width="14px" border="0" src="/itf/images/pixel.gif">BYE</span>

                        </span>

                    </div>
                    <div style="border-right:1px solid #999; text-align:center; padding:6px 0px 1px 0px; height:15px;"><i><a disabled="disabled" id="lnkHeadToHead"></a><span style="color:#fff;">|</span></i></div>
                    <div class="hlPlayer" style="border-bottom:1px solid #999; border-right:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerBottom">
                        <span style="display:block; height:12px;">



<span class="flagLeft">
<img class="flag14 f14IRL" title="Ireland" alt="Ireland" src="/ITF/Images/pixel.gif">
<a class="drsh100119788" href="/procircuit/players/player/profile.aspx?playerid=100119788">Amy BOWTELL</a></span> (IRL)&nbsp;[5]
                        </span>

                    </div>
                </li>

                <li style="padding:4px 0px 0px 0px;" id="liEntry">
                    <div class="hlPlayer" style="border-bottom:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerTop">
                        <span style="display:block; height:12px;">



<span class="flagLeft">
<img class="flag14 f14FRA" title="France" alt="France" src="/ITF/Images/pixel.gif">
<a class="drsh100090878" href="/procircuit/players/player/profile.aspx?playerid=100090878">Constance SIBILLE</a></span> (FRA)&nbsp;[3]
                        </span>

                    </div>
                    <div style="border-right:1px solid #999; text-align:center; padding:6px 0px 1px 0px; height:15px;"><i><a disabled="disabled" id="lnkHeadToHead"></a><span style="color:#fff;">|</span></i></div>
                    <div class="hlPlayer" style="border-bottom:1px solid #999; border-right:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerBottom">
                        <span style="display:block; height:12px;">
                            <span class="flagLeft" id="spnPlayerBottom1Bye"><img height="11px" width="14px" border="0" src="/itf/images/pixel.gif">BYE</span>

                        </span>

                    </div>
                </li>

                <li style="padding:4px 0px 0px 0px;" id="liEntry">
                    <div class="hlPlayer" style="border-bottom:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerTop">
                        <span style="display:block; height:12px;">



<span class="flagLeft">
<img class="flag14 f14GBR" title="Great Britain" alt="Great Britain" src="/ITF/Images/pixel.gif">
<a class="drsh100141443" href="/procircuit/players/player/profile.aspx?playerid=100141443">Anneka WATTS</a></span> (GBR)
                        </span>

                    </div>
                    <div style="border-right:1px solid #999; text-align:center; padding:6px 0px 1px 0px; height:15px;"><i><a href="/procircuit/players/head-to-head/result.aspx?player1=100141443&amp;player2=100053033" id="lnkHeadToHead">H2H</a><span style="color:#fff;">|</span></i></div>
                    <div class="hlPlayer" style="border-bottom:1px solid #999; border-right:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerBottom">
                        <span style="display:block; height:12px;">



<span class="flagLeft">
<img class="flag14 f14USA" title="USA" alt="USA" src="/ITF/Images/pixel.gif">
<a class="drsh100053033" href="/procircuit/players/player/profile.aspx?playerid=100053033">Tori KINARD</a></span> (USA)
                        </span>

                    </div>
                </li>

                <li style="padding:4px 0px 0px 0px;" id="liEntry">
                    <div class="hlPlayer" style="border-bottom:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerTop">
                        <span style="display:block; height:12px;">



<span class="flagLeft">
<img class="flag14 f14GBR" title="Great Britain" alt="Great Britain" src="/ITF/Images/pixel.gif">
<a class="drsh100139660" href="/procircuit/players/player/profile.aspx?playerid=100139660">Edita RACA</a></span> (GBR)
                        </span>

                    </div>
                    <div style="border-right:1px solid #999; text-align:center; padding:6px 0px 1px 0px; height:15px;"><i><a href="/procircuit/players/head-to-head/result.aspx?player1=100139660&amp;player2=100131722" id="lnkHeadToHead">H2H</a><span style="color:#fff;">|</span></i></div>
                    <div class="hlPlayer" style="border-bottom:1px solid #999; border-right:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerBottom">
                        <span style="display:block; height:12px;">




    </div>

</div>

</div>

1 Answer:

Answer 0 (score: 1)

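I assume that by "description" you mean the meta tag. There is no `<description>` element in HTML, which is why searching the descendants for a node named "description" returns null.

The best way to get it is probably to use an XPath expression like this: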

    HtmlNode descNode = doc.DocumentNode.SelectSingleNode("//meta[@name='description']");
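
The description text itself is held in the meta tag's `content` attribute, not in its inner text, so the node still has to be read after it is selected. Here is a minimal sketch under that assumption. The names `MetaReader` and `GetDescription` are hypothetical, and the helper parses a markup string instead of loading a URL so that it stays self-contained.

```csharp
using HtmlAgilityPack;

public static class MetaReader
{
    // Hypothetical helper: returns the content of <meta name="description">,
    // an empty string when the tag has no content attribute,
    // or null when the tag is absent.
    public static string GetDescription(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode descNode = doc.DocumentNode.SelectSingleNode("//meta[@name='description']");
        if (descNode == null)
        {
            return null;
        }

        // GetAttributeValue returns the supplied default when the attribute is missing
        return descNode.GetAttributeValue("content", string.Empty);
    }
}
```

In the question's code this would correspond to something like `lblDescription.Text = descNode != null ? descNode.GetAttributeValue("content", string.Empty) : string.Empty;`. The XPath comparison on the attribute value is case-sensitive, so a page that writes `name="Description"` would need a different expression.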