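I need to scrape a website and get the tennis draw. I am trying to do this with the HTML Agility Pack, but so far I have had no success.
Below are my .cs code and the HTML that I need to scrape and display on my website.
I also need the page title and description, but the description always returns null.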
HtmlDocument doc = new HtmlDocument();
var url = txtURL.Text;
var webGet = new HtmlWeb();
doc = webGet.Load(url);
// doc.LoadHtml(response);
String title = (from x in doc.DocumentNode.Descendants()
                where x.Name.ToLower() == "title"
                select x.InnerText).FirstOrDefault();
// This is the query that always returns null.
String desc = (from x in doc.DocumentNode.Descendants()
               where x.Name.ToLower() == "description"
               select x.InnerText).FirstOrDefault();
// Skip <img> elements without a src attribute to avoid a NullReferenceException.
List<String> imgs = (from x in doc.DocumentNode.Descendants()
                     where x.Name.ToLower() == "img" && x.Attributes["src"] != null
                     select x.Attributes["src"].Value).ToList<String>();
//string drawsheet = (from x in doc.DocumentNode.InnerHtml where x.
lblTitle.Text = title;
lblDescription.Text = desc;
System.Text.StringBuilder sb = new System.Text.StringBuilder();
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//div[@id='divTourDrawsheets']"))
{
    string text = node.InnerText; //that's the text you are looking for
    sb.Append(text); // append inside the loop, where text is in scope
}
ltrDrawSheet.Text = sb.ToString();
* Partial HTML only; I had to remove most of it because it exceeded 30,000 characters *
<div style="overflow:hidden;" id="divTourDrawsheets">
<title></title>
<style type="text/css">
#divDrawsheet {font-size:0.9em; overflow:auto; margin-left:10px; cursor:move; /*width:1500px;*/ width:2000px;} /*Width set at 1000px for IE7*/
.divWinner1S {margin-top:28px; border-bottom:1px solid #999; padding:0px 3px 2px 3px;}
.divWinner1D {margin-top:50px; border-bottom:1px solid #999; padding:0px 3px 2px 3px;}
.liQWRnd1S {margin-top:28px; margin-bottom:-11px;}
.liQWRnd1D {margin-top:30px; height:47px;}
.tDetail .liRnd2S {padding:22px 0px 0px 0px;}
.divRnd2S {border-right:1px solid #999; text-align:center; padding:4px 0px 6px 0px; height:18px;}
.divWinner2S {margin-top:63px; border-bottom:1px solid #999; padding:0px 3px 2px 3px;}
.liQWRnd2S {margin-top:53px; margin-bottom:73px;}
.tDetail .liRnd2D {padding:30px 0px 6px 0px;}
.divRnd2D {border-right:1px solid #999; text-align:center; padding:12px 0px 10px 0px; height:18px;}
.divWinner2D {margin-top:85px; border-bottom:1px solid #999; padding:0px 3px 2px 3px;}
.tDetail .liRnd3S {padding:49px 0px 27px 0px;}
.divRnd3S {border-right:1px solid #999; text-align:center; padding:39px 0px 29px 0px; height:18px;}
.divWinner3S {margin-top:114px; border-bottom:1px solid #999; padding:0px 3px 2px 3px;}
.liQWRnd3S {margin-top:112px; margin-bottom:185px;}
.tDetail .liRnd3D {padding:72px 0px 46px 0px;}
.divRnd3D {border-right:1px solid #999; text-align:center; padding:52px 0px 48px 0px; height:18px;}
.divWinner3D {margin-top:165px; border-bottom:1px solid #999; padding:0px 3px 2px 3px;}
.tDetail .liRnd4S {padding:107px 0px 83px 0px;}
.divRnd4S {border-right:1px solid #999; text-align:center; padding:90px 0px 88px 0px; height:18px;}
.divWinner4S {margin-top:225px; border-bottom:1px solid #999; padding:0px 3px 2px 3px;}
.liQWRnd4S {margin-top:220px; margin-bottom:410px;}
.tDetail .liRnd4D {padding:154px 0px 122px 0px;}
.divRnd4D {border-right:1px solid #999; text-align:center; padding:134px 0px 130px 0px; height:18px;}
.divWinner4D {margin-top:325px; border-bottom:1px solid #999; padding:0px 3px 2px 3px;}
.tDetail .liRnd5S {padding:220px 0px 196px 0px;}
.divRnd5S {border-right:1px solid #999; text-align:center; padding:200px 0px 200px 0px; height:18px;}
.divWinner5S {margin-top:445px; border-bottom:1px solid #999; padding:0px 3px 2px 3px;}
.tDetail .liRnd5D {padding:316px 0px 250px 0px;}
.divRnd5D {border-right:1px solid #999; text-align:center; padding:295px 0px 295px 0px; height:18px;}
.divWinner5D {margin-top:645px; border-bottom:1px solid #999; padding:0px 3px 2px 3px;}
.tDetail .liRnd6S {padding:443px 0px 352px 0px;}
.divRnd6S {border-right:1px solid #999; text-align:center; padding:426px 0px 425px 0px; height:18px;}
.divWinner6S {margin-top:895px; border-bottom:1px solid #999; padding:0px 3px 2px 3px;}
.tDetail .liRnd6D {padding:630px 0px 300px 0px;}
.divRnd6D {border-right:1px solid #999; text-align:center; padding:620px 0px 610px 0px; height:18px;}
.divWinner6D {margin-top:1285px; border-bottom:1px solid #999; padding:0px 3px 2px 3px;}
.tDetail .liRnd7S {padding-top:890px;}
.divRnd7S {border-right:1px solid #999; text-align:center; padding:880px 0px 800px 0px; height:18px;}
.divWinner7S {margin-top:1795px; border-bottom:1px solid #999; padding:0px 3px 2px 3px;}
.RRTable {background-color:#fff;}
.RRTable td:empty{background-color:#ddd;}
</style>
<span id="spnNote">(Use the cursor to move the drawsheet)</span><span style="float:right; padding:10px;" id="spnPrintDS"><a style="cursor:pointer;" onclick="printableDrawsheet();"><img align="absbottom" style="border-width:0px;" alt="Print Drawsheet" src="/itf/images/printDS-icon.png" id="imgPrint"></a></span>
<div id="divDrawsheet" style="position: relative;" class="ui-draggable">
<div class="fl"><ul id="ulRounds">
<li class="fl" id="liRound">
<div style="padding:10px 3px 10px 3px; min-width:125px; text-align:center;font-size:1.2em;" id="divRound"><strong>Round 1</strong></div>
<ul id="ulEntry">
<li style="padding:4px 0px 0px 0px;" id="liEntry">
<div class="hlPlayer" style="border-bottom:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerTop">
<span style="display:block; height:12px;">
<span class="flagLeft">
<img class="flag14 f14CZE" title="Czech Republic" alt="Czech Republic" src="/ITF/Images/pixel.gif">
<a class="drsh100057386" href="/procircuit/players/player/profile.aspx?playerid=100057386">Katerina VANKOVA</a></span> (CZE) [1]
</span>
</div>
<div style="border-right:1px solid #999; text-align:center; padding:6px 0px 1px 0px; height:15px;"><i><a disabled="disabled" id="lnkHeadToHead"></a><span style="color:#fff;">|</span></i></div>
<div class="hlPlayer" style="border-bottom:1px solid #999; border-right:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerBottom">
<span style="display:block; height:12px;">
<span class="flagLeft" id="spnPlayerBottom1Bye"><img height="11px" width="14px" border="0" src="/itf/images/pixel.gif">BYE</span>
</span>
</div>
</li>
<li style="padding:4px 0px 0px 0px;" id="liEntry">
<div class="hlPlayer" style="border-bottom:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerTop">
<span style="display:block; height:12px;">
<span class="flagLeft">
<img class="flag14 f14GBR" title="Great Britain" alt="Great Britain" src="/ITF/Images/pixel.gif">
<a class="drsh100149370" href="/procircuit/players/player/profile.aspx?playerid=100149370">Kyria DUNFORD</a></span> (GBR)
</span>
</div>
<div style="border-right:1px solid #999; text-align:center; padding:6px 0px 1px 0px; height:15px;"><i><a href="/procircuit/players/head-to-head/result.aspx?player1=100149370&player2=100073050" id="lnkHeadToHead">H2H</a><span style="color:#fff;">|</span></i></div>
<div class="hlPlayer" style="border-bottom:1px solid #999; border-right:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerBottom">
<span style="display:block; height:12px;">
<span class="flagLeft">
<img class="flag14 f14GBR" title="Great Britain" alt="Great Britain" src="/ITF/Images/pixel.gif">
<a class="drsh100073050" href="/procircuit/players/player/profile.aspx?playerid=100073050" style="background-color:transparent;">Hollie BEES</a></span> (GBR)
</span>
</div>
</li>
<li style="padding:4px 0px 0px 0px;" id="liEntry">
<div class="hlPlayer" style="border-bottom:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerTop">
<span style="display:block; height:12px;">
<span class="flagLeft">
<img class="flag14 f14GBR" title="Great Britain" alt="Great Britain" src="/ITF/Images/pixel.gif">
<a class="drsh100141485" href="/procircuit/players/player/profile.aspx?playerid=100141485">Sophie WATTS</a></span> (GBR)
</span>
</div>
<div style="border-right:1px solid #999; text-align:center; padding:6px 0px 1px 0px; height:15px;"><i><a href="/procircuit/players/head-to-head/result.aspx?player1=100141485&player2=100084615" id="lnkHeadToHead">H2H</a><span style="color:#fff;">|</span></i></div>
<div class="hlPlayer" style="border-bottom:1px solid #999; border-right:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerBottom">
<span style="display:block; height:12px;">
<span class="flagLeft">
<img class="flag14 f14CZE" title="Czech Republic" alt="Czech Republic" src="/ITF/Images/pixel.gif">
<a class="drsh100084615" href="/procircuit/players/player/profile.aspx?playerid=100084615" style="background-color:transparent;">Martina PRADOVA</a></span> (CZE)
</span>
</div>
</li>
<li style="padding:4px 0px 0px 0px;" id="liEntry">
<div class="hlPlayer" style="border-bottom:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerTop">
<span style="display:block; height:12px;">
<span class="flagLeft" id="spnPlayerTop1Bye"><img height="11px" width="14px" border="0" src="/itf/images/pixel.gif">BYE</span>
</span>
</div>
<div style="border-right:1px solid #999; text-align:center; padding:6px 0px 1px 0px; height:15px;"><i><a disabled="disabled" id="lnkHeadToHead"></a><span style="color:#fff;">|</span></i></div>
<div class="hlPlayer" style="border-bottom:1px solid #999; border-right:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerBottom">
<span style="display:block; height:12px;">
<span class="flagLeft">
<img class="flag14 f14BLR" title="Belarus" alt="Belarus" src="/ITF/Images/pixel.gif">
<a class="drsh100128240" href="/procircuit/players/player/profile.aspx?playerid=100128240" style="background-color:transparent;">Aliaksandra SASNOVICH</a></span> (BLR) [7]
</span>
</div>
</li>
<li style="padding:4px 0px 0px 0px;" id="liEntry">
<div class="hlPlayer" style="border-bottom:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerTop">
<span style="display:block; height:12px;">
<span class="flagLeft">
<img class="flag14 f14RUS" title="Russia" alt="Russia" src="/ITF/Images/pixel.gif">
<a class="drsh100134744" href="/procircuit/players/player/profile.aspx?playerid=100134744" style="background-color:transparent;">Mayya KATSITADZE</a></span> (RUS) [2]
</span>
</div>
<div style="border-right:1px solid #999; text-align:center; padding:6px 0px 1px 0px; height:15px;"><i><a disabled="disabled" id="lnkHeadToHead"></a><span style="color:#fff;">|</span></i></div>
<div class="hlPlayer" style="border-bottom:1px solid #999; border-right:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerBottom">
<span style="display:block; height:12px;">
<span class="flagLeft" id="spnPlayerBottom1Bye"><img height="11px" width="14px" border="0" src="/itf/images/pixel.gif">BYE</span>
</span>
</div>
</li>
<li style="padding:4px 0px 0px 0px;" id="liEntry">
<div class="hlPlayer" style="border-bottom:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerTop">
<span style="display:block; height:12px;">
<span class="flagLeft">
<img class="flag14 f14FIN" title="Finland" alt="Finland" src="/ITF/Images/pixel.gif">
<a class="drsh100152949" href="/procircuit/players/player/profile.aspx?playerid=100152949">Mia Nicole EKLUND</a></span> (FIN)
</span>
</div>
<div style="border-right:1px solid #999; text-align:center; padding:6px 0px 1px 0px; height:15px;"><i><a href="/procircuit/players/head-to-head/result.aspx?player1=100152949&player2=100151646" id="lnkHeadToHead">H2H</a><span style="color:#fff;">|</span></i></div>
<div class="hlPlayer" style="border-bottom:1px solid #999; border-right:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerBottom">
<span style="display:block; height:12px;">
<span class="flagLeft">
<img class="flag14 f14GBR" title="Great Britain" alt="Great Britain" src="/ITF/Images/pixel.gif">
<a class="drsh100151646" href="/procircuit/players/player/profile.aspx?playerid=100151646">Lauren MCMINN</a></span> (GBR)
</span>
</div>
</li>
<li style="padding:4px 0px 0px 0px;" id="liEntry">
<div class="hlPlayer" style="border-bottom:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerTop">
<span style="display:block; height:12px;">
<span class="flagLeft">
<img class="flag14 f14GBR" title="Great Britain" alt="Great Britain" src="/ITF/Images/pixel.gif">
<a class="drsh100139057" href="/procircuit/players/player/profile.aspx?playerid=100139057">Jazzamay DREW</a></span> (GBR)
</span>
</div>
<div style="border-right:1px solid #999; text-align:center; padding:6px 0px 1px 0px; height:15px;"><i><a href="/procircuit/players/head-to-head/result.aspx?player1=100139057&player2=100143216" id="lnkHeadToHead">H2H</a><span style="color:#fff;">|</span></i></div>
<div class="hlPlayer" style="border-bottom:1px solid #999; border-right:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerBottom">
<span style="display:block; height:12px;">
<span class="flagLeft">
<img class="flag14 f14GBR" title="Great Britain" alt="Great Britain" src="/ITF/Images/pixel.gif">
<a class="drsh100143216" href="/procircuit/players/player/profile.aspx?playerid=100143216">Brigit FOLLAND</a></span> (GBR)
</span>
</div>
</li>
<li style="padding:4px 0px 0px 0px;" id="liEntry">
<div class="hlPlayer" style="border-bottom:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerTop">
<span style="display:block; height:12px;">
<span class="flagLeft" id="spnPlayerTop1Bye"><img height="11px" width="14px" border="0" src="/itf/images/pixel.gif">BYE</span>
</span>
</div>
<div style="border-right:1px solid #999; text-align:center; padding:6px 0px 1px 0px; height:15px;"><i><a disabled="disabled" id="lnkHeadToHead"></a><span style="color:#fff;">|</span></i></div>
<div class="hlPlayer" style="border-bottom:1px solid #999; border-right:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerBottom">
<span style="display:block; height:12px;">
<span class="flagLeft">
<img class="flag14 f14IRL" title="Ireland" alt="Ireland" src="/ITF/Images/pixel.gif">
<a class="drsh100119788" href="/procircuit/players/player/profile.aspx?playerid=100119788">Amy BOWTELL</a></span> (IRL) [5]
</span>
</div>
</li>
<li style="padding:4px 0px 0px 0px;" id="liEntry">
<div class="hlPlayer" style="border-bottom:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerTop">
<span style="display:block; height:12px;">
<span class="flagLeft">
<img class="flag14 f14FRA" title="France" alt="France" src="/ITF/Images/pixel.gif">
<a class="drsh100090878" href="/procircuit/players/player/profile.aspx?playerid=100090878">Constance SIBILLE</a></span> (FRA) [3]
</span>
</div>
<div style="border-right:1px solid #999; text-align:center; padding:6px 0px 1px 0px; height:15px;"><i><a disabled="disabled" id="lnkHeadToHead"></a><span style="color:#fff;">|</span></i></div>
<div class="hlPlayer" style="border-bottom:1px solid #999; border-right:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerBottom">
<span style="display:block; height:12px;">
<span class="flagLeft" id="spnPlayerBottom1Bye"><img height="11px" width="14px" border="0" src="/itf/images/pixel.gif">BYE</span>
</span>
</div>
</li>
<li style="padding:4px 0px 0px 0px;" id="liEntry">
<div class="hlPlayer" style="border-bottom:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerTop">
<span style="display:block; height:12px;">
<span class="flagLeft">
<img class="flag14 f14GBR" title="Great Britain" alt="Great Britain" src="/ITF/Images/pixel.gif">
<a class="drsh100141443" href="/procircuit/players/player/profile.aspx?playerid=100141443">Anneka WATTS</a></span> (GBR)
</span>
</div>
<div style="border-right:1px solid #999; text-align:center; padding:6px 0px 1px 0px; height:15px;"><i><a href="/procircuit/players/head-to-head/result.aspx?player1=100141443&player2=100053033" id="lnkHeadToHead">H2H</a><span style="color:#fff;">|</span></i></div>
<div class="hlPlayer" style="border-bottom:1px solid #999; border-right:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerBottom">
<span style="display:block; height:12px;">
<span class="flagLeft">
<img class="flag14 f14USA" title="USA" alt="USA" src="/ITF/Images/pixel.gif">
<a class="drsh100053033" href="/procircuit/players/player/profile.aspx?playerid=100053033">Tori KINARD</a></span> (USA)
</span>
</div>
</li>
<li style="padding:4px 0px 0px 0px;" id="liEntry">
<div class="hlPlayer" style="border-bottom:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerTop">
<span style="display:block; height:12px;">
<span class="flagLeft">
<img class="flag14 f14GBR" title="Great Britain" alt="Great Britain" src="/ITF/Images/pixel.gif">
<a class="drsh100139660" href="/procircuit/players/player/profile.aspx?playerid=100139660">Edita RACA</a></span> (GBR)
</span>
</div>
<div style="border-right:1px solid #999; text-align:center; padding:6px 0px 1px 0px; height:15px;"><i><a href="/procircuit/players/head-to-head/result.aspx?player1=100139660&player2=100131722" id="lnkHeadToHead">H2H</a><span style="color:#fff;">|</span></i></div>
<div class="hlPlayer" style="border-bottom:1px solid #999; border-right:1px solid #999; padding:0px 3px 2px 3px;" id="divPlayerBottom">
<span style="display:block; height:12px;">
</div>
</div>
</div>
Answer 0 (score: 1)
I assume that by "description" you mean the meta tag.
The best way to get it is probably an XPath expression like this:
HtmlNode descNode = doc.DocumentNode.SelectSingleNode("//meta[@name='description']");
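
The description text is stored in the `content` attribute of the meta tag, not in its inner text, and there is no element actually named "description", which is why the original query finds nothing. The following is a minimal sketch of reading that attribute with a null check, assuming the HtmlAgilityPack NuGet package is referenced. The class name `MetaDescriptionExample`, the helper `GetMetaDescription`, and the sample HTML string are hypothetical and exist only for illustration.

```csharp
using System;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is installed

public static class MetaDescriptionExample
{
    // Hypothetical helper: returns the content of <meta name="description">,
    // or null when the page has no such tag.
    public static string GetMetaDescription(HtmlDocument doc)
    {
        HtmlNode descNode = doc.DocumentNode.SelectSingleNode("//meta[@name='description']");

        // SelectSingleNode returns null when nothing matches the XPath.
        if (descNode == null)
        {
            return null;
        }

        // The description lives in the "content" attribute; fall back to an
        // empty string if the attribute itself is missing.
        return descNode.GetAttributeValue("content", string.Empty);
    }

    public static void Main()
    {
        // Illustrative markup only; in the question the document comes from HtmlWeb.Load(url).
        var doc = new HtmlDocument();
        doc.LoadHtml("<html><head><title>Draw</title>" +
                     "<meta name=\"description\" content=\"Tournament drawsheet\">" +
                     "</head><body></body></html>");

        Console.WriteLine(GetMetaDescription(doc) ?? "(no description)");
    }
}
```

In the question's code this would replace the LINQ query for `desc`, for example `lblDescription.Text = GetMetaDescription(doc);`. Note that the XPath comparison on the attribute value is case-sensitive, so a page that writes `name="Description"` would need a different expression.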