Question

Below is a sample of the HTML block I'm trying to extract information from:

<a href="https://secure.tibia.com/community/?subtopic=characters&name=Alemao+Golpista" >Alemao&#160;Golpista</a></td><td style="width:10%;" >51</td><td style="width:20%;" >Knight</td></tr><tr class="Even" style="text-align:right;" ><td style="width:70%;text-align:left;" >

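I'm basically grabbing the whole HTML, which is a list of online players, and trying to append each player to a list: name (Alemao Golpista), level (51) and vocation (Knight).

Using regex for this is a real pain and quite slow. How would I go about it with the Html Agility Pack?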

Answer 1

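Don't use regular expressions to parse HTML files. As has already been said, you should use HtmlAgilityPack and work from whatever examples you can find, even though there are few on their website and the documentation is not easy to find.

To get you started, here is how you would load an HtmlDocument and get the href attribute of every anchor tag.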

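// Requires: using System; using System.Collections.Generic; using System.Net; using System.Text; using HtmlAgilityPack;
HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();

try
{
    // url.Url holds the address of the page to download
    var temp = new Uri(url.Url);
    var request = (HttpWebRequest)WebRequest.Create(temp);
    request.Method = "GET";
    using (var response = (HttpWebResponse)request.GetResponse())
    using (var stream = response.GetResponseStream())
    {
        // Replace with the encoding your page is served in
        htmlDoc.Load(stream, Encoding.GetEncoding("iso-8859-9"));
    }
}
catch (WebException ex)
{
    Console.WriteLine(ex.Message);
}

// SelectNodes returns null (not an empty collection) when nothing matches
HtmlNodeCollection c = htmlDoc.DocumentNode.SelectNodes("//a");

List<string> urls = new List<string>();
if (c != null)
{
    foreach (HtmlNode n in c)
    {
        urls.Add(n.GetAttributeValue("href", ""));
    }
}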

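The code above collects every link on the page into a list of strings. You should look into XPath. You should also get the HAP documentation and read it. I couldn't find the documentation anywhere online, so I uploaded the copy I already had on my computer.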
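For the player list in the question, the same approach can be pointed at table rows instead of anchors. The following is only a sketch: it assumes each player sits in a `tr` with exactly the three cells shown in the sample (name link, level, vocation), and the `Player` and `PlayerParser` names are made up for illustration. `InnerText` returns entities such as `&#160;` undecoded, so the text is passed through `WebUtility.HtmlDecode` first.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using HtmlAgilityPack;

// Hypothetical container for one row of the online list
public class Player
{
    public string Name { get; set; }
    public int Level { get; set; }
    public string Vocation { get; set; }
}

public static class PlayerParser
{
    public static List<Player> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var players = new List<Player>();

        // Assumption: a player row is any <tr> with a cell that links to a character page
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes(
            "//tr[td/a[contains(@href, 'subtopic=characters')]]");
        if (rows == null)
        {
            return players; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode row in rows)
        {
            HtmlNodeCollection cells = row.SelectNodes("td");
            if (cells == null || cells.Count < 3)
            {
                continue; // not the three-cell layout assumed above
            }

            // Decode entities, then turn the non-breaking space into a normal one
            string name = WebUtility.HtmlDecode(cells[0].InnerText)
                .Replace('\u00A0', ' ')
                .Trim();

            int level;
            if (!int.TryParse(cells[1].InnerText.Trim(), out level))
            {
                continue; // skip rows whose second cell is not a number
            }

            string vocation = WebUtility.HtmlDecode(cells[2].InnerText).Trim();

            players.Add(new Player { Name = name, Level = level, Vocation = vocation });
        }

        return players;
    }
}
```

If you already have the page in an HtmlDocument from the download code, you can run the same XPath against `htmlDoc.DocumentNode` rather than calling `LoadHtml` again. If the real page uses a different row layout, the XPath and the cell indexes are the parts to adjust.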

How do I pick out specific text using Html Agility Pack?

1 answer: