C#: Extracting names using HttpWebRequest

Posted: 2012-11-10 23:20:45

Tags: c# web response extract names

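I'm an anime fan and I want a complete list of all anime characters, so I came across this site: http://www.animevice.com/characters/?page=1 My goal is to extract all the names and add them to listBox1. Here is my code so far: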

try
{
    while (true)
    {
        HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create("http://www.animevice.com/characters/?page=" + n);
        req.Method = "GET";
        req.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20100101 Firefox/15.0";
        req.KeepAlive = true;

        HttpWebResponse response = (HttpWebResponse)req.GetResponse();
        Stream responseData = response.GetResponseStream();
        StreamReader reader = new StreamReader(responseData);
        string responseFromServer = reader.ReadToEnd();
        string m = "<a href=\"(.*)\" class=\"name\">(.*)</a>";
        Match match = Regex.Match(responseFromServer, m, RegexOptions.IgnoreCase);
        if (match.Success)
        {
            listBox1.Items.Add(match.Groups[2].Value.ToString());
        }
        if (listBox1.Items.Count % 50 == 0)
        {
            n++;
        }
    }
}
catch { }

However, this adds the first name on the list (Monkey D. Luffy) over and over. Is there a solution? Cheers

3 answers:

Answer 0 (score: 1)

I would use a real HTML parser such as HtmlAgilityPack to parse the HTML instead of a regex.

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(responseFromServer);
var names = doc.DocumentNode.SelectNodes("//a[@class='name']")
                .Select(a=>a.InnerText)
                .ToList();

listBox1.DataSource = names;

Answer 1 (score: 0)

You are only reading one name per page.

Instead of:

Match match = Regex.Match(responseFromServer, m, RegexOptions.IgnoreCase);
if (match.Success)
{
    listBox1.Items.Add(match.Groups[2].Value.ToString());
}
if (listBox1.Items.Count % 50 == 0)
{
    n++;
}

Try this:

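// Regex.Matches returns every match on the page, not just the first one.
var matches = Regex.Matches(responseFromServer, m, RegexOptions.IgnoreCase);
foreach (Match match in matches)
{
    if (match.Success)
    {
        listBox1.Items.Add(match.Groups[2].Value.ToString());
    }
    // Same paging rule as the question's code: move on after every 50 names.
    if (listBox1.Items.Count % 50 == 0)
    {
        n++;
    }
}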
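
For what it's worth, here is a minimal self-contained sketch of the same idea that can be run without the site. It rests on several assumptions: the pattern is changed to lazy quantifiers (`[^"]*` and `.*?`) so that two anchors on one line are not swallowed by a single greedy match, `fetchPage` is a hypothetical stand-in for the HttpWebRequest download code, the sample pages are made up, and paging stops at the first page that yields no names, which may not be how the real site behaves past its last page. It is written with top-level statements, so it needs C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Collect every name on one page. The lazy (.*?) and [^"]* keep each match
// inside a single <a> element even when several anchors share a line.
static List<string> ExtractNames(string html)
{
    const string pattern = "<a href=\"([^\"]*)\" class=\"name\">(.*?)</a>";
    var names = new List<string>();
    foreach (Match match in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
    {
        names.Add(match.Groups[2].Value);
    }
    return names;
}

// Walk pages 1, 2, 3, ... and stop at the first page with no names (an assumption).
// fetchPage is a hypothetical stand-in for the real download code.
static List<string> CollectAllNames(Func<int, string> fetchPage)
{
    var all = new List<string>();
    for (int page = 1; ; page++)
    {
        List<string> names = ExtractNames(fetchPage(page));
        if (names.Count == 0)
        {
            break;
        }
        all.AddRange(names);
    }
    return all;
}

// Made-up pages standing in for the real site.
string[] fakePages =
{
    "<a href=\"/luffy/\" class=\"name\">Monkey D. Luffy</a><a href=\"/zoro/\" class=\"name\">Roronoa Zoro</a>",
    "<a href=\"/nami/\" class=\"name\">Nami</a>",
};

List<string> result = CollectAllNames(
    page => page <= fakePages.Length ? fakePages[page - 1] : "");

foreach (string name in result)
{
    Console.WriteLine(name);
}
```

In the WinForms code from the question, the collected list could then be added to listBox1 in one go instead of counting items to decide when to change page.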

Answer 2 (score: 0)

using (StreamReader reader = new StreamReader(responseData))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        string m = "<a href=\"(.*)\" class=\"name\">(.*)</a>";
        Match match = Regex.Match(line, m, RegexOptions.IgnoreCase);
        if (match.Success)
        {
            listBox1.Items.Add(match.Groups[2].Value.ToString());
        }
    }
}
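
One caveat with reading line by line: Regex.Match still returns only the first match, so this finds at most one name per line. That is fine if the page puts each link on its own line, which has not been checked here. A hedged variant that keeps the line-by-line reading but collects every link on each line might look like the sketch below. The `ReadNames` helper, the lazy pattern, and the sample markup are all illustrative assumptions, and a StringReader stands in for the response stream so the sketch runs offline (top-level statements, C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Read the input one line at a time and collect every name on each line.
// ReadNames is a hypothetical helper, not part of any library.
static List<string> ReadNames(TextReader reader)
{
    // Lazy pattern so that two links on the same line stay separate matches.
    const string pattern = "<a href=\"[^\"]*\" class=\"name\">(.*?)</a>";
    var names = new List<string>();

    // ReadLine returns null at the end of the input, which ends the loop.
    while (reader.ReadLine() is string line)
    {
        foreach (Match match in Regex.Matches(line, pattern, RegexOptions.IgnoreCase))
        {
            names.Add(match.Groups[1].Value);
        }
    }
    return names;
}

// Made-up markup: two links on the first line, none on the second, one on the third.
string sampleHtml =
    "<li><a href=\"/a/\" class=\"name\">Naruto Uzumaki</a> <a href=\"/b/\" class=\"name\">Sasuke Uchiha</a></li>\n" +
    "<p>no link here</p>\n" +
    "<a href=\"/c/\" class=\"name\">Sakura Haruno</a>";

List<string> lineNames;
using (var reader = new StringReader(sampleHtml))
{
    lineNames = ReadNames(reader);
}

foreach (string name in lineNames)
{
    Console.WriteLine(name);
}
```

With the real response, the StringReader would be replaced by the StreamReader over responseData from the snippet above.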