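I'm an anime fan and I want a complete list of all anime characters, so I came across this site: http://www.animevice.com/characters/?page=1 My goal is to extract all the names and add them to listBox1. This is my code so far: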
try
{
    while (true)
    {
        HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create("http://www.animevice.com/characters/?page=" + n);
        req.Method = "GET";
        req.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20100101 Firefox/15.0";
        req.KeepAlive = true;
        HttpWebResponse response = (HttpWebResponse)req.GetResponse();
        Stream responseData = response.GetResponseStream();
        StreamReader reader = new StreamReader(responseData);
        string responseFromServer = reader.ReadToEnd();
        string m = "<a href=\"(.*)\" class=\"name\">(.*)</a>";
        Match match = Regex.Match(responseFromServer, m, RegexOptions.IgnoreCase);
        if (match.Success)
        {
            listBox1.Items.Add(match.Groups[2].Value.ToString());
        }
        if (listBox1.Items.Count % 50 == 0)
        {
            n++;
        }
    }
}
catch { }
However, this just gives me the first name on the list (Monkey D. Luffy) many times over. Is there a solution? Cheers
Answer 0: (score: 1)
I would use a real HTML parser such as HtmlAgilityPack to parse the HTML instead of a regex.
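// Requires "using System.Linq;" for Select and ToList.
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(responseFromServer);
// Note: SelectNodes returns null when no node matches the XPath.
var names = doc.DocumentNode.SelectNodes("//a[@class='name']")
    .Select(a => a.InnerText)
    .ToList();
listBox1.DataSource = names;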
Answer 1: (score: 0)
You are only reading one name per page, because Regex.Match returns just the first match.
Instead of:
Match match = Regex.Match(responseFromServer, m, RegexOptions.IgnoreCase);
if (match.Success)
{
    listBox1.Items.Add(match.Groups[2].Value.ToString());
}
if (listBox1.Items.Count % 50 == 0)
{
    n++;
}
Try this:
var matches = Regex.Matches(responseFromServer, m, RegexOptions.IgnoreCase);
foreach (var item in matches)
{
    var match = item as Match;
    if (match.Success)
    {
        listBox1.Items.Add(match.Groups[2].Value.ToString());
    }
    // Advance to the next page once a full page of 50 names has been added.
    if (listBox1.Items.Count % 50 == 0)
    {
        n++;
    }
}
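To see the difference between Regex.Match and Regex.Matches in isolation, here is a minimal sketch that runs against a hard-coded HTML fragment instead of the live site. The fragment, the second character name, the class name NameExtractor, and the tightened pattern (negated character classes in place of the greedy .*) are assumptions made for illustration; the real page markup may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NameExtractor
{
    // Assumed pattern: [^"]* and [^<]* stop each group at the closing quote
    // or tag, so two links on the same line are not merged into one match.
    private const string Pattern = "<a href=\"([^\"]*)\" class=\"name\">([^<]*)</a>";

    // Returns every name found in the HTML, not just the first one.
    public static List<string> ExtractNames(string html)
    {
        var names = new List<string>();
        foreach (Match match in Regex.Matches(html, Pattern, RegexOptions.IgnoreCase))
        {
            names.Add(match.Groups[2].Value);
        }
        return names;
    }

    public static void Main()
    {
        // Hypothetical sample markup with both links on a single line.
        string html = "<a href=\"/a/\" class=\"name\">Monkey D. Luffy</a>" +
                      "<a href=\"/b/\" class=\"name\">Roronoa Zoro</a>";

        // Regex.Match stops after the first hit.
        Console.WriteLine(Regex.Match(html, Pattern).Groups[2].Value);

        // Regex.Matches yields all hits.
        foreach (string name in ExtractNames(html))
        {
            Console.WriteLine(name);
        }
    }
}
```

With the original greedy pattern, two links on one line would be captured as a single match, which is one reason a dedicated HTML parser tends to be more robust than a regex here.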
Answer 2: (score: 0)
using (StreamReader reader = new StreamReader(responseData))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        string m = "<a href=\"(.*)\" class=\"name\">(.*)</a>";
        Match match = Regex.Match(line, m, RegexOptions.IgnoreCase);
        if (match.Success)
        {
            listBox1.Items.Add(match.Groups[2].Value.ToString());
        }
    }
}
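The snippets in this thread fix how names are pulled from one page, but the question's while (true) loop still has no clean exit. The following is a rough sketch of one way to page through results and stop at the first page with no names. CharacterPager, CollectNames, the fetchPage delegate, and the maxPages cap are all hypothetical names introduced here, and the assumption that an empty page marks the end of the list has not been checked against the real site.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CharacterPager
{
    // Assumed pattern, tightened with negated character classes.
    private const string Pattern = "<a href=\"([^\"]*)\" class=\"name\">([^<]*)</a>";

    // fetchPage is a hypothetical delegate: given a page number it returns
    // that page's HTML (in the real program it would wrap the HttpWebRequest code).
    public static List<string> CollectNames(Func<int, string> fetchPage, int maxPages)
    {
        var names = new List<string>();
        for (int page = 1; page <= maxPages; page++)
        {
            MatchCollection matches =
                Regex.Matches(fetchPage(page), Pattern, RegexOptions.IgnoreCase);
            if (matches.Count == 0)
            {
                break; // assumption: an empty page means we ran past the last one
            }
            foreach (Match match in matches)
            {
                names.Add(match.Groups[2].Value);
            }
        }
        return names;
    }
}
```

Advancing the page number once per iteration, rather than when the item count hits a multiple of 50, avoids refetching the same page if a page happens to hold fewer than 50 names. The returned list could then be assigned to listBox1.DataSource.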