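I created a program that downloads the links from a web page into an .htm file. What I want to do is test each link in the .htm file and output any that are broken. Unfortunately, not all of the downloaded links start with "http://", so I tried to use an if statement to work around this. How can I read all the links into an array and then loop over that array using asynchronous web requests and responses?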
private async void button4_Click(object sender, EventArgs e)
{
string text = System.IO.File.ReadAllText(@"C:\\Users\\Conal_Curran\\OneDrive\\C#\\MyProjects\\Web Crawler\\URLTester\\OP.htm");
List<string> stringlist = new List<string>();
stringlist.Add(text);
if (!text.StartsWith("http://"))
{
foreach (string line in stringlist)
{
var request = WebRequest.Create(text);
var response = (HttpWebResponse)await Task.Factory
.FromAsync<WebResponse>(request.BeginGetResponse, request.EndGetResponse, null);
Debug.Assert(response.StatusCode == HttpStatusCode.OK);
if (response == null)
{
BrokenLinks.Text = text;
}
else
{
BrokenLinks.Text = "All URLS Are OK";
}
}
}
Parsing the HTML file with a regular expression:
string text = System.IO.File.ReadAllText(@"C:\\Users\\Conal_Curran\\OneDrive\\C#\\MyProjects\\Web Crawler\\URLTester\\OP.htm");
string regex = "href=\"(.*)\"";
Match match = Regex.Match(text, regex);
if (match.Success)
{
string link = match.Groups[1].Value;
Console.WriteLine(link);
MessageBox.Show("Going over URLS now Please stand by.");
var request = WebRequest.Create(link);
var response = (HttpWebResponse)await Task.Factory
.FromAsync<WebResponse>(request.BeginGetResponse, request.EndGetResponse, null);
Debug.Assert(response.StatusCode == HttpStatusCode.OK);
if (response == null)
{
BrokenLinks.Text = text;
label2.ForeColor = System.Drawing.Color.Red;
}
else
{
BrokenLinks.Text = "All URLS Are OK";
label2.ForeColor = System.Drawing.Color.Green;
}
}
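Answer 0 (score: 0)
I think this code should point you in the right direction. Obviously, it only works if the file you are reading is a text file containing one link per line.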
var lines = File.ReadLines(fileName); // reads the file lazily, one line at a time
foreach (var line in lines) {
    if (line.StartsWith("http://")) {
        // Execute your request, since this looks like a valid link.
    } else {
        // In this case the URL doesn't start with http://. If you want to check it anyway, prepend http:// to the string; otherwise don't do anything.
    }
}
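The loop above expects one link per line, while the file in the question is a saved .htm page. A minimal sketch of collecting every href value into a list might look like the following. The names `LinkExtractor`, `ExtractLinks`, and `baseUri` are hypothetical and not part of the question's code; `baseUri` is assumed to be the address of the page the file was saved from, so that links without "http://" can be resolved against it. A regular expression is only a rough way to read HTML, so treat this as an approximation rather than a full parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Captures the value of each double-quoted href attribute.
    // [^"]* stops at the closing quote, unlike the greedy (.*) in the question.
    private static readonly Regex HrefPattern =
        new Regex("href\\s*=\\s*\"([^\"]*)\"", RegexOptions.IgnoreCase);

    // Returns every http/https link found in the HTML as an absolute Uri.
    // baseUri (an assumption) is used to resolve relative links such as "/about".
    public static List<Uri> ExtractLinks(string html, Uri baseUri)
    {
        var links = new List<Uri>();
        foreach (Match match in HrefPattern.Matches(html))
        {
            string raw = match.Groups[1].Value.Trim();

            // Skip empty values and in-page anchors; they are not separate pages.
            if (raw.Length == 0 || raw.StartsWith("#"))
            {
                continue;
            }

            // Resolve against the base address, then keep only web links
            // (this drops mailto:, javascript:, and similar schemes).
            if (Uri.TryCreate(baseUri, raw, out Uri resolved) &&
                (resolved.Scheme == Uri.UriSchemeHttp || resolved.Scheme == Uri.UriSchemeHttps))
            {
                links.Add(resolved);
            }
        }
        return links;
    }
}
```

Using `Regex.Matches` rather than `Regex.Match` is what yields all links instead of only the first one, and resolving against a base address avoids having to test for the "http://" prefix by hand.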
If you want to check whether a link is valid, see this answer. I hope this helps.
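As a rough illustration of the validity check, here is a sketch that swaps `WebRequest` for `HttpClient`. One reason for the swap: `HttpWebRequest` throws a `WebException` for 4xx and 5xx responses, so a `response == null` test is never reached for a broken link, whereas `HttpClient.GetAsync` returns the response and leaves the status check to the caller. The names `LinkChecker`, `IsReachableAsync`, and `FindBrokenAsync` are hypothetical, the 10-second timeout is an arbitrary choice, and the links are assumed to be absolute http or https addresses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class LinkChecker
{
    // One shared client for all requests; the timeout value is an assumption.
    private static readonly HttpClient Client =
        new HttpClient { Timeout = TimeSpan.FromSeconds(10) };

    // True when the server answers with a 2xx status; false on any other
    // status, a network failure, or a timeout.
    public static async Task<bool> IsReachableAsync(Uri link)
    {
        try
        {
            // ResponseHeadersRead returns as soon as the status line and
            // headers arrive, without downloading the whole body.
            using (var response = await Client.GetAsync(link, HttpCompletionOption.ResponseHeadersRead))
            {
                return response.IsSuccessStatusCode;
            }
        }
        catch (HttpRequestException)
        {
            return false; // DNS failure, refused connection, and similar
        }
        catch (TaskCanceledException)
        {
            return false; // the request timed out
        }
    }

    // Checks all links concurrently and returns the ones that failed.
    public static async Task<List<Uri>> FindBrokenAsync(IEnumerable<Uri> links)
    {
        var checks = links.Select(async link => new { link, ok = await IsReachableAsync(link) });
        var results = await Task.WhenAll(checks);
        return results.Where(r => !r.ok).Select(r => r.link).ToList();
    }
}
```

Inside an `async` click handler, something like `var broken = await LinkChecker.FindBrokenAsync(links);` followed by `BrokenLinks.Text = broken.Count == 0 ? "All URLS Are OK" : string.Join(Environment.NewLine, broken);` would then list every failing link rather than overwriting the text box on each iteration.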