I am trying to manipulate an HTML file that I download with WebClient. I use Regex to extract the values of the href attributes. Here is my code:
    string html = "";
    WebClient webClient = new WebClient();
    html = webClient.DownloadString(addressTextBox.Text).Replace("\n", "").Replace("\t", "");
    address = webClient.BaseAddress;
    StringBuilder stringBuilder = new StringBuilder(html);
    MatchCollection matchCollection = Regex.Matches(html, @"(?<=\bhref="")[^""]*");
    int offset = 0;
    foreach (Match match in matchCollection)
    {
        string newValue = addressTextBox.Text + match.Value.Replace("./", "").Replace("../", "");
        int tempOffset = match.Index - offset;
        stringBuilder.Remove(tempOffset, match.Length);
        stringBuilder.Insert(tempOffset, newValue);
        offset = newValue.Length - match.Length;
    }
    webBrowser.DocumentText = stringBuilder.ToString();
    File.WriteAllText(@"C:\Users\Admin\Documents\site.xml", stringBuilder.ToString(), Encoding.UTF8);
Here is what I am trying to do: