Question

I am trying to manipulate an HTML file that I download with WebClient. I use Regex to extract the values of the href attributes. Here is my code:

string html = "";
WebClient webClient = new WebClient();

html = webClient.DownloadString(addressTextBox.Text).Replace("\n", "").Replace("\t", "");
address = webClient.BaseAddress;

StringBuilder stringBuilder = new StringBuilder(html);
MatchCollection matchCollection = Regex.Matches(html, @"(?<=\bhref="")[^""]*");

int offset = 0;

foreach (Match match in matchCollection)
{
    string newValue = addressTextBox.Text + match.Value.Replace("./", "").Replace("../", "");
    int tempOffset = match.Index - offset;

    stringBuilder.Remove(tempOffset, match.Length);
    stringBuilder.Insert(tempOffset, newValue);
    offset = newValue.Length - match.Length;
}

webBrowser.DocumentText = stringBuilder.ToString();
File.WriteAllText(@"C:\Users\Admin\Documents\site.xml", stringBuilder.ToString(), Encoding.UTF8);

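Here is what I am trying to do:

I get the value of each href attribute
I remove that value from the attribute
I insert a new value in place of the old one
Because the new attribute value is usually longer than the old one, I created an offset variable to store the difference between the length of the previous attribute value and the length of the new value. I then subtract that offset from the index of the next match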
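
For reference, here is a minimal sketch of how the offset bookkeeping described above could be made consistent. It assumes the match indices come from the original string while the edits are applied to the StringBuilder: every earlier replacement that lengthens the text shifts later matches to the right, so the offset would be added to the index rather than subtracted, and it would accumulate across matches rather than being overwritten. The class name HrefOffsetSketch, the method PrefixHrefs, and the prefix parameter are hypothetical names used only for illustration.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class HrefOffsetSketch
{
    // Hypothetical helper: prepends `prefix` to every href value,
    // editing a StringBuilder copy of the HTML in place.
    public static string PrefixHrefs(string html, string prefix)
    {
        StringBuilder stringBuilder = new StringBuilder(html);
        MatchCollection matches = Regex.Matches(html, @"(?<=\bhref="")[^""]*");

        // Running total of how much the builder has grown (or shrunk) so far.
        int offset = 0;

        foreach (Match match in matches)
        {
            string newValue = prefix + match.Value;

            // match.Index refers to the original string; earlier edits moved
            // this text to the right by `offset` characters, so add it.
            int position = match.Index + offset;

            stringBuilder.Remove(position, match.Length);
            stringBuilder.Insert(position, newValue);

            // Accumulate the length difference instead of replacing it.
            offset += newValue.Length - match.Length;
        }

        return stringBuilder.ToString();
    }
}
```

With two links in the input, the second match is shifted by exactly the number of characters the first replacement added, which is what the accumulated offset tracks.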

Below is a screenshot of the corruption that appears after I manipulate the web page:

What am I doing wrong? How can I correctly replace the value of each href attribute?
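
One possible way to avoid manual index arithmetic altogether is sketched below, using Regex.Replace with a MatchEvaluator so that each match is substituted by the regex engine itself. It also resolves relative values with Uri.TryCreate instead of stripping "./" and "../" by hand, which is an assumption about the intent rather than something stated in the post. The names HrefRewriteSketch, MakeHrefsAbsolute, and baseAddress are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HrefRewriteSketch
{
    // Hypothetical helper: rewrites every href value as an absolute URL
    // resolved against `baseAddress`.
    public static string MakeHrefsAbsolute(string html, string baseAddress)
    {
        Uri baseUri = new Uri(baseAddress, UriKind.Absolute);

        // Regex.Replace calls the evaluator once per match and assembles the
        // result, so no offsets need to be tracked.
        return Regex.Replace(html, @"(?<=\bhref="")[^""]*", match =>
        {
            Uri absolute;

            // Resolves "./x" and "../x" relative to the base address;
            // values that are already absolute are kept as they are.
            if (Uri.TryCreate(baseUri, match.Value, out absolute))
            {
                return absolute.AbsoluteUri;
            }

            // Leave values that cannot be resolved untouched.
            return match.Value;
        });
    }
}
```

A regular expression only approximates HTML parsing (for example, it misses single-quoted attributes), so this is a sketch for simple pages rather than a general solution.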

Manipulating an HTML file results in incorrect string indices

0 answers: