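I need to download content from a website and display it in a richTextBox. The problem is that after I download the content and filter it with a regular expression, the text comes out corrupted. How can I fix this? Here is my code: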
String website = "https://www.basketnews.lt/news-102294-nba-klubu-vadovai-finalas-nesikeis-mvp-iskovos-jamesas.html";
MyWebClient webClientObj = new MyWebClient();
webClientObj.Encoding = System.Text.Encoding.UTF8;
String data = webClientObj.DownloadString(website);
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(data);
foreach (HtmlAgilityPack.HtmlNode node2 in doc.DocumentNode.SelectNodes("//div[@class= 'text']//p"))
{
    string content = node2.InnerText;
    this.richTextBox1.AppendText('\t' + content + '\n');
}
I want it to look like this:
Currently it looks like this:
Answer 0 (score: 2)
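The text contains HTML-encoded parts (entities such as &amp;quot; or &amp;#353;). InnerText returns them as they appear in the markup, so pass the string through HtmlDecode:
var content = System.Web.HttpUtility.HtmlDecode(node2.InnerText);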
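
To illustrate the decoding step from the answer above in isolation, here is a minimal sketch. It assumes `System.Net.WebUtility.HtmlDecode` is an acceptable stand-in for `System.Web.HttpUtility.HtmlDecode` (it avoids a reference to System.Web in a desktop project). The class name `HtmlDecodeSketch`, the helper `DecodeParagraph`, and the sample string are hypothetical and not taken from the original page.

```csharp
using System;
using System.Net;

public static class HtmlDecodeSketch
{
    // Hypothetical helper: converts HTML entities (named or numeric)
    // in a node's InnerText back into the characters they stand for.
    public static string DecodeParagraph(string innerText)
    {
        return WebUtility.HtmlDecode(innerText);
    }

    public static void Main()
    {
        // Made-up sample resembling what InnerText can return
        // for a paragraph that contains entities.
        string raw = "&quot;Finalas nesikeis&quot; &amp; MVP i&#353;kovos";

        // In the question's loop this would be:
        //   string content = DecodeParagraph(node2.InnerText);
        Console.WriteLine(DecodeParagraph(raw));
    }
}
```

HtmlAgilityPack also provides `HtmlEntity.DeEntitize`, which may serve the same purpose if you prefer to stay inside that library.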