I have content:encoded text from an RSS feed, as follows:
<content:encoded><![CDATA[<P><B>Wednesday, September 26, 2012</B></P>It is Apple.<P>Shops are closed.<br />Parking is not allowed here. Go left and park.<br />All theatres are opened.<br /></P><P><B>Thursday, September 27, 2012</B></P><P>Shops are open.<br />Parking is not allowed here. Go left and park.<br />All theatres are opened.<br /></P>]]></content:encoded>
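Using the following method, I can extract the text from the HTML:
// Needs System.Text.RegularExpressions and System.Web; as an extension method it must live in a static class.
public static string StripHTML(this string htmlText)
{
    // Remove every tag, then decode HTML entities in what is left.
    var reg = new Regex("<[^>]+>", RegexOptions.IgnoreCase);
    return HttpUtility.HtmlDecode(reg.Replace(htmlText, string.Empty));
}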
But I want the text inside <b></b> to be inserted into dateArray[] and the text inside <p></p> to be inserted into descriptionArray[], so that I can display it as follows:
Thanks in advance.
Answer 0 (score: 0)
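// Uses Html Agility Pack: http://htmlagilitypack.codeplex.com/ (also needs using System.Linq;)
// html is the markup taken from the content:encoded element.
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
// Collect every text node and flag the ones that sit directly inside a <b> element (the dates).
var result = doc.DocumentNode.Descendants()
    .Where(n => n is HtmlAgilityPack.HtmlTextNode)
    .Select(n => new {
        IsDate = n.ParentNode.Name == "b",
        Text = n.InnerText,
    })
    .ToList();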
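The result list is still flat, so one more pass is needed to get the two arrays the question asks for. Below is a minimal sketch of that step, not part of the original answer. The class FeedGrouping and the method SplitByDate are hypothetical names, and it assumes the items have been projected into (IsDate, Text) tuples, which needs C# 7 or later. Each date starts a new entry and every following text line is appended to that entry's description.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Html Agility Pack.
public static class FeedGrouping
{
    // Splits a flat (IsDate, Text) sequence into two parallel arrays:
    // Dates[i] is a date, Descriptions[i] is the text that followed it.
    public static (string[] Dates, string[] Descriptions) SplitByDate(
        IEnumerable<(bool IsDate, string Text)> items)
    {
        var dates = new List<string>();
        var descriptions = new List<List<string>>();

        foreach (var (isDate, text) in items)
        {
            var trimmed = text.Trim();
            if (trimmed.Length == 0)
                continue; // skip whitespace-only text nodes

            if (isDate)
            {
                // A date opens a new, initially empty description.
                dates.Add(trimmed);
                descriptions.Add(new List<string>());
            }
            else if (descriptions.Count > 0)
            {
                // Attach the line to the most recent date.
                descriptions[descriptions.Count - 1].Add(trimmed);
            }
            // Assumption: text that appears before the first date is dropped.
        }

        return (dates.ToArray(),
                descriptions.Select(d => string.Join("\n", d)).ToArray());
    }
}

// Possible usage with the list built above (assumes C# 7.1 tuple name inference):
// var (dateArray, descriptionArray) =
//     FeedGrouping.SplitByDate(result.Select(r => (r.IsDate, r.Text)));
```

With the sample feed, "It is Apple." has no <p> parent but follows the first date, so this sketch files it under that date. If only text inside <p> should count, the projection would also need to carry the parent name.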