Question

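I'm using the SyndicationFeed class to consume some RSS feeds for articles. I'd like to know how to get only the text from an item's summary field, without the HTML markup. For example, sometimes (not always) it contains HTML tags such as div, img, h, and p: </a></div>, <img src='http"

I want to get rid of all the tags. Also, I'm not sure whether the RSS feed provides the full description.

Should I use a regular expression? Or some other approach?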

XmlReader reader = XmlReader.Create(response.GetResponseStream());

SyndicationFeed feed = SyndicationFeed.Load(reader);

foreach (SyndicationItem item in feed.Items)
{

     string description = item.Summary.Text;  // Summary is a TextSyndicationContent; its Text contains tags, not only the article text

}

Answer 1

Yes, I think a regular expression is the simplest built-in way to do this...

// Get rid of the tags
description = Regex.Replace(description, @"<.+?>", String.Empty);

// Then decode the HTML entities
description = WebUtility.HtmlDecode(description);
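
For what it's worth, here is a slightly more defensive sketch of the same two steps, wrapped in a helper. The names SummaryText and ToPlainText are made up for illustration and are not part of SyndicationFeed. It assumes the summary may be missing, that a tag may span several lines, and that you would rather get a space than nothing where a tag used to be, so adjacent words don't run together. It is still a regex, so it won't cope with every piece of malformed HTML.

```csharp
using System;
using System.Net;
using System.ServiceModel.Syndication;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of System.ServiceModel.Syndication.
static class SummaryText
{
    public static string ToPlainText(SyndicationItem item)
    {
        // Summary is a TextSyndicationContent and can be null when the feed has no description.
        string html = item.Summary?.Text ?? string.Empty;

        // Singleline lets '.' match newlines, so tags broken across lines are removed too.
        // Assumption: a space is used instead of String.Empty to keep neighbouring words apart.
        string noTags = Regex.Replace(html, @"<.+?>", " ", RegexOptions.Singleline);

        // Decode HTML entities such as &amp; or &#39; once the tags are gone.
        string decoded = WebUtility.HtmlDecode(noTags);

        // Collapse the whitespace runs left behind by the removed tags.
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}
```

Inside the loop from the question you would then call it as: string description = SummaryText.ToPlainText(item);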
