C# - XmlNodeList - Get the inner XML/text between description tags without the HTML

Time: 2011-01-10 19:24:18

Tags: c# xml rss

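Right now I have a list box that shows the article titles and URLs from an RSS feed. Extracting the titles and URLs works fine, but now I want the description to appear in a rich text box when an article title is selected in the list box. I can get the description to display in the text box, but it is always followed by a bunch of extra HTML. For example: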

There's a silly rumor exploding on the Internet this weekend, alleging that Facebook is shutting down on March 15 because CEO Mark Zuckerberg "wants his old life back," and desires to "put an end to all the madness."<div class="feedflare">
<a href="http://rss.cnn.com/~ff/rss/cnn_topstories?a=at7OdUE16Y0:jsXll_RkIzI:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/rss/cnn_topstories?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://rss.cnn.com/~ff/rss/cnn_topstories?a=at7OdUE16Y0:jsXll_RkIzI:7Q72WNTAKBA"><img src="http://feeds.feedburner.com/~ff/rss/cnn_topstories?d=7Q72WNTAKBA" border="0"></img></a> <a href="http://rss.cnn.com/~ff/rss/cnn_topstories?a=at7OdUE16Y0:jsXll_RkIzI:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/rss/cnn_topstories?i=at7OdUE16Y0:jsXll_RkIzI:V_sGLiPBpWU" border="0"></img></a> <a href="http://rss.cnn.com/~ff/rss/cnn_topstories?a=at7OdUE16Y0:jsXll_RkIzI:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/rss/cnn_topstories?d=qj6IDK7rITs" border="0"></img></a> <a href="http://rss.cnn.com/~ff/rss/cnn_topstories?a=at7OdUE16Y0:jsXll_RkIzI:gIN9vFwOqvQ"><img src="http://feeds.feedburner.com/~ff/rss/cnn_topstories?i=at7OdUE16Y0:jsXll_RkIzI:gIN9vFwOqvQ" border="0"></img></a>

Code:

private void button1_Click(object sender, EventArgs e)
{
    // Load the feed from the URL typed into txtUrl.
    XmlTextReader rssReader = new XmlTextReader(txtUrl.Text);
    XmlDocument rssDoc = new XmlDocument();
    rssDoc.Load(rssReader);

    XmlNodeList titleList = rssDoc.GetElementsByTagName("title");
    XmlNodeList urlList = rssDoc.GetElementsByTagName("link");
    // descList is a class-level XmlNodeList, reused in rowNews_SelectedIndexChanged.
    descList = rssDoc.GetElementsByTagName("description");

    // One row per title, with the matching link as a sub-item.
    for (int i = 0; i < titleList.Count; i++)
    {
        ListViewItem lvi = rowNews.Items.Add(titleList[i].InnerXml);
        lvi.SubItems.Add(urlList[i].InnerXml);
    }
}

private void rowNews_SelectedIndexChanged(object sender, EventArgs e)
{
    if (rowNews.SelectedIndices.Count <= 0)
    {
        return;
    }
    // Get the index of the selected article title.
    int intselectedindex = rowNews.SelectedIndices[0];

    // Show the description whose index matches the list index.
    txtDesc.Text = descList[intselectedindex].InnerText;
}

2 Answers:

Answer 0 (score: 2)

You can strip the HTML using the approach described in "Using C# regular expressions to remove HTML tags".
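
The linked article is not reproduced here, so the following is only a minimal sketch of the general regex idea, not that article's code. The class and method names (`HtmlStripper`, `StripTags`) are hypothetical. A pattern like this is fine for simple feed snippets but is not a real HTML parser and can misfire on unusual markup.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class HtmlStripper
{
    // Hypothetical helper: removes anything that looks like a tag,
    // then decodes entities such as &amp; and trims whitespace.
    public static string StripTags(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        // "<[^>]*>" matches "<", any run of non-">" characters, then ">".
        string withoutTags = Regex.Replace(html, "<[^>]*>", string.Empty);
        return WebUtility.HtmlDecode(withoutTags).Trim();
    }

    static void Main()
    {
        Console.WriteLine(StripTags("Hello <b>world</b>")); // prints: Hello world
    }
}
```

In the question's handler this would be used roughly as `txtDesc.Text = HtmlStripper.StripTags(descList[intselectedindex].InnerText);`.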

Answer 1 (score: 0)

You can use InnerText instead of InnerHtml. This returns only the content of the child nodes, without any markup.
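
A minimal sketch of what InnerText returns, assuming the feed stores the HTML inside the description as escaped entities (or CDATA). The question does not show the raw feed, so that is an assumption. In that case the tags are part of the text itself rather than child elements, so InnerText hands them back as plain characters. The class name, helper, and sample XML below are made up for illustration.

```csharp
using System;
using System.Xml;

static class InnerTextDemo
{
    // Hypothetical helper: returns the InnerText of the first <description> element.
    public static string ReadDescription(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.GetElementsByTagName("description")[0].InnerText;
    }

    static void Main()
    {
        // Made-up sample: the HTML is entity-escaped, so it is text, not child nodes.
        string xml = "<item><description>Rumor text.&lt;div class=\"feedflare\"&gt;&lt;/div&gt;</description></item>";
        Console.WriteLine(ReadDescription(xml)); // prints: Rumor text.<div class="feedflare"></div>
    }
}
```

If the feed really is shaped like this, InnerText alone will not remove the trailing markup, and the text still needs a separate tag-stripping step.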