Ignoring the content of a node's child nodes when writing XML with StreamWriter in C#

Date: 2014-06-17 13:09:04

Tags: c# xml rss

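I'm trying to write a program that reads an RSS news feed and writes each article's date, title, and body to a txt file. I started learning C# only two days ago, but I have experience in other languages. The program works for some feeds, but in others (Reuters, for example) there is an "email this article"-style link after each article body, and I can't get rid of it when I copy the text. I run the program over the whole feed.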

For example, this is the XML for one news item:

<item>
  <title>Pimco's Ivascyn sees 'significant' opportunity in European bank assets</title>
  <link>http://feeds.reuters.com/~r/news/wealth/~3/vUJ74S5mXQg/story01.htm</link>
  <category domain="">PersonalFinance</category>
  <pubDate>Mon, 16 Jun 2014 15:37:52 GMT</pubDate>
  <guid isPermaLink="false">http://www.reuters.com/article/2014/06/16/us-investing-pimco-ivascyn-idUSKBN0ER1VV20140616?feedType=RSS&amp;feedName=PersonalFinance</guid>
  <description>NEW YORK (Reuters) - The expected unloading of roughly $1 trillion in assets by European banks represents a "significant investment opportunity" in residential and commercial real estate as well as...&lt;div class="feedflare"&gt;
  &lt;a href="http://feeds.reuters.com/~ff/news/wealth?a=vUJ74S5mXQg:y6BPXasLV5o:yIl2AUoC8zA"&gt;&lt;img src="http://feeds.feedburner.com/~ff/news/wealth?d=yIl2AUoC8zA" border="0"&gt;&lt;/img&gt;&lt;/a&gt;
  &lt;/div&gt;&lt;img src="http://feeds.feedburner.com/~r/news/wealth/~4/vUJ74S5mXQg" height="1" width="1"/&gt;</description>
  <feedburner:origLink>http://reuters.us.feedsportal.com/c/35217/f/654211/s/3b8e7c6b/sc/2/l/0L0Sreuters0N0Carticle0C20A140C0A60C160Cus0Einvesting0Epimco0Eivascyn0EidUSKBN0AER1VV20A140A6160DfeedType0FRSS0GfeedName0FPersonalFinance/story01.htm</feedburner:origLink>
</item>

However, when I run the program, I get:

Mon, 16 Jun 2014 15:37:52 GMT
Pimco's Ivascyn sees 'significant' opportunity in European bank assets
NEW YORK (Reuters) - The expected unloading of roughly $1 trillion in assets by European banks represents a "significant investment opportunity" in residential and commercial real estate as well as...<div class="feedflare">
<a href="http://feeds.reuters.com/~ff/news/wealth?a=vUJ74S5mXQg:y6BPXasLV5o:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/news/wealth?d=yIl2AUoC8zA" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/news/wealth/~4/vUJ74S5mXQg" height="1" width="1"/>
**********

I'm trying to get rid of the last two lines of markup after the article body. I added the asterisks to separate the articles.

Here is my code:

using System;
using System.IO;
using System.Text;
using System.Xml;

namespace XmlReading
{
    class RssReading
    {
        static void Main(string[] args)
        {
            // Create a StreamWriter object to write to a text file.
            // Both backslashes must be escaped; "\T" is not a valid escape sequence.
            StreamWriter sw = new StreamWriter("C:\\Users\\Testing002.txt");

            XmlDocument xmlDoc = new XmlDocument();

            // Loads the rss feed page
            xmlDoc.Load("http://feeds.reuters.com/news/wealth");

            //create an object of item nodes.
            XmlNodeList itemNodes = xmlDoc.SelectNodes("//rss/channel/item");

            foreach (XmlNode itemNode in itemNodes)
            {
                //Reading the title
                XmlNode titleNode = itemNode.SelectSingleNode("title");
                //Reading the date
                XmlNode dateNode = itemNode.SelectSingleNode("pubDate");
                //Reading the body 
                XmlNode bodyNode = itemNode.SelectSingleNode("description");

                if(((titleNode != null) && (dateNode != null)) && (bodyNode!= null))
                {
                  /*    Xpath of article body, and of extra links.
                   *    //*[@id="bodyblock"]/ul/li[2]/div/text()
                   *    //*[@id="bodyblock"]/ul/li[2]/div/div
                   */
                    // Writing to the console just to check the output.
                    Console.WriteLine(dateNode.InnerText);
                    sw.WriteLine(dateNode.InnerText);

                    Console.WriteLine(titleNode.InnerText);
                    sw.WriteLine(titleNode.InnerText);

                    // Value is null for an element node, so use InnerText here as well.
                    Console.WriteLine(bodyNode.InnerText);
                    sw.WriteLine(bodyNode.InnerText);

                    Console.WriteLine("**********\n\n\n");
                    sw.WriteLine("**********\n\n\n");
                    sw.WriteLine(" ");
                    sw.WriteLine(" ");

                }
            }
            sw.Close();
            Console.ReadKey(true);
        }
    }
}

Thanks in advance for any help or suggestions.

1 Answer:

Answer 0: (score: 0)

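I found a solution to the problem. At first I thought it was a child-node issue, but then I realized that the "email this" link is created using entities, for example:

&lt;

&gt;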
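To illustrate why this is not a child-node issue, here is a minimal sketch. The class name `EntityDemo` and the shortened sample item are hypothetical, modeled on the feed item in the question. It shows that the `<description>` element has no element children, that `InnerXml` still contains the escaped `&lt;` form, and that `InnerText` decodes the entities back into literal markup characters.

```csharp
using System;
using System.Xml;

static class EntityDemo
{
    // Hypothetical, shortened sample modeled on the <description> in the question.
    public const string SampleItem =
        "<item><description>Body text...&lt;div class=\"feedflare\"&gt;&lt;/div&gt;</description></item>";

    // Parses the sample and returns its <description> node.
    public static XmlNode LoadDescription()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(SampleItem);
        return doc.SelectSingleNode("/item/description");
    }

    // Prints the three views of the node for comparison.
    public static void Show()
    {
        XmlNode body = LoadDescription();

        // Number of element children: the escaped markup is plain text, not nodes.
        Console.WriteLine(body.SelectNodes("*").Count);

        // InnerXml keeps the entity form (&lt; ...).
        Console.WriteLine(body.InnerXml);

        // InnerText decodes the entities, so the markup appears as literal "<div ...>".
        Console.WriteLine(body.InnerText);
    }
}
```

Calling `EntityDemo.Show()` from a `Main` method prints the three views one after another, which makes it easy to see that the unwanted markup lives inside the text itself.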

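So what I did was take the substring from position 0 up to the index of the first '&' character. In addition, so that the code still runs when a feed does not have this problem, I wrapped the index in Math.Max to avoid a negative substring length.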

The final code stays the same except for the line that writes the body to the text file. That line is replaced with the following:

sw.WriteLine(bodyNode.InnerText.Substring(0,Math.Max(bodyNode.InnerXml.IndexOf("&"),0)));
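One caveat with the line above: when the description contains no '&', `IndexOf` returns -1, `Math.Max` clamps it to 0, and `Substring(0, 0)` yields an empty string, so the whole body is dropped. A possible variant is sketched below. The helper `FeedText.StripTrailingMarkup` is hypothetical and not part of the original answer. It cuts the decoded text at the first '<' and falls back to the full text when no markup is found, on the assumption that the article text itself never contains a literal '<'.

```csharp
using System;

static class FeedText
{
    // Hypothetical helper (not from the original answer).
    // Takes the decoded body (e.g. bodyNode.InnerText) and returns the part
    // before the first '<'. Assumes the article text has no literal '<'.
    public static string StripTrailingMarkup(string decodedBody)
    {
        if (string.IsNullOrEmpty(decodedBody))
        {
            return string.Empty;
        }

        int cut = decodedBody.IndexOf('<');

        // No markup found: keep the whole body instead of returning an empty string.
        if (cut < 0)
        {
            return decodedBody;
        }

        // Drop the markup and any whitespace left at the end of the text.
        return decodedBody.Substring(0, cut).TrimEnd();
    }
}
```

In the program from the question, the write would then become `sw.WriteLine(FeedText.StripTrailingMarkup(bodyNode.InnerText));`. Searching `InnerText` for '<' rather than `InnerXml` for '&' also avoids cutting the body short at an escaped ampersand such as `&amp;` inside the article text.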

Also, the Console.WriteLine() calls are not needed in the code.