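I'm trying to write a program that reads an RSS news feed and writes each article's date, title, and body to a text file. I started learning C# two days ago, but I have experience with other languages. The program works for some feeds, but in others (Reuters, for example) there is an "Email this article" type link after each article body, and I can't get rid of it when I copy the text. I run the program over the whole feed.
For example, this is the XML for one news item: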
<item>
<title>Pimco's Ivascyn sees 'significant' opportunity in European bank assets</title>
<link>http://feeds.reuters.com/~r/news/wealth/~3/vUJ74S5mXQg/story01.htm</link>
<category domain="">PersonalFinance</category>
<pubDate>Mon, 16 Jun 2014 15:37:52 GMT</pubDate>
<guid isPermaLink="false">http://www.reuters.com/article/2014/06/16/us-investing-pimco-ivascyn-idUSKBN0ER1VV20140616?feedType=RSS&feedName=PersonalFinance</guid>
<description>NEW YORK (Reuters) - The expected unloading of roughly $1 trillion in assets by European banks represents a "significant investment opportunity" in residential and commercial real estate as well as...<div class="feedflare">
<a href="http://feeds.reuters.com/~ff/news/wealth?a=vUJ74S5mXQg:y6BPXasLV5o:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/news/wealth?d=yIl2AUoC8zA" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/news/wealth/~4/vUJ74S5mXQg" height="1" width="1"/></description>
<feedburner:origLink>http://reuters.us.feedsportal.com/c/35217/f/654211/s/3b8e7c6b/sc/2/l/0L0Sreuters0N0Carticle0C20A140C0A60C160Cus0Einvesting0Epimco0Eivascyn0EidUSKBN0AER1VV20A140A6160DfeedType0FRSS0GfeedName0FPersonalFinance/story01.htm</feedburner:origLink>
</item>
However, when I run the program, I get:
Mon, 16 Jun 2014 15:37:52 GMT
Pimco's Ivascyn sees 'significant' opportunity in European bank assets
NEW YORK (Reuters) - The expected unloading of roughly $1 trillion in assets by European banks represents a "significant investment opportunity" in residential and commercial real estate as well as...<div class="feedflare">
<a href="http://feeds.reuters.com/~ff/news/wealth?a=vUJ74S5mXQg:y6BPXasLV5o:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/news/wealth?d=yIl2AUoC8zA" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/news/wealth/~4/vUJ74S5mXQg" height="1" width="1"/>
**********
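I'm trying to get rid of the last two lines of markup that follow the article body. I added the asterisks to separate the articles.
Here is my code: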
using System;
using System.IO;
using System.Text;
using System.Xml;
namespace XmlReading
{
    class RssReading
    {
        static void Main(string[] args)
        {
            // Create a StreamWriter object to write to a text file.
            StreamWriter sw = new StreamWriter("C:\\Users\\Testing002.txt");
            XmlDocument xmlDoc = new XmlDocument();
            // Load the RSS feed page.
            xmlDoc.Load("http://feeds.reuters.com/news/wealth");
            // Select all item nodes in the feed.
            XmlNodeList itemNodes = xmlDoc.SelectNodes("//rss/channel/item");
            foreach (XmlNode itemNode in itemNodes)
            {
                // Read the title.
                XmlNode titleNode = itemNode.SelectSingleNode("title");
                // Read the date.
                XmlNode dateNode = itemNode.SelectSingleNode("pubDate");
                // Read the body.
                XmlNode bodyNode = itemNode.SelectSingleNode("description");
                if (titleNode != null && dateNode != null && bodyNode != null)
                {
                    /* XPath of the article body, and of the extra links.
                     * //*[@id="bodyblock"]/ul/li[2]/div/text()
                     * //*[@id="bodyblock"]/ul/li[2]/div/div
                     */
                    // Write to the console as well, just to check the output.
                    Console.WriteLine(dateNode.InnerText);
                    sw.WriteLine(dateNode.InnerText);
                    Console.WriteLine(titleNode.InnerText);
                    sw.WriteLine(titleNode.InnerText);
                    // Value is null for an element node, so use InnerText here too.
                    Console.WriteLine(bodyNode.InnerText);
                    sw.WriteLine(bodyNode.InnerText);
                    Console.WriteLine("**********\n\n\n");
                    sw.WriteLine("**********\n\n\n");
                    sw.WriteLine(" ");
                    sw.WriteLine(" ");
                }
            }
            sw.Close();
            Console.ReadKey(true);
        }
    }
}
Thanks in advance for any help or suggestions.
Answer 0 (score: 0)
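I found a solution to my problem. At first I thought it was a child-node problem, but I realized that the "Email this" link is created using entities (for example
&lt;
and
&gt;
), so it is part of the description's text rather than a child element.
So what I did was take the substring from position 0 up to the index of the first '&' character. Also, so that the code still runs when a feed doesn't have this problem, I wrote it with Math.Max to avoid a negative substring length.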
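To illustrate why the markup ends up in the text instead of in child nodes, here is a minimal sketch. The XML string, the class name EntityDemo, and the helper LoadDescription are made up for the illustration; only the XmlDocument calls come from the program in the question.

```csharp
using System;
using System.Xml;

class EntityDemo
{
    // Hypothetical helper: parses a one-item XML string and returns its <description> node.
    public static XmlNode LoadDescription(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectSingleNode("/item/description");
    }

    static void Main()
    {
        // Made-up sample: the markup inside <description> is escaped with entities.
        string xml = "<item><description>Body text...&lt;div class=\"feedflare\"&gt;&lt;/div&gt;</description></item>";
        XmlNode body = LoadDescription(xml);

        // The escaped div is not an element, so there are no element children to remove.
        Console.WriteLine(body.SelectNodes("*").Count);
        // InnerText decodes the entities, so the tags reappear as literal text.
        Console.WriteLine(body.InnerText);
        // InnerXml keeps the entities escaped, which is why IndexOf("&") finds them.
        Console.WriteLine(body.InnerXml);
    }
}
```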
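The final program is the same as in the question except for the line that writes the body to the text file. That line is replaced with the following: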
sw.WriteLine(bodyNode.InnerText.Substring(0,Math.Max(bodyNode.InnerXml.IndexOf("&"),0)));
Also, the Console.WriteLine() calls in the code aren't needed.
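One caveat with the one-liner: when the body contains no '&', IndexOf returns -1, Math.Max turns that into a length of 0, and an empty line is written instead of the body. An escaped ampersand in the article text itself (for example in "S&P") would also cut the body short. A possible alternative, sketched under the assumption that the unwanted block always starts with the same div marker seen in the Reuters sample, is to cut at that marker and keep the whole text otherwise. The names FeedText and StripFeedFlare are hypothetical.

```csharp
using System;

static class FeedText
{
    // Assumption: the unwanted block always begins with this marker, as in the Reuters sample.
    const string Marker = "<div class=\"feedflare\">";

    // Returns the text before the marker, or the whole text when the marker is absent.
    public static string StripFeedFlare(string innerText)
    {
        if (innerText == null) return string.Empty;
        int cut = innerText.IndexOf(Marker, StringComparison.Ordinal);
        return cut >= 0 ? innerText.Substring(0, cut) : innerText;
    }

    static void Main()
    {
        // Marker present: only the article text is kept.
        Console.WriteLine(StripFeedFlare("Body text...<div class=\"feedflare\"><a href=\"#\"></a></div>"));
        // Marker absent: the text is returned unchanged, ampersand included.
        Console.WriteLine(StripFeedFlare("Plain body with S&P in it"));
    }
}
```

In the program from the question, the body line would then read sw.WriteLine(FeedText.StripFeedFlare(bodyNode.InnerText));. Feeds that append a different trailer would need a different marker.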