I am trying to parse RSS 2.0 and Atom feeds with the SyndicationFeed object, but I get an XmlException when it parses DateTime fields such as pubDate, for example:
2012-01-17 08:01:06
public static List<SyndicationItem> getRssData(string url)
{
    // SyndicationFeed.Load throws an XmlException when it reaches a
    // pubDate that is not in the expected format.
    using (XmlReader reader = XmlReader.Create(url))
    {
        SyndicationFeed feed = SyndicationFeed.Load(reader);
        return feed.Items.ToList();
    }
}
Feed URL: http://news.163.com/special/00011K6L/rss_newstop.xml
<item id="2">
    <title>...</title>
    <link>...</link>
    <description>......</description>
    <pubDate>2012-01-17 12:09:29</pubDate> <!-- the exception is thrown here -->
</item>
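Is there a better way to achieve this? Please help. Thanks.
Answer 0 (score: 15)
There is a workaround, described in "RSS20FeedFormatter throws exception trying to read some DateTime formats".
To work around the problem, create a custom XML reader that recognizes different date formats. Here is an example that uses such a reader: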
XmlReader r = new MyXmlReader(url);
SyndicationFeed feed = SyndicationFeed.Load(r);

// Write the parsed feed back out as RSS 2.0 to confirm that it loaded.
Rss20FeedFormatter rssFormatter = feed.GetRss20Formatter();
using (XmlTextWriter rssWriter = new XmlTextWriter("rss.xml", Encoding.UTF8))
{
    rssWriter.Formatting = Formatting.Indented;
    rssFormatter.WriteTo(rssWriter);
}
...and the class used in the code above:
using System;
using System.Globalization;
using System.IO;
using System.Xml;

// XmlTextReader that rewrites the text of <pubDate> and <lastBuildDate>
// into RFC 1123 form ("R") before the feed formatter sees it.
class MyXmlReader : XmlTextReader
{
    private bool readingDate = false;
    const string CustomUtcDateTimeFormat = "ddd MMM dd HH:mm:ss Z yyyy"; // Wed Oct 07 08:00:07 GMT 2009

    public MyXmlReader(Stream s) : base(s) { }
    public MyXmlReader(string inputUri) : base(inputUri) { }

    public override void ReadStartElement()
    {
        // Flag un-namespaced date elements so that ReadString() normalizes their text.
        if (string.Equals(base.NamespaceURI, string.Empty, StringComparison.InvariantCultureIgnoreCase) &&
            (string.Equals(base.LocalName, "lastBuildDate", StringComparison.InvariantCultureIgnoreCase) ||
             string.Equals(base.LocalName, "pubDate", StringComparison.InvariantCultureIgnoreCase)))
        {
            readingDate = true;
        }
        base.ReadStartElement();
    }

    public override void ReadEndElement()
    {
        readingDate = false;
        base.ReadEndElement();
    }

    public override string ReadString()
    {
        if (!readingDate)
        {
            return base.ReadString();
        }

        string dateString = base.ReadString();
        DateTime dt;
        // Try the general parser first, then fall back to the custom format.
        if (!DateTime.TryParse(dateString, out dt))
        {
            dt = DateTime.ParseExact(dateString, CustomUtcDateTimeFormat, CultureInfo.InvariantCulture);
        }
        return dt.ToUniversalTime().ToString("R", CultureInfo.InvariantCulture);
    }
}
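The normalization step inside ReadString() can also be tried on its own, without a feed. The following is only a minimal sketch: the class name RssDateSketch, the method NormalizeRssDate, and the list of accepted formats are hypothetical, and it assumes that timestamps without a zone are UTC, which may not be true for this particular feed.

```csharp
using System;
using System.Globalization;

static class RssDateSketch
{
    // Hypothetical list of non-standard layouts to accept.
    static readonly string[] Formats =
    {
        "yyyy-MM-dd HH:mm:ss",             // 2012-01-17 12:09:29 (the feed in the question)
        "ddd MMM dd HH:mm:ss 'GMT' yyyy"   // Wed Oct 07 08:00:07 GMT 2009
    };

    // Parses a non-standard date string and returns it in RFC 1123 ("R") form.
    // Assumption: input without a zone is treated as UTC.
    public static string NormalizeRssDate(string raw)
    {
        DateTime dt = DateTime.ParseExact(
            raw.Trim(),
            Formats,
            CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);
        return dt.ToString("R", CultureInfo.InvariantCulture);
    }

    static void Main()
    {
        Console.WriteLine(NormalizeRssDate("2012-01-17 12:09:29"));
        // prints "Tue, 17 Jan 2012 12:09:29 GMT"
    }
}
```

ParseExact throws a FormatException for any layout that is not in the list, so a production version would probably want TryParseExact and a fallback.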
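Answer 1 (score: 3)
Basically, the RSS feed is invalid. If you look at the RSS 2.0 specification, it says:
All date-times in RSS conform to the Date and Time Specification of RFC 822, with the exception that the year may be expressed with two characters or four characters (four preferred).
The string "2012-01-17 12:09:29" does not conform to the "Date and Time" part of RFC 822, which expects an abbreviated month name and a time zone. It should be something like "17 Jan 2012 12:09:29 GMT" (with whatever zone the feed actually uses).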
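To make the difference concrete, here is a rough sketch that checks a string against two GMT-based RFC 822 layouts and shows how .NET can produce a conforming value. The class and method names are hypothetical, and the check is deliberately incomplete: RFC 822 also allows other zones (for example numeric offsets), which this sketch does not cover.

```csharp
using System;
using System.Globalization;

static class Rfc822Sketch
{
    // Only two GMT-based layouts; RFC 822 permits more variants than these.
    static readonly string[] Formats =
    {
        "ddd, dd MMM yyyy HH:mm:ss 'GMT'",
        "dd MMM yyyy HH:mm:ss 'GMT'"
    };

    // Returns true when the value matches one of the layouts above.
    public static bool LooksLikeRfc822(string value)
    {
        DateTime ignored;
        return DateTime.TryParseExact(
            value,
            Formats,
            CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal,
            out ignored);
    }

    // Formats a UTC DateTime with the standard "R" (RFC 1123) pattern.
    public static string ToPubDate(DateTime utc)
    {
        return utc.ToString("R", CultureInfo.InvariantCulture);
    }

    static void Main()
    {
        Console.WriteLine(LooksLikeRfc822("2012-01-17 12:09:29"));           // False
        Console.WriteLine(LooksLikeRfc822("Tue, 17 Jan 2012 12:09:29 GMT")); // True
        Console.WriteLine(ToPubDate(new DateTime(2012, 1, 17, 12, 9, 29, DateTimeKind.Utc)));
        // prints "Tue, 17 Jan 2012 12:09:29 GMT"
    }
}
```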