RSS源中的DateTime解析的异常使用c#中的SyndicationFeed

时间:2012-01-17 07:31:07

标签: c# rss

我正在尝试使用SyndicationFeed对象解析Rss2,Atom提要。但是我在解析DateTime字段时会收到XmlExceptions,比如pubDate

2012-01-17 08:01:06

public static List<SyndicationItem> getRssData(string url)
{
    List<SyndicationItem> list = new List<SyndicationItem>();

    WebClient client = new WebClient();
    try
    {
        SyndicationFeed feed = SyndicationFeed.Load(XmlReader.Create(url));
        list = (from item in feed.Items select item).ToList();
    }
    catch (Exception e)
    {
        throw e;
    }

    return list;
}

网址链接http://news.163.com/special/00011K6L/rss_newstop.xml

<item id="2">
    <title>...</title>
    <link>...</link>
    <description>......</description>
    <pubDate>2012-01-17 12:09:29</pubDate><-----Exception
</item>

有没有更好的方法来实现这一目标?请帮忙。谢谢。

2 个答案:

答案 0 :(得分:15)

有一种解决方法RSS20FeedFormatter throws exception trying to read some DateTime formats

  

要解决此问题,请创建一个识别不同日期格式的自定义XML阅读器。以下是自定义XML阅读器的示例:   

XmlReader r = new MyXmlReader(url);
SyndicationFeed feed = SyndicationFeed.Load(r);
Rss20FeedFormatter rssFormatter = feed.GetRss20Formatter();
XmlTextWriter rssWriter = new XmlTextWriter("rss.xml", Encoding.UTF8);
rssWriter.Formatting = Formatting.Indented;
rssFormatter.WriteTo(rssWriter);
rssWriter.Close();

..和以前代码中使用的类:

class MyXmlReader : XmlTextReader
{
    private bool readingDate = false;
    const string CustomUtcDateTimeFormat = "ddd MMM dd HH:mm:ss Z yyyy"; // Wed Oct 07 08:00:07 GMT 2009

    public MyXmlReader(Stream s) : base(s) { }

    public MyXmlReader(string inputUri) : base(inputUri) { }

    public override void ReadStartElement()
    {
        if (string.Equals(base.NamespaceURI, string.Empty, StringComparison.InvariantCultureIgnoreCase) &&
            (string.Equals(base.LocalName, "lastBuildDate", StringComparison.InvariantCultureIgnoreCase) ||
            string.Equals(base.LocalName, "pubDate", StringComparison.InvariantCultureIgnoreCase)))
        {
            readingDate = true;
        }
        base.ReadStartElement();
    }

    public override void ReadEndElement()
    {
        if (readingDate)
        {
            readingDate = false;
        }
        base.ReadEndElement();
    }

    public override string ReadString()
    {
        if (readingDate)
        {
            string dateString = base.ReadString();
            DateTime dt;
            if(!DateTime.TryParse(dateString,out dt))
                dt = DateTime.ParseExact(dateString, CustomUtcDateTimeFormat, CultureInfo.InvariantCulture);
            return dt.ToUniversalTime().ToString("R", CultureInfo.InvariantCulture);
        }
        else
        {
            return base.ReadString();
        }
    }
}

答案 1 :(得分:3)

基本上,RSS提要无效。如果您查看RSS 2.0 specification,则说明:

  

RSS中的所有日期时间均符合RFC 822的日期和时间规范,但年份可以用两个字符或四个字符(四个首选)表示。

字符串“2012-01-17 12:09:29”不符合"Date and Time" part of RFC 822。它应该是“17 01 2012 12:09:29”或类似的东西。