Question

It looks like http://portland.beerandblog.com/feed/atom/ is messed up (and so are the 0.92 and 2.0 RSS feeds).

Universal Feed Parser (the latest version, from http://code.google.com/p/feedparser/source/browse/trunk/feedparser/feedparser.py?spec=svn295&r=295) doesn't see any dates.

    <title>Beer and Blog Portland</title>
    <atom:link href="http://portland.beerandblog.com/feed/" rel="self" type="application/rss+xml" />
    <link>http://portland.beerandblog.com</link>
    <description>Bloggers helping bloggers over beers in Portland, Oregon</description>
    <pubDate>Fri, 19 Jun 2009 22:54:57 +0000</pubDate>
    <generator>http://wordpress.org/?v=2.7.1</generator>
    <language>en</language>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>1</sy:updateFrequency>
                    <item>
            <title>Widmer is sponsoring our beer for the After Party!!</title>
            <link>http://portland.beerandblog.com/2009/06/19/widmer-is-sponsoring-our-beer-for-the-after-party/</link>
            <comments>http://portland.beerandblog.com/2009/06/19/widmer-is-sponsoring-our-beer-for-the-after-party/#comments</comments>
            <pubDate>Fri, 19 Jun 2009 22:30:35 +0000</pubDate>
            <dc:creator>Justin Kistner</dc:creator>

            <category><![CDATA[beer]]></category>

I'm trying:

        try:
            published = e.published_parsed
        except:
            try:
                published = e.updated_parsed
            except:
                published = e.created_parsed

It fails because I can't get a date.

Any ideas on how to extract the date in a reasonable way?

Thanks!

Answer 1

Works for me:

>>> e = feedparser.parse('http://portland.beerandblog.com/feed/atom/')
>>> e.feed.date
u'2009-06-19T22:54:57Z'
>>> e.feed.date_parsed
(2009, 6, 19, 22, 54, 57, 4, 170, 0)
>>> e.feed.updated_parsed
(2009, 6, 19, 22, 54, 57, 4, 170, 0)

Perhaps you are looking at e.updated_parsed when you should be looking at e.feed.updated_parsed?
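To make the distinction concrete, here is a minimal sketch that reads the feed-level date and the per-entry dates separately. It uses a `SimpleNamespace` stand-in shaped like the session above so that it runs without network access; with the real library you would presumably pass the result of `feedparser.parse(url)` instead. The helper name `collect_dates` is made up for illustration and is not part of feedparser.

```python
from types import SimpleNamespace


def collect_dates(parsed):
    """Return (feed_level_date, list_of_entry_dates) from a parse result.

    Missing dates come back as None instead of raising.
    """
    feed_date = getattr(parsed.feed, "updated_parsed", None)
    entry_dates = [getattr(entry, "updated_parsed", None)
                   for entry in parsed.entries]
    return feed_date, entry_dates


# Hypothetical stand-in for the object returned by feedparser.parse(...).
stamp = (2009, 6, 19, 22, 54, 57, 4, 170, 0)
parsed = SimpleNamespace(
    feed=SimpleNamespace(updated_parsed=stamp),
    entries=[SimpleNamespace(updated_parsed=stamp), SimpleNamespace()],
)

feed_date, entry_dates = collect_dates(parsed)
print(feed_date)
print(entry_dates)
```

The second stand-in entry has no date at all, which is why its slot in the list is `None`.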

Answer 2

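Using a bare except can mask problems in your code. Assuming (I don't use feedparser) that AttributeError is the specific exception you should be checking for, try (accidental pun):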

try:
    published = e.published_parsed
except AttributeError:
    try:
        published = e.updated_parsed
    except AttributeError:
        published = e.created_parsed

In any case, please show the error message and traceback rather than just "it fails".
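The nested try/except can also be flattened into a loop. The sketch below is one way to do that, under the same assumption as above: that a missing date field surfaces as an AttributeError, so `getattr` with a default can absorb it. The names `DATE_FIELDS` and `first_parsed_date` are hypothetical, and the entries are `SimpleNamespace` stand-ins rather than real feedparser objects.

```python
from types import SimpleNamespace

# Attribute names tried in order; the same three the question uses.
DATE_FIELDS = ("published_parsed", "updated_parsed", "created_parsed")


def first_parsed_date(entry, fields=DATE_FIELDS):
    """Return the first date attribute that exists and is not None, else None."""
    for name in fields:
        value = getattr(entry, name, None)  # no exception if the field is absent
        if value is not None:
            return value
    return None


# Stand-in entries: one with only an updated date, one with no dates.
entry = SimpleNamespace(updated_parsed=(2009, 6, 19, 22, 54, 57, 4, 170, 0))
print(first_parsed_date(entry))
print(first_parsed_date(SimpleNamespace()))
```

Returning `None` when nothing matches leaves it to the caller to decide whether a dateless entry is an error, instead of letting the last lookup raise.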

Edit: I downloaded the latest release (i.e. not from svn) and followed the example in the docs, with this result:

C:\feedparser>\python26\python
Python 2.6.2 (r262:71605, Apr 14 2009, 22:40:02) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import feedparser
>>> d = feedparser.parse('http://portland.beerandblog.com/feed/atom/')
>>> d.entries[0].updated
u'2009-06-19T22:54:57Z'
>>> d.entries[0].updated_parsed
time.struct_time(tm_year=2009, tm_mon=6, tm_mday=19, tm_hour=22, tm_min=54, tm_sec=57, tm_wday=4, tm_yday=170, tm_isdst=0)
>>> d.entries[0].title
u'Widmer is sponsoring our beer for the After Party!!'
>>> d.entries[0].published
u'2009-06-19T22:30:35Z'
>>> d.entries[0].published_parsed
time.struct_time(tm_year=2009, tm_mon=6, tm_mday=19, tm_hour=22, tm_min=30, tm_sec=35, tm_wday=4, tm_yday=170, tm_isdst=0)
>>>

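Like I said, I don't use RSS, Atom and the like, but this seems pretty straightforward to me. Except that I don't understand where you got the <pubDate> tags and the ARPANET-style timestamps; AFAICT they don't exist in the raw source, which has <published> and ISO timestamps: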

>>> import urllib
>>> guff = urllib.urlopen('http://portland.beerandblog.com/feed/atom/').read()
>>> guff.find('pubDate')
-1
>>> guff.find('published')
1171
>>> guff[1160:1200]
'pdated>\n\t\t<published>2009-06-19T22:30:35'
>>>

What is your "e" in "e.published_parsed"? Consider showing the full story of how you access feedparser, as I did above.
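Once a `*_parsed` value such as the `published_parsed` shown in the session above is in hand, it can be turned into a `datetime` for comparison or formatting. This is a minimal sketch in Python 3 (the session above is Python 2.6), and it assumes the parsed tuple is expressed in UTC, which is my understanding of how feedparser normalizes dates. The function name `to_datetime` is made up for illustration.

```python
from datetime import datetime, timezone


def to_datetime(parsed_tuple):
    """Build an aware datetime from the first six fields of a 9-tuple.

    Assumes the tuple is already in UTC; the remaining fields
    (weekday, day of year, DST flag) are ignored.
    """
    return datetime(*parsed_tuple[:6], tzinfo=timezone.utc)


# Values copied from the published_parsed output in the session above.
published = to_datetime((2009, 6, 19, 22, 30, 35, 4, 170, 0))
print(published.isoformat())
```

Slicing works the same way on a `time.struct_time`, so the value can be passed in directly without converting it to a plain tuple first.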

Problem getting dates with Universal Feed Parser
