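So, I'm using the pytumblr API to retrieve blog posts from Tumblr. I want to retrieve a post and extract only its content. Technically, Tumblr sends it to me as a dict,
but the format is very, very messy. On top of that, it mixes single and double quotes! Here is my code: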
post = client.posts(blogName, type = 'text', tag = 'suggestion', limit = 1)
postformat = str(post[u'posts']).replace("[", "").replace("]", "")
blog = dict(ast.literal_eval(postformat))
print(post[u'body']).replace("<p>", "").replace("</p>", "")
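First, Tumblr gives me a very large dictionary with only 3 keys, but each of them holds another dictionary! (???) So I need to pull out the key I'm looking for and then use ast to convert that key's contents back into a dict. When I tried this, I got a different error. So I removed the brackets around the post
and used ast to try to interpret the contents. But when I do that, a SyntaxError is thrown on line 3. Here is post[u'posts']
in its original, very confusing format. Since Stack Overflow puts it all on one line, I have also put it at http://pastebin.com/9fnuS2F6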
{u'body': u'<p>Concept: we’re in a cute little cottage, surrounded by flowers. I’m making breakfast and you’re on your way home from walking the dogs. We keep our own bees and collect honey from them; they are as happy and safe as we are.</p>', u'liked': False, u'followed': False, u'reblog_key': u'tlPNsk6e', u'reblog': {u'comment': u'<p>Concept: we\u2019re in a cute little cottage, surrounded by flowers. I\u2019m making breakfast and you\u2019re on your way home from walking the dogs. We keep our own bees and collect honey from them; they are as happy and safe as we are.</p>', u'tree_html': u''}, u'can_send_in_message': True, u'id': 146647556007L, u'post_url': u'http://affectionsuggestion.tumblr.com/post/146647556007/concept-were-in-a-cute-little-cottage', u'can_reply': True, u'title': None, u'tags': u'queued', u'suggestion', u'suggestion blog', u'concept', u'bees', u'love', u'future', u'happy', u'happiness', u'couple', u'relationship', u'nature', u'lesbians', u'cute', u'highlighted': , u'recommended_source': None, u'state': u'published', u'short_url': u'https://tmblr.co/ZzO-5i28au1_d', u'type': u'text', u'recommended_color': None, u'format': u'html', u'timestamp': 1467190899, u'note_count': 523, u'trail': {u'content': u'<p>Concept: we\u2019re in a cute little cottage, surrounded by flowers. I\u2019m making breakfast and you\u2019re on your way home from walking the dogs. We keep our own bees and collect honey from them; they are as happy and safe as we are.</p>', u'content_raw': u'<p>Concept: we\u2019re in a cute little cottage, surrounded by flowers. I\u2019m making breakfast and you\u2019re on your way home from walking the dogs. 
We keep our own bees and collect honey from them; they are as happy and safe as we are.</p>', u'is_current_item': True, u'blog': {u'active': True, u'theme': {u'title_font_weight': u'bold', u'title_color': u'#444444', u'header_bounds': u'', u'title_font': u'Gibson', u'link_color': u'#3D7291', u'header_image_focused': u'https://secure.assets.tumblr.com/images/default_header/optica_pattern_05.png?_v=671444c5f47705cce40d8aefd23df3b1', u'show_description': True, u'show_header_image': False, u'header_stretch': True, u'body_font': u'Helvetica Neue', u'show_title': True, u'header_image_scaled': u'https://secure.assets.tumblr.com/images/default_header/optica_pattern_05.png?_v=671444c5f47705cce40d8aefd23df3b1', u'avatar_shape': u'square', u'show_avatar': False, u'background_color': u'#FEB0B0', u'header_image': u'https://secure.assets.tumblr.com/images/default_header/optica_pattern_05.png?_v=671444c5f47705cce40d8aefd23df3b1'}, u'share_following': False, u'name': u'affectionsuggestion', u'share_likes': False}, u'is_root_item': True, u'post': {u'id': u'146647556007'}}, u'date': u'2016-06-29 09:01:39 GMT', u'slug': u'concept-were-in-a-cute-little-cottage', u'blog_name': u'affectionsuggestion', u'summary': u"Concept: we're in a cute little cottage, surrounded by flowers. I'm making breakfast and you're on your way home from walking..."}
Answer 0 (score: 1)
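The syntax error is caused by removing the brackets. Somewhere inside the dictionary there is a list of tags, and the string you are feeding in is: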
... u'title': None, u'tags': u'queued', u'suggestion', u'suggestion blog', ...
That is, it switches from dictionary notation to list notation, because the [] around the tags has disappeared.
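To see the effect in isolation, here is a minimal sketch. The `post` dictionary is a made-up, much smaller stand-in for one Tumblr post (not real API output), and the bracket stripping mirrors the two `replace` calls from the question:

```python
import ast

# Hypothetical, heavily shortened stand-in for a single post.
post = {"title": None, "tags": ["queued", "suggestion"], "state": "published"}

# The text form of a one-element list of posts, as in str(post[u'posts']).
text = str([post])

# Removing every bracket also removes the ones around the tags list.
stripped = text.replace("[", "").replace("]", "")

try:
    ast.literal_eval(stripped)
    outcome = "parsed"
except SyntaxError:
    # After 'tags': 'queued' the parser meets a bare string where it
    # expects another key: value pair.
    outcome = "SyntaxError"

print(stripped)
print(outcome)
```

The untouched `text` still parses with `ast.literal_eval`, so the brackets are the only thing that breaks it.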
I suspect post['posts'] is just a list of posts, post['posts'][0] is the first one, and post['posts'][0]['body'] is the body you are looking for.
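If that guess is right, no string conversion or ast is needed at all. A rough sketch, assuming the response has the shape described above; `response` here is a hand-written stand-in for what `client.posts(...)` returns, with only the keys that are used:

```python
# Hypothetical stand-in for the dictionary returned by client.posts(...).
# Other top-level keys are omitted; the body text is shortened.
response = {
    "posts": [
        {
            "type": "text",
            "tags": ["queued", "suggestion"],
            "body": "<p>Concept: we keep our own bees.</p>",
        },
    ],
}

first_post = response["posts"][0]   # the list holds one dict per post
body = first_post["body"]           # the post content as an HTML string

# Same tag stripping as in the question, applied to the string itself.
plain = body.replace("<p>", "").replace("</p>", "")
print(plain)
```

With `limit = 1` the list should contain a single post, so index 0 is the one you asked for.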