"TypeError: unhashable type" when trying to retrieve information with BeautifulSoup, Python

Asked: 2016-06-25 14:02:24

Tags: python beautifulsoup typeerror pytumblr

I'm trying to use the Tumblr API, specifically PyTumblr, to scrape some images from posts with certain tags.

The code I'm using is very simple:

import pytumblr
from bs4 import BeautifulSoup

# Authenticate via API Key
client = pytumblr.TumblrRestClient('#Here is my API Key#')
print client.posts('wergida.tumblr.com', type='photo', tag='BERND AND HILLA BECHER', limit=1, offset=0)

The result looks like this:



{
  "meta": {
    "status": 200,
    "msg": "OK"
  },
  "response": {
    "blog": {
      "title": "W é r G i d A",
      "name": "wergida",
      "total_posts": 1181,
      "posts": 1181,
      "url": "http://wergida.tumblr.com/",
      "updated": 1466319493,
      "description": "Ha bárkit érdekelne",
      "is_nsfw": false,
      "ask": false,
      "ask_page_title": "Ask me anything",
      "ask_anon": false,
      "share_likes": true,
      "likes": 1131
    },
    "posts": [
      {
        "blog_name": "wergida",
        "id": 136740690571,
        "post_url": "http://wergida.tumblr.com/post/136740690571/bernhard-bernd-becher-1931-2007-and-hilla",
        "slug": "bernhard-bernd-becher-1931-2007-and-hilla",
        "type": "photo",
        "date": "2016-01-06 11:30:23 GMT",
        "timestamp": 1452079823,
        "state": "published",
        "format": "html",
        "reblog_key": "TiOl8nWT",
        "tags": [
          "industrial facades",
          "bernd and hilla becher",
          "photography",
          "eisenhüttenstadt",
          "brandenburg"
        ],
        "short_url": "https://tmblr.co/ZaE70t1-MOLgB",
        "summary": "Bernhard ‘Bernd’ Becher (1931-2007) and Hilla Becher (1934-2015): Eisenhüttenstadt, Brandenburg. Industrial Facades, The MIT...",
        "recommended_source": null,
        "recommended_color": null,
        "highlighted": [],
        "note_count": 2,
        "caption": "<p>Bernhard ‘Bernd’ Becher (1931-2007) and Hilla Becher (1934-2015): Eisenhüttenstadt, Brandenburg. Industrial Facades, The MIT Press, 1995.<br/></p>",
        "reblog": {
          "tree_html": "",
          "comment": "<p>Bernhard ‘Bernd’ Becher (1931-2007) and Hilla Becher (1934-2015): Eisenhüttenstadt, Brandenburg. Industrial Facades, The MIT Press, 1995.<br></p>"
        },
        "trail": [
          {
            "blog": {
              "name": "wergida",
              "active": true,
              "theme": {
                "avatar_shape": "square",
                "background_color": "#FAFAFA",
                "body_font": "Helvetica Neue",
                "header_bounds": "",
                "header_image": "https://secure.assets.tumblr.com/images/default_header/optica_pattern_05.png?_v=671444c5f47705cce40d8aefd23df3b1",
                "header_image_focused": "https://secure.assets.tumblr.com/images/default_header/optica_pattern_05.png?_v=671444c5f47705cce40d8aefd23df3b1",
                "header_image_scaled": "https://secure.assets.tumblr.com/images/default_header/optica_pattern_05.png?_v=671444c5f47705cce40d8aefd23df3b1",
                "header_stretch": true,
                "link_color": "#529ECC",
                "show_avatar": true,
                "show_description": true,
                "show_header_image": true,
                "show_title": true,
                "title_color": "#444444",
                "title_font": "Gibson",
                "title_font_weight": "bold"
              },
              "share_likes": true,
              "share_following": false
            },
            "post": {
              "id": "136740690571"
            },
            "content_raw": "<p>Bernhard ‘Bernd’ Becher (1931-2007) and Hilla Becher (1934-2015): Eisenhüttenstadt, Brandenburg. Industrial Facades, The MIT Press, 1995.<br></p>",
            "content": "<p>Bernhard ‘Bernd’ Becher (1931-2007) and Hilla Becher (1934-2015): Eisenhüttenstadt, Brandenburg. Industrial Facades, The MIT Press, 1995.<br /></p>",
            "is_current_item": true,
            "is_root_item": true
          }
        ],
        "image_permalink": "http://wergida.tumblr.com/image/136740690571",
        "photos": [
          {
            "caption": "",
            "alt_sizes": [
              {
                "url": "https://67.media.tumblr.com/ea41a17d0febfd019c7afae5fcc6c51e/tumblr_nzk87tVlqk1s5ljg4o1_1280.jpg",
                "width": 1280,
                "height": 973
              },
              {
                "url": "https://66.media.tumblr.com/ea41a17d0febfd019c7afae5fcc6c51e/tumblr_nzk87tVlqk1s5ljg4o1_500.jpg",
                "width": 500,
                "height": 380
              },
              {
                "url": "https://66.media.tumblr.com/ea41a17d0febfd019c7afae5fcc6c51e/tumblr_nzk87tVlqk1s5ljg4o1_400.jpg",
                "width": 400,
                "height": 304
              },
              {
                "url": "https://65.media.tumblr.com/ea41a17d0febfd019c7afae5fcc6c51e/tumblr_nzk87tVlqk1s5ljg4o1_250.jpg",
                "width": 250,
                "height": 190
              },
              {
                "url": "https://66.media.tumblr.com/ea41a17d0febfd019c7afae5fcc6c51e/tumblr_nzk87tVlqk1s5ljg4o1_100.jpg",
                "width": 100,
                "height": 76
              },
              {
                "url": "https://66.media.tumblr.com/ea41a17d0febfd019c7afae5fcc6c51e/tumblr_nzk87tVlqk1s5ljg4o1_75sq.jpg",
                "width": 75,
                "height": 75
              }
            ],
            "original_size": {
              "url": "https://67.media.tumblr.com/ea41a17d0febfd019c7afae5fcc6c51e/tumblr_nzk87tVlqk1s5ljg4o1_1280.jpg",
              "width": 1280,
              "height": 973
            }
          }
        ]
      }
    ],
    "total_posts": 223
  }
}

But when I use BeautifulSoup to parse the information I got back:

soup = BeautifulSoup(client.posts('wergida.tumblr.com', type='photo', tag='BERND AND HILLA BECHER', limit=1, offset=0),"lxml")

I get:

Traceback (most recent call last):
  File "tumblr_test.py", line 29, in <module>
    soup = BeautifulSoup(client.posts('wergida.tumblr.com', type='photo', tag='BERND AND HILLA BECHER', limit=1, offset=0),"lxml")
  File "/Users/CB/Public/scrapy/env/lib/python2.7/site-packages/bs4/__init__.py", line 199, in __init__
    if markup[:5] == "http:" or markup[:6] == "https:":
TypeError: unhashable type

I have tried different parsers, such as "html.parser" and "html5lib", and still get the same error.

Thanks for any clues!

1 Answer:

Answer 0: (score: 2)

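The client.posts() call returns a Python dictionary, not a string containing HTML; it has already parsed the JSON response for you. BeautifulSoup tries to treat the dictionary as a string, and that is where the error comes from: markup[:5] passes a slice object to the dictionary as a key, and a slice object is not hashable: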

>>> {}[:5]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type

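A dictionary is not HTML. There is no point in trying to parse it with BeautifulSoup. Just access the individual data elements in the nested structure; if such an element is itself a string, and that string contains HTML markup, then it may make sense to parse that particular piece of data: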

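# client.posts() already returns parsed data (a dict), so index into it directly
response = client.posts('wergida.tumblr.com', type='photo', tag='BERND AND HILLA BECHER', limit=1, offset=0)
post = response['response']['posts'][0]
# The caption is an HTML string, so this is the one piece worth handing to BeautifulSoup
soup = BeautifulSoup(post['caption'], "lxml")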
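
Since the goal in the question is to collect images, the photo URLs can be read straight out of the nested structure without any HTML parsing. The following is only a sketch under stated assumptions: `sample_response` is a trimmed copy of the output shown in the question, standing in for a real `client.posts(...)` result, and `original_photo_urls` is a hypothetical helper name, not part of PyTumblr. The helper accepts the dictionary with or without the outer "response" key, in case the shape you get back differs from the output printed above.

```python
# Sample data copied from the response shown in the question, trimmed to the
# fields used below. In real use this dictionary would come from client.posts(...).
sample_response = {
    "meta": {"status": 200, "msg": "OK"},
    "response": {
        "posts": [
            {
                "type": "photo",
                "id": 136740690571,
                "photos": [
                    {
                        "caption": "",
                        "original_size": {
                            "url": "https://67.media.tumblr.com/ea41a17d0febfd019c7afae5fcc6c51e/tumblr_nzk87tVlqk1s5ljg4o1_1280.jpg",
                            "width": 1280,
                            "height": 973,
                        },
                    }
                ],
            }
        ],
        "total_posts": 223,
    },
}


def original_photo_urls(api_response):
    """Hypothetical helper: list the original-size URL of every photo in every post."""
    # Assumption: accept both the full envelope (with "meta"/"response") and
    # the inner object alone, since the shape may differ between setups.
    body = api_response.get("response", api_response)
    urls = []
    for post in body.get("posts", []):
        # Posts without a "photos" key are simply skipped.
        for photo in post.get("photos", []):
            urls.append(photo["original_size"]["url"])
    return urls


photo_urls = original_photo_urls(sample_response)
for url in photo_urls:
    print(url)
```

The smaller renditions listed under "alt_sizes" in the question's output can be reached the same way, by iterating over `photo["alt_sizes"]` instead of reading `photo["original_size"]`.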