Requests / BeautifulSoup throws an error when parsing a given id

Time: 2016-02-13 18:41:28

Tags: python python-3.x beautifulsoup python-requests

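I am trying to parse the element with id="qualifications" out of an HTML page.

I followed the BeautifulSoup documentation and the requests documentation.

My code: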

import requests
from bs4 import BeautifulSoup

def get_content(url):
    if type(url) != str:
        print('You need to included a string')
        exit()
    else:
        req  = requests.get(url)
        soup = BeautifulSoup(req, 'html.parser')
        qualifications = soup.find(id="qualifications")
        print('Qualifications:\n{}'.format(qualifications))

When I run it like this:

get_content('http://www.somesite.com')

It throws an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "parse.py", line 10, in get_content
    soup = BeautifulSoup(req, 'html.parser')
  File "python3.5/site-packages/bs4/__init__.py", line 176, in __init__
    elif len(markup) <= 256:
TypeError: object of type 'Response' has no len()

How can I make this work? It looks like the error might be that the size of the returned response is greater than 256?

2 answers:

Answer 0: (score: 2)

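You are passing in the Response object, not the actual content. The response size has nothing to do with it: BeautifulSoup calls len() on whatever it is given, and a Response has no length. Pass req.content instead: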

soup = BeautifulSoup(req.content, 'html.parser')
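To show the fix end to end, here is a minimal sketch. The helper name find_by_id and the sample markup are illustrative assumptions, not part of the question; get_content follows the question's function, with the type check rewritten to raise instead of calling exit().

```python
import requests
from bs4 import BeautifulSoup


def find_by_id(markup, element_id):
    """Parse markup (str or bytes) and return the element with that id, or None."""
    soup = BeautifulSoup(markup, 'html.parser')
    return soup.find(id=element_id)


def get_content(url):
    """Fetch url and print the element whose id is "qualifications"."""
    if not isinstance(url, str):
        raise TypeError('url must be a string')
    req = requests.get(url)
    # req is a Response object and defines no __len__;
    # req.content is the raw body as bytes, which BeautifulSoup accepts.
    qualifications = find_by_id(req.content, 'qualifications')
    print('Qualifications:\n{}'.format(qualifications))


# Offline check: hypothetical markup standing in for req.content
sample = b'<html><body><div id="qualifications">BSc</div></body></html>'
tag = find_by_id(sample, 'qualifications')
```

Keeping the parsing in its own function means it can be checked against a literal string without any network access.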

You may also want to pass along any encoding information supplied by the server:

encoding = req.encoding if 'charset' in req.headers.get('content-type', '').lower() else None
soup = BeautifulSoup(req.content, 'html.parser', from_encoding=encoding)
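The conditional above trusts req.encoding only when the Content-Type header actually declares a charset. Otherwise it passes None, so BeautifulSoup works out the encoding from the document itself rather than from a default guessed by requests. That decision can be sketched as a standalone function (the name declared_encoding is hypothetical, not part of either library):

```python
def declared_encoding(content_type, response_encoding):
    """Return response_encoding only if the Content-Type header names a charset.

    content_type may be None or empty when the server sent no such header.
    """
    if 'charset' in (content_type or '').lower():
        return response_encoding
    return None


# Header with an explicit charset: keep what the server declared
explicit = declared_encoding('text/html; charset=UTF-8', 'UTF-8')

# Header without a charset: let BeautifulSoup detect the encoding
implicit = declared_encoding('text/html', 'ISO-8859-1')
```

Its result would then be passed as from_encoding, using req.headers.get('content-type') and req.encoding as the two arguments.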

Answer 1: (score: 0)

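import requests
from bs4 import BeautifulSoup

# Placeholder: replace with the page to fetch, e.g. 'https://m.vk.com/uporols_you'
url = 'Your url'

def get_html(url):
    # .text is the decoded body as a str; pass that, not the Response object
    r = requests.get(url).text
    # The 'lxml' parser requires the lxml package to be installed
    soup = BeautifulSoup(r, 'lxml')
    return soup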