Question

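I'm trying to parse the element with id="qualifications" out of an HTML page.

I followed the BeautifulSoup documentation and the Requests documentation.

My code: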

import requests
from bs4 import BeautifulSoup

def get_content(url):
    if type(url) != str:
        print('You need to include a string')
        exit()
    else:
        req  = requests.get(url)
        soup = BeautifulSoup(req, 'html.parser')
        qualifications = soup.find(id="qualifications")
        print('Qualifications:\n{}'.format(qualifications))

When I run it like this:

get_content('http://www.somesite.com')

It throws an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "parse.py", line 10, in get_content
    soup = BeautifulSoup(req, 'html.parser')
  File "python3.5/site-packages/bs4/__init__.py", line 176, in __init__
    elif len(markup) <= 256:
TypeError: object of type 'Response' has no len()

How can I make this work? Is the error perhaps caused by the response being larger than 256?

Answer 1

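You are passing the Response object itself, not its actual content. BeautifulSoup expects a string or bytes, so pass req.content instead: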

soup = BeautifulSoup(req.content, 'html.parser')
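
For context, the len(markup) <= 256 line in the traceback is simply where BeautifulSoup calls len() on its input. The call fails because a Response object has no length, not because the page is too large. Below is a minimal sketch of the corrected function, assuming the lookup is split into a separate helper so it can be tried on inline markup without a network request. The name find_qualifications is illustrative and does not come from the question.

```python
import requests
from bs4 import BeautifulSoup


def find_qualifications(markup):
    """Return the element with id="qualifications", or None if it is absent.

    Hypothetical helper (not from the question): markup must be str or bytes.
    """
    soup = BeautifulSoup(markup, 'html.parser')
    return soup.find(id='qualifications')


def get_content(url):
    if not isinstance(url, str):
        raise TypeError('url must be a string')
    req = requests.get(url)
    # Pass the raw body bytes, not the Response object itself.
    return find_qualifications(req.content)
```

Calling find_qualifications(b'<div id="qualifications">BSc</div>') returns the matching tag, while markup without that id yields None.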

You may also want to pass along any encoding information the server provided:

encoding = req.encoding if 'charset' in req.headers.get('content-type', '').lower() else None
soup = BeautifulSoup(req.content, 'html.parser', from_encoding=encoding)
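
The charset check matters because Requests falls back to ISO-8859-1 for text responses whose Content-Type header names no charset, so req.encoding is only trustworthy when the server actually declared one. A small sketch of the same logic, assuming it is factored into two helpers (declared_encoding and make_soup are illustrative names, not part of either library):

```python
from bs4 import BeautifulSoup


def declared_encoding(content_type, response_encoding):
    """Return response_encoding only if the Content-Type header names a charset.

    Hypothetical helper: content_type is the raw header value (or None).
    """
    if 'charset' in (content_type or '').lower():
        return response_encoding
    return None


def make_soup(content, content_type, response_encoding):
    """Parse body bytes, trusting the encoding only when the server declared it."""
    encoding = declared_encoding(content_type, response_encoding)
    # from_encoding=None lets BeautifulSoup detect the encoding on its own.
    return BeautifulSoup(content, 'html.parser', from_encoding=encoding)
```

With a Response object req, this would be called as make_soup(req.content, req.headers.get('content-type'), req.encoding).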

Answer 2

import requests
from bs4 import BeautifulSoup

url = 'Your url'

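def get_html(url):
    # Pass the decoded text (a str), not the Response object itself.
    r = requests.get(url).text
    # The 'lxml' parser requires the lxml package to be installed.
    soup = BeautifulSoup(r, 'lxml')
    return soup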
