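I'm trying to parse the element with id="qualifications" out of an HTML page.
I followed the BeautifulSoup documentation and the Requests documentation.
My code: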
import requests
from bs4 import BeautifulSoup

def get_content(url):
    if type(url) != str:
        print('You need to included a string')
        exit()
    else:
        req = requests.get(url)
        soup = BeautifulSoup(req, 'html.parser')
        qualifications = soup.find(id="qualifications")
        print('Qualifications:\n{}'.format(qualifications))
When I run it like this:
get_content('http://www.somesite.com')
it throws this error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "parse.py", line 10, in get_content
    soup = BeautifulSoup(req, 'html.parser')
  File "python3.5/site-packages/bs4/__init__.py", line 176, in __init__
    elif len(markup) <= 256:
TypeError: object of type 'Response' has no len()
How can I make this work? It looks like the error might be caused by the response being larger than 256?
Answer 0 (score: 2)
You are passing the response object, not the actual content. Pass req.content instead:
soup = BeautifulSoup(req.content, 'html.parser')
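A minimal sketch of the question's function with that fix applied. The helper name parse_qualifications is hypothetical, introduced only so the parsing step can be tried without a network request. Raising TypeError instead of calling print()/exit(), and calling raise_for_status(), are extra changes beyond the original answer.

```python
import requests
from bs4 import BeautifulSoup


def parse_qualifications(markup):
    """Return the element with id="qualifications", or None if it is absent.

    `markup` must be str or bytes (e.g. response.text or response.content),
    never the requests.Response object itself.
    """
    soup = BeautifulSoup(markup, 'html.parser')
    return soup.find(id="qualifications")


def get_content(url):
    # Raise instead of print()/exit() so callers can handle bad input.
    if not isinstance(url, str):
        raise TypeError('url must be a string')
    req = requests.get(url)
    req.raise_for_status()  # fail loudly on 4xx/5xx responses
    # req.content is the raw body as bytes; BeautifulSoup detects the encoding.
    qualifications = parse_qualifications(req.content)
    print('Qualifications:\n{}'.format(qualifications))
    return qualifications
```

Usage: get_content('http://www.somesite.com') now prints the matching tag (or None if the page has no such id).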
You may also want to pass along any encoding information the server provided:
encoding = req.encoding if 'charset' in req.headers.get('content-type', '').lower() else None
soup = BeautifulSoup(req.content, 'html.parser', from_encoding=encoding)
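Why the charset check matters: when a text/* response has no charset in its Content-Type header, requests falls back to ISO-8859-1 for req.encoding, which is often wrong for HTML. Passing from_encoding=None instead lets BeautifulSoup detect the encoding from the document itself (a BOM or a meta charset declaration). Below is a sketch of the same logic as a function; soup_from_response is a hypothetical name, and the header value and encoding are passed in explicitly so it can be tried without a live request.

```python
from bs4 import BeautifulSoup


def soup_from_response(content, content_type, declared_encoding):
    """Parse raw bytes, trusting the server's encoding only if it sent a charset.

    Mirrors the answer's one-liner: requests may fill in a default encoding
    even when no charset was sent, so only use it when 'charset' actually
    appears in the Content-Type header.
    """
    if 'charset' in (content_type or '').lower():
        encoding = declared_encoding
    else:
        encoding = None  # let BeautifulSoup sniff the encoding from the markup
    return BeautifulSoup(content, 'html.parser', from_encoding=encoding)
```

With a live response this would be called as soup_from_response(req.content, req.headers.get('content-type', ''), req.encoding).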
Answer 1 (score: 0)
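import requests
from bs4 import BeautifulSoup

url = 'https://m.vk.com/uporols_you'  # replace with the page you want

def get_html(url):
    # .text is the decoded response body (a str), which BeautifulSoup accepts
    r = requests.get(url).text
    # the 'lxml' parser requires the third-party lxml package
    soup = BeautifulSoup(r, 'lxml')
    return soup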
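The 'lxml' argument in this answer only works if the third-party lxml package is installed; otherwise BeautifulSoup raises bs4.FeatureNotFound. A sketch of one way to prefer lxml while falling back to the standard-library html.parser (make_soup is a hypothetical helper name, not part of either answer):

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Parse with lxml when it is installed, else with the stdlib html.parser."""
    try:
        return BeautifulSoup(markup, 'lxml')
    except FeatureNotFound:
        # Raised when no installed tree builder provides the requested parser.
        return BeautifulSoup(markup, 'html.parser')
```

Either parser returns a soup on which find(id="qualifications") behaves the same for well-formed HTML; they can differ on broken markup.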