Basic Python scraper raises TypeError: object of type 'Response' has no len()

Time: 2019-01-09 07:07:42

Tags: python beautifulsoup

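I'm writing a web scraper in Python for the first time. I've worked through a few tutorials and am now trying my first scraper on my own. It's a very simple test, and it produces the error shown in the title.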

import requests
from bs4 import BeautifulSoup
url = "https://www.autotrader.ca/cars/mercedes-benz/ab/calgary/?rcp=15&rcs=0&srt=3&prx=100&prv=Alberta&loc=T3P%200H2&hprc=True&wcp=True&sts=Used&adtype=Private&showcpo=1&inMarket=advancedSearch"
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98  Safari/537.36'
html = requests.get(url,headers={'User-Agent': user_agent})
soup = BeautifulSoup(html, "lxml")
print(soup)

Please help me get this code working. Any help would be greatly appreciated!

2 answers:

Answer 0 (score: 1)

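Use html.text instead of html. BeautifulSoup expects the markup as a string (or bytes), not a requests Response object, which is why passing the response itself raises the TypeError. Sending a User-Agent header in the get() call is also good practice.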

import requests
from bs4 import BeautifulSoup

url = "https://www.autotrader.ca/cars/mercedes-benz/ab/calgary/?rcp=15&rcs=0&srt=3&prx=100&prv=Alberta&loc=T3P%200H2&hprc=True&wcp=True&sts=Used&adtype=Private&showcpo=1&inMarket=advancedSearch"

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'}
response = requests.get(url,headers=headers)
soup = BeautifulSoup(response.text,"lxml")
print(soup)
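
If you want to reuse the fix above, one way to package it is as a small function. This is only a sketch: the name fetch_soup, the DEFAULT_USER_AGENT constant, the 10-second timeout, and the use of the built-in "html.parser" (instead of "lxml", to avoid the extra dependency) are assumptions for illustration and are not part of the original answer.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical default; any realistic browser User-Agent string would do.
DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36"
)


def fetch_soup(url, user_agent=DEFAULT_USER_AGENT, timeout=10):
    """Download a page and return it parsed as a BeautifulSoup object."""
    response = requests.get(
        url,
        headers={"User-Agent": user_agent},
        timeout=timeout,  # assumed value; avoids hanging forever
    )
    # Raise requests.HTTPError for 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    # Pass the decoded body (a str), not the Response object itself.
    return BeautifulSoup(response.text, "html.parser")
```

Calling fetch_soup(url) with the URL from the question should give you a soup you can print or search. The raise_for_status() call surfaces HTTP errors before any parsing happens, which helps when a site rejects the request.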

Answer 1 (score: 0)

Change this line:

soup = BeautifulSoup(html, "lxml")

to either of the following:

soup = BeautifulSoup(html.content, "lxml")

soup = BeautifulSoup(html.text, "lxml")

Either form returns the parsed HTML structure of the web page.
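
To see why both variants work while the original line fails, here is a small offline sketch. It uses a literal HTML snippet in place of a real download, and the NotMarkup class is a hypothetical stand-in for any object that is neither a string nor bytes (such as a requests Response). The built-in "html.parser" is used here only so the example runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Stand-ins for response.content (bytes) and response.text (str).
raw_bytes = b"<html><head><title>Used cars</title></head><body></body></html>"
decoded_text = raw_bytes.decode("utf-8")

# BeautifulSoup accepts either form of markup.
soup_from_bytes = BeautifulSoup(raw_bytes, "html.parser")
soup_from_text = BeautifulSoup(decoded_text, "html.parser")
print(soup_from_bytes.title.string)  # Used cars
print(soup_from_text.title.string)   # Used cars


class NotMarkup:
    """Hypothetical object that is neither str nor bytes, like a Response."""


# Passing such an object is rejected with a TypeError.
error_type = None
try:
    BeautifulSoup(NotMarkup(), "html.parser")
except TypeError as exc:
    error_type = type(exc).__name__
    print("Rejected with", error_type)
```

The practical difference between the two fixes is who decodes the body: with .text, requests decodes it to a string first, while with .content the raw bytes are handed over and BeautifulSoup works out the encoding itself.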