我正在尝试从此页面获取我正在处理的项目的产品:lazada, page ispection 使用:
from bs4 import BeautifulSoup
import urllib
import re
r = urllib.urlopen("http://www.lazada.co.id/catalog/?q=note+2").read()
soup = BeautifulSoup(r,"lxml")
letters = soup.findAll("span",class_=re.compile("product-card__name"))
print type(letters)
print letters[0]
当我这样做时,我收到以下错误:
Traceback (most recent call last): File "C:/Python27/project/testaja.py", line 9, in print letters[0] IndexError: list index out of range
对此有何想法?
答案 0 :(得分:0)
我认为您的页面可能过多,在浏览器中导航并查看网页返回的内容。
此外,您可以修改代码,以便检查页面响应标头,以确保在尝试抓取页面之前正确返回页面。我修改了您的代码以显示以下示例:
from bs4 import BeautifulSoup
import urllib
import re
r = urllib.urlopen("http://www.lazada.co.id/catalog/?q=note+2")
header_code = r.getcode()
if header_code == 200:
html = r.read()
soup = BeautifulSoup(html, "lxml")
letters = soup.findAll("span", {"class" : re.compile("product-card__name")})
for letter in letters:
print letter
else:
print("oops, something went wonky. Page response was: %s"% header_code)