Python BeautifulSoup无法读取div标签

时间:2016-04-06 16:31:22

标签: python beautifulsoup

我正在尝试从此页面获取我正在处理的项目的产品:lazadapage ispection 使用:

from bs4 import BeautifulSoup
import urllib
import re
r = urllib.urlopen("http://www.lazada.co.id/catalog/?q=note+2").read()
soup = BeautifulSoup(r,"lxml")
letters = soup.findAll("span",class_=re.compile("product-card__name"))
print type(letters) 
print letters[0]

当我这样做时,我收到以下错误:

Traceback (most recent call last):
  File "C:/Python27/project/testaja.py", line 9, in 
    print letters[0]
IndexError: list index out of range

对此有何想法?

1 个答案:

答案 0 :(得分:0)

我认为您的页面可能过多,在浏览器中导航并查看网页返回的内容。

此外,您可以修改代码,以便检查页面响应标头,以确保在尝试抓取页面之前正确返回页面。我修改了您的代码以显示以下示例:

from bs4 import BeautifulSoup
import urllib
import re

r = urllib.urlopen("http://www.lazada.co.id/catalog/?q=note+2")
header_code = r.getcode()

if header_code == 200:
    html = r.read()
    soup = BeautifulSoup(html, "lxml")
    letters = soup.findAll("span", {"class" : re.compile("product-card__name")})

    for letter in letters:
        print letter
else:
    print("oops, something went wonky. Page response was: %s"% header_code)