My Python code keeps returning errors when I run it

Time: 2018-02-10 18:35:27

Tags: python web-scraping beautifulsoup urllib

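I am using Python to write a small web scraper that fetches GPU listings from newegg.com and records all the prices.
So far I have not implemented the spreadsheet part, because every time I run the script I get one of two errors.

The code is as follows: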

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import numpy as np

myURL = "https://www.newegg.com/global/uk/Product/ProductList.aspx?Submit=ENE&N=-1&IsNodeId=1&Description=graphics%20card&bop=And&PageSize=96&order=BESTMATCH" # defining my url as a variable

uClient = uReq(myURL) #opening the connection
page_html = uClient.read() # getting html
uClient.close() # closing the client

page_soup = soup(page_html, "html.parser") # html parsing

containers = page_soup.findAll("div", {"class":"item-container"}) # get all item containers/products

container = containers[0]

count = 0

for container in containers:

    print(count)

    brand = container.div.div.a.img["title"]# get the brand of the card
    if brand == None:
        print("N/A")
    else:
        print(brand)

    title_container = container.findAll("a", {"class", "item-title"})
    product_name = title_container[0].text # getting the product name
    if product_name == None:
        print("N/A")
    else:
        print(product_name)

    price1 = container.find("div",{"class":"item-action"})
    price1 = price1.ul
    price2 = price1.find("li", {"class": "price-current"}).contents #defining the product price
    if not price2:
        print("N/A")
    else:
        print(price2[2])
        print(price2[3].text)
        print(price2[4].text) 

    print()
    count+=1

The errors are as follows:

  1.

    Traceback (most recent call last):
      File "C:/Users/Ethan Price/Desktop/test.py", line 23, in <module>
        brand = container.div.div.a.img["title"]# get the brand of the card
    TypeError: 'NoneType' object is not subscriptable

  2.

    Traceback (most recent call last):
      File "C:/Users/Ethan Price/Desktop/test.py", line 43, in <module>
        print(price2[2])
    IndexError: list index out of range

While trying to fix this, I tried converting the list to an array and changing the if statements.

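2 answers:

Answer 0: (score: 2)

Both error messages mean that some element you expect to see is not there. The first one complains that container.div.div.a.img is None at the point where you try to subscript it (and None cannot be subscripted, for obvious reasons). The other one complains that the list price2 is shorter than you think, so price2[2] is out of range.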
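
To make the two failure modes concrete, here is a minimal sketch that reproduces both exceptions without touching the network. The names img and price_parts are made-up stand-ins for what BeautifulSoup hands back when a tag is missing or a price block has fewer children than expected; they are assumptions for illustration and are not taken from the Newegg page.

```python
# Stand-ins (assumptions, not real page data): BeautifulSoup returns None
# when a child tag such as .img is absent, and .contents is a plain list.
img = None
price_parts = ["\n", "$"]


def describe(action):
    """Run action() and return the name of the exception it raises, if any."""
    try:
        action()
    except (TypeError, IndexError) as exc:
        return type(exc).__name__
    return "no error"


print(describe(lambda: img["title"]))    # subscripting None
print(describe(lambda: price_parts[2]))  # index past the end of a 2-item list
```

Neither exception says anything is wrong with the parser itself; both only report that the listing being processed does not have the shape the code assumed.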

Answer 1: (score: 0)

For the first error, check whether the image and its title attribute exist:

brand = None
# might want to check there is even an anchor tag 
_img = container.div.div.a.img
if _img:
    brand = _img["title"]

Second, check the length of the price list:

if 2 <= len(price2) <= 5:
    for p in price2[2:]:
        print(p)