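I am writing a small web scraper in Python that fetches GPU listings from newegg.com and records all the prices.
So far I have not implemented the spreadsheet part, because every time I run the script I get one of two errors.
The code is as follows: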
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import numpy as np

myURL = "https://www.newegg.com/global/uk/Product/ProductList.aspx?Submit=ENE&N=-1&IsNodeId=1&Description=graphics%20card&bop=And&PageSize=96&order=BESTMATCH" # defining my url as a variable
uClient = uReq(myURL) # opening the connection
page_html = uClient.read() # getting html
uClient.close() # closing the client
page_soup = soup(page_html, "html.parser") # html parsing
containers = page_soup.findAll("div", {"class":"item-container"}) # get all item containers/product
container = containers[0]
count = 0
for container in containers:
    print(count)
    brand = container.div.div.a.img["title"] # get the brand of the card
    if brand == None:
        print("N/A")
    else:
        print(brand)
    title_container = container.findAll("a", {"class", "item-title"})
    product_name = title_container[0].text # getting the product name
    if product_name == None:
        print("N/A")
    else:
        print(product_name)
    price1 = container.find("div",{"class":"item-action"})
    price1 = price1.ul
    price2 = price1.find("li", {"class": "price-current"}).contents # defining the product price
    if not price2:
        print("N/A")
    else:
        print(price2[2])
        print(price2[3].text)
        print(price2[4].text)
    print()
    count+=1
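The errors are as follows:
Traceback (most recent call last):
  File "C:/Users/Ethan Price/Desktop/test.py", line 23, in <module>
    brand = container.div.div.a.img["title"]# get the brand of the card
TypeError: 'NoneType' object is not subscriptable
Traceback (most recent call last):
  File "C:/Users/Ethan Price/Desktop/test.py", line 43, in <module>
    print(price2[2])
IndexError: list index out of range
While trying to fix this, I tried converting the list to an array and changing the if statements.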
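Answer 0 (score: 2)
Both error messages mean that some element you expect to see is not there. The first is complaining that container.div.div.a.img is None at the point where you try to subscript it (and None cannot be subscripted, for obvious reasons). The other is complaining that the list price2 is not as long as you think it is, so price2[2] is out of range.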
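Both failure modes can be reproduced without touching the network. The following is a minimal sketch, not the asker's code: `missing_img` and `short_price` are hypothetical stand-ins for what `container.div.div.a.img` and `price2` hold on a product whose markup lacks the expected elements, and `describe_failure` is a helper invented for this illustration.

```python
def describe_failure(action):
    """Run action() and report which exception, if any, it raised."""
    try:
        action()
    except (TypeError, IndexError) as exc:
        return f"{type(exc).__name__}: {exc}"
    return "no error"


# Hypothetical stand-in: attribute access on a bs4 Tag gives None
# when the child tag (here <img>) does not exist.
missing_img = None

# Hypothetical stand-in: a .contents list shorter than the code assumes.
short_price = ["\n"]

first = describe_failure(lambda: missing_img["title"])  # subscripting None
second = describe_failure(lambda: short_price[2])       # index past the end

print(first)
print(second)
```

The two printed lines carry the same exception types as the tracebacks in the question, which suggests the fix belongs in checking the data before indexing rather than in changing the container type.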
Answer 1 (score: 0)
For the first error, check that the image tag and its title attribute exist:
brand = None
# might want to check there is even an anchor tag
_img = container.div.div.a.img
if _img:
    brand = _img["title"]
Second, check the length of the price list:
if 2 <= len(price2) <= 5:
    for p in price2[2:]:
        print(p)
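The two checks above could also be folded into small helpers that return "N/A" instead of raising. This is only a sketch under stated assumptions: `brand_or_na` and `price_or_na` are hypothetical names, the `start=2, stop=5` window mirrors the indices used in the question, and a plain dict and plain strings stand in for the bs4 objects so the example runs on its own.

```python
def brand_or_na(img):
    """Return img's title attribute, or "N/A" if img or the attribute is missing."""
    if img is None:
        return "N/A"
    # .get() is available on dicts and on bs4 Tag objects alike,
    # and returns None rather than raising for a missing attribute.
    return img.get("title") or "N/A"


def price_or_na(contents, start=2, stop=5):
    """Join the text of contents[start:stop], or return "N/A" if nothing is there."""
    parts = contents[start:stop]  # slicing never raises IndexError
    if not parts:
        return "N/A"
    # Use .text when the element has it, otherwise fall back to str().
    return "".join(getattr(p, "text", str(p)) for p in parts).strip()


# Usage with stand-in data (a dict in place of an <img> Tag,
# strings in place of the children of the price element).
print(brand_or_na(None))
print(brand_or_na({"title": "MSI"}))
print(price_or_na(["\n"]))
print(price_or_na(["\n", "", "£", "199", ".99"]))
```

Slicing is used instead of indexing because a slice past the end of a list yields an empty list, so a product with no price falls through to "N/A" without a separate length test.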