This is my first web scraping application.
Here is my code:
import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url= 'https://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38?Tpk=graphics%20card'
#opening up connection, grabbing page
uClient = uReq(my_url)
#makes it a variable
page_html = uClient.read()
#will close it
uClient.close()
#html parsing
page_soup = soup(page_html, "html.parser")
#grabs each container in HTML
containers = page_soup.find("div",{"class":"item-container"})
filename = "Products.csv"
f = open(filename, "w")
headers = "brand, product_name, shipping\n"
f.write(headers)
for container in containers:
    brand = containers.div.div.a["title"]
    title_container = containers.find("a", {"class": "item-title"})
    product_name = title_container[0].txt
    shipping_container = container.find("li", {"class": "price-ship"})
    shipping = shipping_container[0].txt.strip()
    print("brand: " + brand)
    print("product_name: " + product_name)
    print("shipping: " + shipping)
    f.write(brand + "," + product_name.replace(",", "|") + "," + shipping + "\n")
f.close()
Here is the error:
Traceback (most recent call last):
  File "<ipython-input-23-b9aa37e3923c>", line 1, in <module>
    runfile('/Users/Mohit/Documents/Python/webscrape.py', wdir='/Users/Mohit/Documents/Python')
  File "/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)
  File "/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "/Users/Mohit/Documents/Python/webscrape.py", line 38, in <module>
    brand = containers.div.div.a["title"]
TypeError: 'NoneType' object is not subscriptable
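Basically, I want it to grab the brand, product name, and shipping cost of every graphics card on the page and format them as CSV.
I think the program can't find where the image or data should be imported from. This is my first web scraping project, and I used https://www.youtube.com/watch?v=XQgXKtPSzUI&t=800s as the tutorial.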
Answer 0 (score: 0)
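You seem to be accessing attributes of some variables without checking whether they exist. For example, on this line (it raises the exception you are seeing, but it is also representative of other lines in your code):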
brand = containers.div.div.a["title"]
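To see where the `None` comes from, here is a small sketch on invented markup. The sample HTML is an assumption made for illustration, not the real Newegg page. In BeautifulSoup, `find` returns `None` when nothing matches, and attribute-style navigation such as `.a` returns `None` when there is no such child tag, so subscripting the result fails:

```python
from bs4 import BeautifulSoup

# Invented sample markup; the real page structure may differ.
html = (
    '<div class="item-container">'
    '<div class="item-info"><span>no link here</span></div>'
    '</div>'
)
page_soup = BeautifulSoup(html, "html.parser")

container = page_soup.find("div", {"class": "item-container"})
missing = page_soup.find("div", {"class": "does-not-exist"})

print(missing)          # find() gives None when nothing matches
print(container.div.a)  # navigation gives None when there is no <a> child

try:
    brand = container.div.a["title"]  # subscripting None
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The message printed at the end should be the same `TypeError` text as in your traceback.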
I would suggest a more cautious approach. For example, this naive code:
if (containers is not None) and (containers.div is not None) and (containers.div.div is not None) and (containers.div.div.a is not None):
    brand = containers.div.div.a["title"]
else:
    brand = ""
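The same chain of checks can be folded into a small helper so it does not have to be repeated for every field. This is only a sketch: `dig` and `brand_of` are hypothetical names, not part of BeautifulSoup, and the `SimpleNamespace` objects (with the made-up brand "EVGA") stand in for parsed tags so the example runs without fetching a page.

```python
from types import SimpleNamespace


def dig(node, *names):
    """Follow attribute names one at a time; return None as soon as a step is missing."""
    for name in names:
        if node is None:
            return None
        node = getattr(node, name, None)
    return node


def brand_of(container):
    """Return the link's title, or an empty string if any step in the chain is missing."""
    link = dig(container, "div", "div", "a")
    return link["title"] if link is not None else ""


# Stand-ins for parsed tags (assumed shapes, for illustration only).
full = SimpleNamespace(div=SimpleNamespace(div=SimpleNamespace(a={"title": "EVGA"})))
broken = SimpleNamespace(div=SimpleNamespace(div=None))

print(repr(brand_of(full)))
print(repr(brand_of(broken)))
print(repr(brand_of(None)))
```

With a real BeautifulSoup tag, `getattr(tag, "div", None)` behaves like `tag.div`, so the helper should work on parsed containers as well.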
If you want to debug the problem in your specific HTML further, try nested conditions:
if containers is not None:
    if containers.div is not None:
        pass  # ... more conditions here ...
    else:
        print("ERROR 2: containers.div was None! :(")
else:
    print("ERROR 1: containers was None! :(")
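Putting the checks together, the loop might look something like the sketch below. Two things differ from the code in the question and are worth flagging: it uses `find_all` rather than `find`, because `find` returns only the first match (or `None`) rather than a list, and it reads `.text` rather than `.txt`. The sample HTML and the names `text_or_blank` and `extract_rows` are invented for illustration; the real page markup may well differ, so treat the selectors as assumptions.

```python
import csv
import io

from bs4 import BeautifulSoup

# Invented sample markup that mimics the selectors used in the question.
SAMPLE_HTML = """
<div class="item-container">
  <div class="item-info">
    <div class="item-branding"><a title="EVGA" href="#"></a></div>
    <a class="item-title" href="#">GeForce card, 8GB</a>
    <ul><li class="price-ship"> Free Shipping </li></ul>
  </div>
</div>
<div class="item-container">
  <div class="item-info">
    <a class="item-title" href="#">Card without branding</a>
  </div>
</div>
"""


def text_or_blank(tag):
    """Return the stripped text of a tag, or an empty string if the tag is missing."""
    return tag.text.strip() if tag is not None else ""


def extract_rows(html):
    page_soup = BeautifulSoup(html, "html.parser")
    rows = []
    # find_all returns every matching container; find would return only one.
    for container in page_soup.find_all("div", {"class": "item-container"}):
        # Check each step of container.div.div.a before going deeper.
        info = container.div
        branding = info.div if info is not None else None
        link = branding.a if branding is not None else None
        brand = link.get("title", "") if link is not None else ""

        product_name = text_or_blank(container.find("a", {"class": "item-title"}))
        shipping = text_or_blank(container.find("li", {"class": "price-ship"}))
        rows.append((brand, product_name, shipping))
    return rows


rows = extract_rows(SAMPLE_HTML)

# csv.writer quotes fields that contain commas, so no manual replace is needed.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["brand", "product_name", "shipping"])
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to a file instead of a string, the same `csv.writer` can wrap `open("Products.csv", "w", newline="")`.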