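I'm trying to write a script that scrapes a website for product information.
Currently, the program uses a for loop to scrape each product's price and unique ID.
The for loop contains two if statements that are meant to stop it from scraping NoneTypes.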
import requests
from bs4 import BeautifulSoup

def average(price_list):
    return sum(price_list) / len(price_list)

# Requests search data from Website
page_link = 'URL'
page_response = requests.get(page_link, timeout=5)  # gets the webpage (search) from Website
page_content = BeautifulSoup(page_response.content, 'html.parser')  # turns the webpage it just retrieved into a BeautifulSoup-object

# Selects the product listings from page content so we can work with these
product_listings = page_content.find_all("div", {"class": "unit flex align-items-stretch result-item"})

prices = []  # Creates a list to add the prices to
uids = []  # Creates a list to store the unique ids

for product in product_listings:
    # UIDS
    if product.find('a')['id'] is not None:
        uid = product.find('a')['id']
        uids.append(uid)
    # PRICES
    if product.find('p', class_ = 'result-price man milk word-break') is not None:  # assures that the loop only finds the prices
        price = int(product.p.text[:-2].replace(u'\xa0', ''))  # makes a temporary variable where the last two chars of the string (,-) and whitespace are removed, turns into int
        prices.append(price)  # adds the price to the list
On the line
if product.find('a')['id'] is not None:
I get the following exception:
TypeError: 'NoneType' object is not subscriptable
However, if I run
print(product.find('a')['id'])
I get the value I'm looking for, which confuses me a lot. Doesn't that mean the value is not a NoneType?
Also, the check
if product.find('p', class_ = 'result-price man milk word-break') is not None:
works flawlessly.
I tried assigning
product.find('p', class_ = 'result-price man milk word-break')
to a variable and then using that variable in the for loop, but that didn't help.
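I've also done my fair share of Googling, without success. Part of the problem may be that I'm fairly new to programming and don't know exactly what to search for. I did find plenty of answers to seemingly related problems, but none of them worked for my code.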
Any help would be greatly appreciated!
Answer 0 (score: 1)
Just add an intermediate step:
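res = product.find('a')  # None if this listing has no <a> tag
# .get('id') returns None for a missing attribute, whereas res['id'] would raise KeyError
if res is not None and res.get('id') is not None:
    uid = res['id']  # reuse res instead of calling find() a second time
    uids.append(uid)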
That way, if find returns None because the element was not found, you never end up trying to subscript a NoneType.
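
As for why the print call seemed to work: most likely it printed the ids of the listings that do contain an a tag, and the loop only failed once it reached a listing without one. Here is a minimal sketch of that situation. The HTML string and the result-item class are made up for illustration and are not taken from the real site; it assumes bs4 is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second listing has no <a> tag at all,
# and the third has an <a> tag without an id attribute.
html = """
<div class="result-item"><a id="A1">First</a></div>
<div class="result-item"><span>No link here</span></div>
<div class="result-item"><a href="#">No id</a></div>
<div class="result-item"><a id="B2">Fourth</a></div>
"""

soup = BeautifulSoup(html, "html.parser")

uids = []
for product in soup.find_all("div", class_="result-item"):
    link = product.find("a")  # None when the listing has no <a> tag
    if link is None:
        continue  # skip this listing instead of subscripting None
    uid = link.get("id")  # .get() returns None rather than raising KeyError
    if uid is not None:
        uids.append(uid)

print(uids)  # ['A1', 'B2']
```

The first listing is processed normally, which matches seeing a valid id printed before the TypeError shows up on a later iteration.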