When I try to run this code in Spyder, nothing happens. I don't get any errors; there is simply no printed output:
import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://www.newegg.com/Power-Banks/SubCategory/ID-3724?cm_sp=Cat_Batteries-Power-Banks-Chargers_1-_-VisNav-_-Power-Banks'
# opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
# html parsing
page_soup = soup(page_html, "html.parser")
# grab products
containers = page_soup.findAll("div",{"class":"item-container"})
for container in containers:
    brand = container.div.div.a.img["title"]
    # get the product name
    title_container = container.findAll("a", {"class":"item-title"})
    product_name = title_container[0].text # search the text in the first index of the list of <a></a>
    # find shipping prices
    shipping_container = container.findAll("li", {"class":"price-ship"})
    shipping = shipping_container[0].text.strip()
    print("brand:" + brand)
    print("product_name:" + product_name)
    print("shipping:" + shipping)
What could be the problem?
Answer 0 (score: 0)
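When I tested it, containers did contain a valid set of results. There are other problems, though: not every container has a suitable .div.div.a.img["title"] element, so that lookup fails for some products: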
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://www.newegg.com/Power-Banks/SubCategory/ID-3724?cm_sp=Cat_Batteries-Power-Banks-Chargers_1-_-VisNav-_-Power-Banks'
# opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
# html parsing
page_soup = soup(page_html, "html.parser")
# grab products
containers = page_soup.findAll("div", {"class":"item-container"})
if len(containers) == 0:
    print(page_html) # diagnose reason for no containers
else:
    for container in containers:
        try:
            brand = container.div.div.a.img["title"]
        except (AttributeError, TypeError, KeyError):
            # this container has no brand image, so skip it
            pass
        else:
            # get the product name
            title_container = container.findAll("a", {"class":"item-title"})
            product_name = title_container[0].text # search the text in the first index of the list of <a></a>
            # find shipping prices
            shipping_container = container.findAll("li", {"class":"price-ship"})
            shipping = shipping_container[0].text.strip()
            print("brand:" + brand)
            print("product_name:" + product_name)
            print("shipping:" + shipping)
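This can be solved with exception handling, as in the try/except block above, which skips any container that has no brand image. It gives results like the following: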
brand:Orico
product_name:[Qualcomm Certified Quick Charge 3.0] ORICO TS1-BK 10000 mAh QC3.0 & USB-C / Type-C Port Portable Charger External Battery Pack Power Bank for Phones, Tablet and More
shipping:Free Shipping
brand:Duracell Powermat
product_name:Duracell Powermat White 2X Charging Mat M2PW1
shipping:Free Shipping
brand:ADATA
product_name:ADATA D8000L 8000mAh w/ 200 Lumens LED (AD8000L-5V-CBK)
shipping:Free Shipping
brand:Mophie
product_name:mophie Juice Pack Powerstation Green
shipping:Free Shipping
brand:SAMSUNG
product_name:Fast Charge Battery Pack (10.2A), Black
shipping:$7.73 Shipping
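
As an alternative to catching exceptions, the missing-brand case could also be handled by checking each step of the lookup explicitly. The following is only a minimal sketch, not the approach used in the code above: it runs against a small made-up HTML snippet (SAMPLE_HTML) rather than the live Newegg page, and the helper names get_brand and parse_items are hypothetical. It relies on the fact that container.div is shorthand for container.find("div"), which returns None when no such tag exists.

```python
from bs4 import BeautifulSoup

# Made-up markup imitating the structure the scraper expects.
# The second item has no nested brand image, like some real listings.
SAMPLE_HTML = """
<div class="item-container">
  <div class="item-info">
    <div class="item-branding">
      <a href="#"><img title="Orico" src="orico.png"></a>
    </div>
    <a class="item-title" href="#">ORICO 10000 mAh Power Bank</a>
    <ul><li class="price-ship"> Free Shipping </li></ul>
  </div>
</div>
<div class="item-container">
  <div class="item-info">
    <a class="item-title" href="#">Unbranded Power Bank</a>
    <ul><li class="price-ship">$7.73 Shipping</li></ul>
  </div>
</div>
"""


def get_brand(container):
    """Follow container.div.div.a.img one step at a time.

    Returns the img's title attribute, or None if any tag along the
    path (or the title attribute itself) is missing.
    """
    node = container
    for tag_name in ("div", "div", "a", "img"):
        node = node.find(tag_name)
        if node is None:
            return None
    return node.get("title")


def parse_items(html):
    """Return a list of dicts for every container that has a brand."""
    page_soup = BeautifulSoup(html, "html.parser")
    items = []
    for container in page_soup.find_all("div", {"class": "item-container"}):
        brand = get_brand(container)
        if brand is None:
            continue  # skip containers without a brand image
        title = container.find("a", {"class": "item-title"})
        shipping = container.find("li", {"class": "price-ship"})
        items.append({
            "brand": brand,
            "product_name": title.get_text(strip=True) if title else "",
            "shipping": shipping.get_text(strip=True) if shipping else "",
        })
    return items


if __name__ == "__main__":
    for item in parse_items(SAMPLE_HTML):
        print("brand:" + item["brand"])
        print("product_name:" + item["product_name"])
        print("shipping:" + item["shipping"])
```

To try this on the real page, page_html from the code above could be passed to parse_items in place of SAMPLE_HTML, assuming the page still uses the same class names.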