BeautifulSoup4 AttributeError

Date: 2017-07-04 23:36:06

Tags: python

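I've been at this for hours and I keep getting an error! All I want to do is scrape the product name, brand, price, and shipping. I've managed to scrape each of them individually; the only problem is scraping the price while looping through every item on the page. In a separate file I scraped the price successfully. Below is my code that tries to put everything together, which raises an AttributeError. Please help!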

# coding=utf-8 
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

url = 'https://www.newegg.com/Product/ProductList.aspx?Submit=Property&N=100007709%2050001419%2050001315%2050001402%2050001312%2050001669%2050012150%2050001561%2050001314%2050001471%20600566292%20600566291%20600565504%20601201888%20601204369%20601210955%20601203793%204814%20601296707&IsNodeId=1&cm_sp=Cat_video-Cards_1-_-Visnav-_-Gaming-Video-Cards_1'

# This grabs the webpage and downloads it!
uClient = uReq(url)

# This is so I can read everything out of the url!
page_html = uClient.read() 
uClient.close()

page_soup = soup(page_html, "html.parser")

# Grabs each product!
containers = page_soup.findAll("div", {"class": {"item-container", "item-action"}}) 

# set up the loop to get the brand of the item!
for container in containers:

    brand_container = container.findAll("a", {"class":"item-brand"})
    brand = container.div.div.a.img["title"]

    title_container = container.findAll("a", {"class":"item-title"})
    product_name = title_container[0].text 

    price_container = container.findAll("li", {"class":"price-current"})
    price = container.strong.text

    shipping_container = page_soup.findAll("li", {"class": "price-ship"})
    shipping = shipping_container[0].text.strip()

    print("Product_name: " + product_name)
    print("Brand: " + brand)
    print("Price: " + price)
    print("Shipping: " + shipping)

1 answer:

Answer 0 (score: 2)

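The AttributeError is raised because a tag does not have the child tag you are looking for (for example, container has no .div, so container.div is None and the next attribute lookup fails). The cause is this line: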

containers = page_soup.findAll("div", {"class": {"item-container", "item-action"}}) 

You are collecting every item-container div and every item-action div into containers. The item-action divs are not the containers you want to iterate over. If you change that line to:

containers = page_soup.findAll("div", {"class": {"item-container"}})

then it should parse correctly.
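To see why the original filter breaks the loop, here is a minimal sketch on a hand-written HTML snippet. The markup is a simplified assumption, not Newegg's real page: one item-container div with nested divs, and one item-action div with no div children. Passing a set of class values matches a div carrying any of them, and attribute-style access such as .div returns None when no such descendant exists, so chaining further raises AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified markup (not the real Newegg page).
html = """
<div class="item-container">
  <div class="item-info">
    <div class="item-branding"><a class="item-brand" href="#"><img title="EVGA"></a></div>
  </div>
</div>
<div class="item-action">
  <ul><li class="price-current"><strong>199</strong></li></ul>
</div>
"""

page_soup = BeautifulSoup(html, "html.parser")

# A set (or list) of class values matches a div carrying ANY of them,
# so both the item-container div and the item-action div are returned.
both = page_soup.find_all("div", {"class": {"item-container", "item-action"}})
print(len(both))  # 2

# The item-action div has no <div> descendant, so .div is None and the
# next attribute lookup on None raises AttributeError.
action = both[1]
print(action.div)  # None
try:
    action.div.div.a.img["title"]
except AttributeError as exc:
    print("AttributeError:", exc)

# Filtering on item-container alone returns only the product containers,
# where the .div.div.a.img chain does resolve.
only_items = page_soup.find_all("div", {"class": "item-container"})
print(len(only_items))  # 1
print(only_items[0].div.div.a.img["title"])  # EVGA
```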

Finally, you should change

brand = container.div.div.a.img["title"]

to:

brand = brand_container[0].img["title"]
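Putting the pieces together, a corrected loop might look roughly like the sketch below. It parses a small inline HTML sample instead of fetching the live URL; the markup is hypothetical and only mimics the class names used in the question, since Newegg's real page structure may differ or may have changed. Because findAll/find_all returns a ResultSet (a list of tags) rather than a single tag, the sketch uses find(), which returns the first matching tag or None. It also looks up price and shipping inside each container instead of on page_soup; searching page_soup would give every product the first item's shipping text.

```python
from bs4 import BeautifulSoup

# Hypothetical sample mimicking the class names from the question;
# in the real script page_html would come from uReq(url).read().
page_html = """
<div class="item-container">
  <div class="item-info">
    <div class="item-branding">
      <a class="item-brand" href="#"><img title="EVGA"></a>
    </div>
    <a class="item-title" href="#">EVGA GeForce GTX 1070</a>
    <div class="item-action">
      <ul class="price">
        <li class="price-current">$<strong>399</strong><sup>.99</sup></li>
        <li class="price-ship">Free Shipping</li>
      </ul>
    </div>
  </div>
</div>
<div class="item-container">
  <div class="item-info">
    <div class="item-branding">
      <a class="item-brand" href="#"><img title="MSI"></a>
    </div>
    <a class="item-title" href="#">MSI Radeon RX 580</a>
    <div class="item-action">
      <ul class="price">
        <li class="price-current">$<strong>249</strong><sup>.99</sup></li>
        <li class="price-ship">$4.99 Shipping</li>
      </ul>
    </div>
  </div>
</div>
"""

page_soup = BeautifulSoup(page_html, "html.parser")

# Only the product containers, not the nested item-action divs.
containers = page_soup.find_all("div", {"class": "item-container"})

rows = []
for container in containers:
    # find() returns a single Tag (or None); find_all() returns a list.
    brand_tag = container.find("a", {"class": "item-brand"})
    brand = brand_tag.img["title"] if brand_tag and brand_tag.img else "N/A"

    title_tag = container.find("a", {"class": "item-title"})
    product_name = title_tag.text.strip() if title_tag else "N/A"

    # Search inside this container so each product gets its own price.
    price_tag = container.find("li", {"class": "price-current"})
    price = price_tag.strong.text if price_tag and price_tag.strong else "N/A"

    # Same for shipping: search the container, not page_soup.
    ship_tag = container.find("li", {"class": "price-ship"})
    shipping = ship_tag.text.strip() if ship_tag else "N/A"

    rows.append((product_name, brand, price, shipping))
    print("Product_name: " + product_name)
    print("Brand: " + brand)
    print("Price: " + price)
    print("Shipping: " + shipping)
```

The "N/A" fallbacks are a design choice for this sketch: a product missing one field prints a placeholder instead of stopping the whole loop with another AttributeError.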