Webscrape application can't find the correct HTML container

Time: 2018-03-01 10:39:15

Tags: python beautifulsoup

This is my first web-scraping application.

Here is my code:

import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url= 'https://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38?Tpk=graphics%20card'

#opening up connection, grabbing page
uClient = uReq(my_url)
#makes it a variable

page_html = uClient.read()
#will close it
uClient.close()

#html parsing
page_soup = soup(page_html, "html.parser")

#grabs each container in HTML
containers = page_soup.find("div",{"class":"item-container"})

filename = "Products.csv"
f = open(filename, "w")

headers = "brand, product_name, shipping\n"

f.write(headers)

for container in containers:
    brand = containers.div.div.a["title"]

    title_container = containers.find("a", {"class": "item-title"})
    product_name = title_container[0].txt

    shipping_container = container.find("li", {"class": "price-ship"})
    shipping = shipping_container[0].txt.strip()

    print("brand: " + brand)
    print("product_name: " + product_name)
    print("shipping: " + shipping)

    f.write(brand + "," + product_name.replace(",", "|") + "," + shipping + "\n")
f.close()

Here is the error:

Traceback (most recent call last):

  File "<ipython-input-23-b9aa37e3923c>", line 1, in <module>
    runfile('/Users/Mohit/Documents/Python/webscrape.py', wdir='/Users/Mohit/Documents/Python')

  File "/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "/Users/Mohit/Documents/Python/webscrape.py", line 38, in <module>
    brand = containers.div.div.a["title"]

TypeError: 'NoneType' object is not subscriptable

Basically, I want it to get the brand, product name and shipping cost of every graphics card on the page and write them out in CSV format.

I think the program can't find where the image or data should be pulled from. This is my first web-scraping project, and I used https://www.youtube.com/watch?v=XQgXKtPSzUI&t=800s as a tutorial.

1 answer:

Answer 0 (score: 0)

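It looks like you are accessing attributes of some variables without checking whether they exist. For example, this line (it is the one raising the exception you are seeing, but other lines in your code have the same problem):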

brand = containers.div.div.a["title"]
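To see why this line fails with this particular error, here is a minimal reproduction. In BeautifulSoup, `tag.a` is shorthand for `tag.find("a")`, which returns `None` when nothing matches. Subscripting that `None` with `["title"]` is what raises the `TypeError`. The HTML string is made up for illustration and only mimics the `item-container` structure the question assumes.

```python
from bs4 import BeautifulSoup

# Made-up markup: the nested <div>s exist, but there is no <a> inside them.
html = (
    '<div class="item-container">'
    "<div><div><span>no link here</span></div></div>"
    "</div>"
)
page_soup = BeautifulSoup(html, "html.parser")

containers = page_soup.find("div", {"class": "item-container"})

# .div and .a are shorthand for .find("div") and .find("a"); a miss gives None.
link = containers.div.div.a
print(link)  # None

try:
    brand = containers.div.div.a["title"]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # 'NoneType' object is not subscriptable
```

The chain itself does not fail. `containers.div.div` still finds tags, and only the final `["title"]` lookup on the `None` returned by `.a` fails.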

I suggest a more cautious approach. For example, this naive code:

if (containers is not None) and (containers.div is not None) and (containers.div.div is not None) and (containers.div.div.a is not None):
  brand = containers.div.div.a["title"]
else:
  brand = ""
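Instead of a long `and` chain, you can write the same guard once as a small helper. `safe_chain` is a hypothetical name and is not part of BeautifulSoup. It relies only on `getattr`, which works here because attribute access such as `tag.div` on a BeautifulSoup `Tag` returns `None` rather than raising when nothing matches. The self-check uses `SimpleNamespace` objects as stand-ins for tags, so it runs without fetching a page.

```python
from types import SimpleNamespace


def safe_chain(obj, *names, default=None):
    """Follow obj.name1.name2... and return `default` as soon as a link is None.

    Hypothetical helper for illustration; it is not part of BeautifulSoup.
    """
    for name in names:
        if obj is None:
            return default
        obj = getattr(obj, name, None)
    return default if obj is None else obj


# Stand-ins for tags: one complete div > div > a chain, one broken at .div
link = {"title": "ExampleBrand"}  # made-up value
full = SimpleNamespace(div=SimpleNamespace(div=SimpleNamespace(a=link)))
broken = SimpleNamespace(div=None)

a_tag = safe_chain(full, "div", "div", "a")
brand_full = a_tag["title"] if a_tag is not None else ""

a_tag = safe_chain(broken, "div", "div", "a")
brand_broken = a_tag["title"] if a_tag is not None else ""

print(brand_full)          # ExampleBrand
print(repr(brand_broken))  # ''
```

With a real `Tag`, you would call `safe_chain(container, "div", "div", "a")` on each container. Using `a_tag.get("title", "")` instead of `a_tag["title"]` also covers an `<a>` that exists but has no `title` attribute.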

If you want to find out exactly where the chain breaks for your particular HTML, try nested conditions:

if containers is not None:
  if containers.div is not None:
    # ... more conditions here ...
    pass
  else:
    print("ERROR 2: containers.div was None! :(")
else:
  print("ERROR 1: containers was None! :(")
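The loop in the question has some further issues beyond the missing `None` checks:

- `find` returns only the first matching tag (or `None`), so iterating over its result walks that one tag's children rather than all products. `find_all` returns a list of every match.
- Inside the loop, lookups should use the loop variable `container`, not `containers`.
- Because `find` already returns a single tag, `title_container[0]` looks up an HTML attribute named `0` rather than taking a first result.
- A tag's text is `.text` or `.get_text()`, not `.txt`.

The sketch below applies those changes and writes rows with the standard `csv` module, which quotes fields containing commas instead of requiring `replace(",", "|")`. So that it runs offline, it parses a made-up HTML snippet. The class names `item-container`, `item-title` and `price-ship` come from the question, and Newegg's live markup may differ.

```python
import csv
import io

from bs4 import BeautifulSoup

# Made-up sample markup that follows the class names used in the question.
page_html = """
<div class="item-container">
  <div><div><a title="BrandA"></a></div></div>
  <a class="item-title">BrandA Card 8GB, OC Edition</a>
  <ul><li class="price-ship">Free Shipping</li></ul>
</div>
<div class="item-container">
  <a class="item-title">BrandB Card 4GB</a>
</div>
"""

page_soup = BeautifulSoup(page_html, "html.parser")

# find_all returns every matching tag; find would return only the first one.
containers = page_soup.find_all("div", {"class": "item-container"})


def text_or_empty(tag):
    """Return the stripped text of a tag, or "" if the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else ""


out = io.StringIO()  # swap for open("Products.csv", "w", newline="") to write a file
writer = csv.writer(out)
writer.writerow(["brand", "product_name", "shipping"])

for container in containers:
    # Use the loop variable `container`, checking each link of div > div > a.
    first_div = container.div
    inner_div = first_div.div if first_div is not None else None
    brand_link = inner_div.a if inner_div is not None else None
    brand = brand_link.get("title", "") if brand_link is not None else ""

    product_name = text_or_empty(container.find("a", {"class": "item-title"}))
    shipping = text_or_empty(container.find("li", {"class": "price-ship"}))

    writer.writerow([brand, product_name, shipping])

print(out.getvalue())
```

The second sample container has no brand link and no shipping line, so its row gets empty strings for those fields instead of crashing the whole run.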