What is causing this web scraping error?

Time: 2021-02-05 15:23:02

Tags: python web-scraping beautifulsoup

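I get a "list index out of range" error on the last line. The containers variable is also empty: printing its length outputs 0 instead of the expected 12. It should hold the details of every product, but nothing is being captured.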

  from urllib.request import urlopen as uReq
  from bs4 import BeautifulSoup as soup

  my_url='https://www.newegg.com/global/in-en/p/pl?d=graphics+card'
  uClient=uReq(my_url)  # open the connection and download the web page

  page_html = uClient.read()   # store the raw page contents in page_html
  uClient.close()         # close the connection
  page_soup = soup(page_html,"html.parser")           # parse the HTML
  #print(page_soup.h1)                       # would print the first <h1> element

  #print(page_soup.p)

  containers = page_soup.findAll("div", {"class": {"item-container"}})   # grab each product container
  len(containers)
  containers[0] 

1 answer:

Answer 0: (score: 0)

Try using requests like this:

import requests
from bs4 import BeautifulSoup as soup
url = 'https://www.newegg.com/p/pl?d=graphics+card'
r = requests.get(url)
soup_page = soup(r.content,'html.parser') 
containers = soup_page.find_all('div',{'class':'item-container'})
print(len(containers))
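
Whichever client is used, the "list index out of range" error comes from indexing an empty list, so it may help to check the response and the result before touching containers[0]. Below is a minimal sketch along those lines, not a confirmed fix for this site. The helper names (fetch_html, find_containers, first_container), the SAMPLE_HTML markup, the timeout value and the User-Agent string are all assumptions made for illustration. The thread does not establish that a missing User-Agent is the cause.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded results page.
SAMPLE_HTML = """
<div class="item-container"><a class="item-title">Card A</a></div>
<div class="item-container featured"><a class="item-title">Card B</a></div>
<div class="other">Not a product</div>
"""


def fetch_html(url):
    """Download a page and raise if the server reports an HTTP error."""
    # Assumption: some sites respond differently to non-browser clients,
    # so a browser-like User-Agent is sent here. This is illustrative only.
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.text


def find_containers(html):
    """Return every <div> whose class list includes 'item-container'."""
    page_soup = BeautifulSoup(html, "html.parser")
    return page_soup.find_all("div", class_="item-container")


def first_container(html):
    """Return the first product container, or None if there are none."""
    containers = find_containers(html)
    return containers[0] if containers else None


if __name__ == "__main__":
    html = fetch_html("https://www.newegg.com/p/pl?d=graphics+card")
    containers = find_containers(html)
    print(len(containers))
    if not containers:
        # Nothing matched: inspect the start of the page to see what came back.
        print(html[:500])
```

If the length is still 0, printing the start of the returned HTML usually shows whether the server sent the product listing or some other page, which narrows down where the problem lies.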