AttributeError: 'NoneType' object has no attribute 'get_text' in BeautifulSoup web scraping

Time: 2020-11-04 14:32:50

Tags: python web-scraping beautifulsoup python-requests

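I am using BeautifulSoup in Python to scrape a website for a project. The program used to work perfectly, but now it fails with the error shown below. The site's HTML structure may have changed, but I still cannot work out what is wrong or how to fix it.

The website is https://covidindia.org/

Please help me resolve the error.

Error: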

Traceback (most recent call last):
  File "t1.py", line 112, in <module>
    mainLabel = tk.Label(root, text=get_corona_detail_of_india(), font=f, bg='light blue',fg='red')
  File "t1.py", line 23, in get_corona_detail_of_india
    total_cases = soup.find("div",class_="elementor-element elementor-element-aceece0 elementor-widget elementor-widget-heading",).get_text()
AttributeError: 'NoneType' object has no attribute 'get_text'

My code:

URL = 'https://covidindia.org/'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
#print(soup)
total_cases = soup.find("div",class_="elementor-element elementor-element-aceece0 elementor-widget elementor-widget-heading",).get_text()
tc = total_cases.strip()

Also, when I print soup, the output is:

<html><head><title>403 Forbidden</title></head>
<body>
<center><h1>403 Forbidden</h1></center>
<hr/><center>nginx</center>

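Has my access been permanently blocked?

2 answers:

Answer 0: (score: 0)

Add a browser-like user-agent header to your request. By default, requests identifies itself with a python-requests user agent, which the site apparently treats as a bot, so it returns the 403 Forbidden page instead of the real content. That is why soup.find() returns None. Here is the complete code: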

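from bs4 import BeautifulSoup
import requests

# Send a browser-like User-Agent; the request in the question, sent without it, got a 403 page.
headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/32.0'}

URL = 'https://covidindia.org/'

page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')

#print(soup)

# find() returns None when nothing matches, so .get_text() still raises
# AttributeError if the request is blocked or the page layout changes.
total_cases = soup.find("div", class_="elementor-element elementor-element-aceece0 elementor-widget elementor-widget-heading").get_text()

tc = total_cases.strip()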

Output:

>>> tc
'Total Cases - 83,14,673 (+46,171)'
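
Because find() returns None whenever no element matches, the lookup can still fail if the site changes its markup or blocks the request again. A minimal sketch of a more defensive lookup follows. It assumes the heading still carries the elementor-element-aceece0 class; the sample HTML strings and the extract_total_cases helper are made up for illustration and are not part of the site or of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modelled on the class names in the question.
SAMPLE_OK = """
<div class="elementor-element elementor-element-aceece0 elementor-widget elementor-widget-heading">
  <h2>Total Cases - 83,14,673 (+46,171)</h2>
</div>
"""

# The kind of error page the question reports getting back.
SAMPLE_BLOCKED = """
<html><head><title>403 Forbidden</title></head>
<body><center><h1>403 Forbidden</h1></center></body></html>
"""


def extract_total_cases(html):
    """Return the stripped heading text, or None if the element is missing."""
    soup = BeautifulSoup(html, "html.parser")
    # select_one() matches on a single CSS class, so extra or reordered
    # classes on the element do not break the lookup.
    element = soup.select_one("div.elementor-element-aceece0")
    if element is None:
        return None
    return element.get_text(strip=True)


print(extract_total_cases(SAMPLE_OK))
print(extract_total_cases(SAMPLE_BLOCKED))
```

With this shape the caller can test for None and show a fallback message in the label instead of crashing with AttributeError.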

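Answer 1: (score: 0)

This problem occurs when the website requires something that you did not include in your request. Check what the site expects: it may be the user agent, as the other answer says, or something else.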
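
One way to notice a rejected request before parsing is to check the HTTP status first. The sketch below is only an illustration: ensure_ok and fetch_html are hypothetical helper names, the header is the same string used in the other answer, and the 10-second timeout is an arbitrary choice.

```python
import requests

# Assumed browser-like header, the same string used in the other answer.
HEADERS = {
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) "
                  "Gecko/20100101 Firefox/32.0"
}


def ensure_ok(response):
    """Raise a descriptive error if the response has a 4xx or 5xx status."""
    try:
        response.raise_for_status()
    except requests.HTTPError as exc:
        raise RuntimeError(
            f"Request was rejected with HTTP {response.status_code}; "
            "check which headers or cookies the site expects"
        ) from exc
    return response


def fetch_html(url, headers=HEADERS, timeout=10):
    """Hypothetical helper: download a page and fail early on an error status."""
    response = requests.get(url, headers=headers, timeout=timeout)
    return ensure_ok(response).text
```

Calling fetch_html('https://covidindia.org/') would then either return the page source or stop with a message that names the status code, rather than failing later inside the parsing code.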