“ BeautifulSoup” AttributeError:'NoneType'对象没有属性'text'”

时间:2019-03-26 07:33:13

标签: python html web-scraping beautifulsoup

我当时正在用bs4抓取weather-searched Google,而Python找不到<span>标签。我该如何解决这个问题?

我试图用<span> class找到这个id,但是都失败了。

<div id="wob_dcp">
    <span class="vk_gy vk_sh" id="wob_dc">Clear with periodic clouds</span>    
</div>

上面是我试图在page中抓取的HTML代码:

对不起,由于声誉我无法发布图片^^;

response = requests.get('https://www.google.com/search?hl=ja&ei=coGHXPWEIouUr7wPo9ixoAg&q=%EC%9D%BC%EB%B3%B8+%E6%A1%9C%E5%B7%9D%E5%B8%82%E7%9C%9F%E5%A3%81%E7%94%BA%E5%8F%A4%E5%9F%8E+%EB%82%B4%EC%9D%BC+%EB%82%A0%EC%94%A8&oq=%EC%9D%BC%EB%B3%B8+%E6%A1%9C%E5%B7%9D%E5%B8%82%E7%9C%9F%E5%A3%81%E7%94%BA%E5%8F%A4%E5%9F%8E+%EB%82%B4%EC%9D%BC+%EB%82%A0%EC%94%A8&gs_l=psy-ab.3...232674.234409..234575...0.0..0.251.929.0j6j1......0....1..gws-wiz.......35i39.yu0YE6lnCms')
soup = BeautifulSoup(response.content, 'html.parser')

tomorrow_weather = soup.find('span', {'id': 'wob_dc'}).text

但是此代码失败,错误是:

Traceback (most recent call last):
  File "C:\Users\sungn_000\Desktop\weather.py", line 23, in <module>
    tomorrow_weather = soup.find('span', {'id': 'wob_dc'}).text
AttributeError: 'NoneType' object has no attribute 'text'

请解决此错误。

3 个答案:

答案 0 :(得分:4)

这是因为天气部分是由浏览器通过JavaScript呈现的。因此,当您使用requests时,只会得到页面上没有所需内容的HTML内容。 如果要使用Web浏览器呈现的元素来解析页面,则应使用selenium(或requests-html)。

from bs4 import BeautifulSoup
from requests_html import HTMLSession
session = HTMLSession()
response = session.get('https://www.google.com/search?hl=en&ei=coGHXPWEIouUr7wPo9ixoAg&q=%EC%9D%BC%EB%B3%B8%20%E6%A1%9C%E5%B7%9D%E5%B8%82%E7%9C%9F%E5%A3%81%E7%94%BA%E5%8F%A4%E5%9F%8E%20%EB%82%B4%EC%9D%BC%20%EB%82%A0%EC%94%A8&oq=%EC%9D%BC%EB%B3%B8%20%E6%A1%9C%E5%B7%9D%E5%B8%82%E7%9C%9F%E5%A3%81%E7%94%BA%E5%8F%A4%E5%9F%8E%20%EB%82%B4%EC%9D%BC%20%EB%82%A0%EC%94%A8&gs_l=psy-ab.3...232674.234409..234575...0.0..0.251.929.0j6j1......0....1..gws-wiz.......35i39.yu0YE6lnCms')
soup = BeautifulSoup(response.content, 'html.parser')

tomorrow_weather = soup.find('span', {'id': 'wob_dc'}).text
print(tomorrow_weather)

输出:

pawel@pawel-XPS-15-9570:~$ python test.py
Clear with periodic clouds

答案 1 :(得分:0)

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(a)
>>> a
'<div id="wob_dcp">\n    <span class="vk_gy vk_sh" id="wob_dc">Clear with periodic clouds</span>    \n</div>'
>>> soup.find("span", id="wob_dc").text
'Clear with periodic clouds'

尝试一下。

答案 2 :(得分:-1)

我也遇到了这个问题。 你不应该这样导入

from bs4 import BeautifulSoup

你应该像这样导入

from bs4 import * 

这应该有效。