I'm trying to get the number of search results for a Google search. If I just save the page from my browser, the count appears in the HTML:
<div id="resultStats">About 8,660,000,000 results<nobr> (0.49 seconds) </nobr></div>
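However, when I open the HTML retrieved through Python in my browser, it looks like a mobile site and does not contain "resultStats".
I have already tried (1) adding parameters to the URL, as in https://www.google.com/search?client=firefox-b-d&q=test, and (2) copying the full URL from the browser, but neither helped.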
import requests
from bs4 import BeautifulSoup
import re

def google_results(query):
    url = 'https://www.google.com/search?q=' + query
    html = requests.get(url).text
    soup = BeautifulSoup(html, 'html.parser')
    div = soup.find('div', id='resultStats')
    return int(''.join(re.findall(r'\d+', div.text.split()[1])))

print(google_results('test'))
Error:
Traceback: line 11, in google_results
return int(''.join(re.findall(r'\d+', div.text.split()[1])))
AttributeError: 'NoneType' object has no attribute 'text'
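Answer 0 (score: 0)
The solution is to add request headers (thanks, John). With a browser-like User-Agent, the response contains the "resultStats" div: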
import requests
from bs4 import BeautifulSoup
import re

def google_results(query):
    url = 'https://www.google.com/search?q=' + query
    # Identify as a desktop browser so the response includes resultStats.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0'
    }
    html = requests.get(url, headers=headers).text
    soup = BeautifulSoup(html, 'html.parser')
    div = soup.find('div', id='resultStats')
    # The text reads "About 8,660,000,000 results ..."; take the second word
    # and join its digit groups into a single integer.
    return int(''.join(re.findall(r'\d+', div.text.split()[1])))

print(google_results('test'))
Output:
9280000000
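The code above still raises the same AttributeError whenever the div is missing, for example if the markup differs from what the question shows. As a rough sketch of how the parsing step could fail more gracefully, the helper below returns None instead of raising. The name parse_result_count is hypothetical, it works on an HTML string you have already downloaded, and it assumes the div looks like the one quoted in the question. It uses only the standard library, so it is a stand-in for the BeautifulSoup lookup rather than a replacement for it.

```python
import re
from typing import Optional

# Captures the inner HTML of <div id="resultStats">...</div>.
# Assumption: the div has the shape quoted in the question.
_RESULT_STATS_RE = re.compile(
    r'<div[^>]*\bid="resultStats"[^>]*>(.*?)</div>', re.DOTALL
)


def parse_result_count(html: str) -> Optional[int]:
    """Return the result count found in html, or None if it is absent."""
    match = _RESULT_STATS_RE.search(html)
    if match is None:
        # No resultStats div, e.g. the page variant served without a
        # browser-like User-Agent.
        return None
    # The first run of digits and separators is the count ("8,660,000,000").
    number = re.search(r'\d[\d,.]*', match.group(1))
    if number is None:
        return None
    # Drop thousands separators before converting.
    return int(re.sub(r'\D', '', number.group(0)))


sample = ('<div id="resultStats">About 8,660,000,000 results'
          '<nobr> (0.49 seconds) </nobr></div>')
print(parse_result_count(sample))
print(parse_result_count('<html><body>no stats here</body></html>'))
```

With this in place, the caller can check for None and report that the expected div was not in the response, rather than crashing inside the parsing expression.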