Google search HTML does not contain div id='resultStats'

Time: 2019-07-13 13:21:20

Tags: python python-3.x python-requests

I am trying to get the number of results for a Google search. If I simply save the page from my browser, the count appears in the HTML:

<div id="resultStats">About 8,660,000,000 results<nobr> (0.49 seconds)&nbsp;</nobr></div>

However, when I open the HTML retrieved through Python in a browser, it looks like a mobile site, and it does not contain "resultStats".

I have tried (1) adding parameters to the URL, as in https://www.google.com/search?client=firefox-b-d&q=test, and (2) copying the full URL from the browser, but neither helped.

import requests
from bs4 import BeautifulSoup
import re

def google_results(query):
    url = 'https://www.google.com/search?q=' + query
    html = requests.get(url).text
    soup = BeautifulSoup(html, 'html.parser')
    div = soup.find('div', id='resultStats')
    return int(''.join(re.findall(r'\d+', div.text.split()[1])))

print(google_results('test'))

Error:

Traceback: line 11, in google_results
    return int(''.join(re.findall(r'\d+', div.text.split()[1])))
AttributeError: 'NoneType' object has no attribute 'text'
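
The AttributeError above means that `soup.find('div', id='resultStats')` returned `None`, i.e. the fetched HTML has no such div. A minimal sketch of checking for the element before using it is shown here. It deliberately swaps BeautifulSoup for the standard-library `html.parser` so it runs without extra packages; `DivTextExtractor` and `find_div_text` are hypothetical names introduced only for this illustration.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collects the text inside the first <div> with a given id (hypothetical helper)."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # div nesting depth inside the target div; 0 means outside it
        self.found = False  # becomes True once the target div has been seen
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != 'div':
            return
        if self.depth > 0:
            # A nested div inside the target: track it so the closing tags balance.
            self.depth += 1
        elif not self.found and dict(attrs).get('id') == self.target_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


def find_div_text(html, div_id):
    """Return the text of the div with the given id, or None if it is absent."""
    parser = DivTextExtractor(div_id)
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts) if parser.found else None


saved = ('<div id="resultStats">About 8,660,000,000 results'
         '<nobr> (0.49 seconds)&nbsp;</nobr></div>')
print(find_div_text(saved, 'resultStats'))
print(find_div_text('<p>page without the div</p>', 'resultStats'))
```

Returning `None` explicitly lets the caller decide what to do when the page layout differs, instead of failing later on `.text`.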

1 answer:

Answer 0 (score: 0)

The solution is to send request headers, specifically a desktop-browser User-Agent (thanks, John):

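import re

import requests
from bs4 import BeautifulSoup


def google_results(query):
    url = 'https://www.google.com/search'
    # Identify as a desktop browser; without this header the response
    # did not contain the resultStats div.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0'
    }
    # params= lets requests URL-encode the query (spaces, '&', and so on).
    html = requests.get(url, params={'q': query}, headers=headers).text
    soup = BeautifulSoup(html, 'html.parser')
    div = soup.find('div', id='resultStats')
    if div is None:
        raise ValueError('no div with id="resultStats" in the response')
    # The text looks like "About 8,660,000,000 results (0.49 seconds)";
    # take the second token and keep only its digits.
    return int(''.join(re.findall(r'\d+', div.text.split()[1])))


print(google_results('test'))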

Output:

9280000000
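
One caveat about the parsing step: `div.text.split()[1]` assumes the count is always the second word, as in "About 8,660,000,000 results". A hedged sketch of a more tolerant parser follows. It looks for the first digit group in the text instead of relying on word position. The function name `parse_result_count` is hypothetical, and the assumption that the count is the first number in the text is mine, not something the page format guarantees.

```python
import re


def parse_result_count(stats_text):
    """Return the first number in the stats text as an int, or None if there is none.

    Assumes the result count is the first number in the text and that its
    digit groups are separated by ',', '.' or whitespace.
    """
    match = re.search(r'\d+(?:[,.\s]\d{3})*', stats_text)
    if match is None:
        return None
    # Drop the thousands separators and keep only the digits.
    return int(re.sub(r'\D', '', match.group()))


print(parse_result_count('About 8,660,000,000 results (0.49 seconds)'))
print(parse_result_count('1 result (0.30 seconds)'))
print(parse_result_count('No results'))
```

It could be called as `parse_result_count(div.text)` in place of the `split()[1]` expression.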