Question

I am trying to get the number of results for a Google search. If I simply save the page from my browser, the count appears in the HTML like this:

<div id="resultStats">About 8,660,000,000 results<nobr> (0.49 seconds)&nbsp;</nobr></div>

However, when I open the HTML retrieved with Python in a browser, it looks like a mobile site, and it does not contain "resultStats".

I have tried (1) adding parameters to the URL, as in https://www.google.com/search?client=firefox-b-d&q=test, and (2) copying the full URL from the browser, but neither helped.

import requests
from bs4 import BeautifulSoup
import re

def google_results(query):
    url = 'https://www.google.com/search?q=' + query
    html = requests.get(url).text
    soup = BeautifulSoup(html, 'html.parser')
    div = soup.find('div', id='resultStats')
    return int(''.join(re.findall(r'\d+', div.text.split()[1])))

print(google_results('test'))

Error:

Traceback: line 11, in google_results
    return int(''.join(re.findall(r'\d+', div.text.split()[1])))
AttributeError: 'NoneType' object has no attribute 'text'

Answer 1

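The solution is to add request headers (thanks, John). With a desktop-browser User-Agent, the response contains the resultStats div: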

import requests
from bs4 import BeautifulSoup
import re

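def google_results(query):
    url = 'https://www.google.com/search?q=' + query
    # Identify as a desktop browser; without this header the response
    # did not contain the resultStats div
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0'
    }
    html = requests.get(url, headers=headers).text
    soup = BeautifulSoup(html, 'html.parser')
    div = soup.find('div', id='resultStats')
    # div.text looks like "About 8,660,000,000 results (0.49 seconds)";
    # take the second word and keep only its digits
    return int(''.join(re.findall(r'\d+', div.text.split()[1])))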

print(google_results('test'))

Output:

9280000000
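
The function above still raises AttributeError whenever the div is missing, and it puts the query into the URL unescaped. A minimal sketch of two helpers that might make it a bit more robust follows. The names build_search_url and parse_result_count are hypothetical, not part of requests or BeautifulSoup, and the pattern assumes the English wording "N results" shown in the question:

```python
import re
from urllib.parse import urlencode


def build_search_url(query):
    # urlencode escapes spaces and special characters in the query
    return 'https://www.google.com/search?' + urlencode({'q': query})


def parse_result_count(stats_text):
    # Return the count as an int, or None if the text is missing or
    # does not match the assumed "<number> results" wording
    if not stats_text:
        return None
    match = re.search(r'(\d[\d,]*)\s+results?', stats_text)
    if match is None:
        return None
    # Drop the thousands separators before converting
    return int(match.group(1).replace(',', ''))


print(build_search_url('hello world'))
print(parse_result_count('About 8,660,000,000 results (0.49 seconds)'))
print(parse_result_count(None))
```

Inside google_results, the last line could then become parse_result_count(div.text if div else None), so a missing div gives None instead of an exception.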
