I'm trying to scrape some data from a Google search using requests, but it doesn't return everything that is on the page.
Code:
import requests
from bs4 import BeautifulSoup
headers = {'user-agent':'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Mobile Safari/537.36'}
url = 'https://www.google.com/search?num=50&q="potato+is+good"'
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')
for idx, val in enumerate(soup.find_all('em'), 1):
    print('{} = {}'.format(idx, val))
Output:
1 = <em>potato is good</em>
2 = <em>potato is good</em>
3 = <em>potato is good</em>
4 = <em>Potato is good</em>
5 = <em>potato is good</em>
6 = <em>potato is good</em>
7 = <em>Potato is good</em>
8 = <em>potato is good</em>
9 = <em>potato is good</em>
10 = <em>potato is good</em>
11 = <em>potato.is.good</em>
It only shows 11 results, but when I run the same search manually on Google there are more than 35 results.
What could be wrong with my code?
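The URL above hand-encodes the quoted phrase. A minimal sketch of building the same URL from parameters with the standard library is shown below, so the quoting and the `num` value are easier to change. `build_search_url` is a hypothetical helper, not part of requests or BeautifulSoup, and this sketch makes no claim about how Google interprets `num`.

```python
from urllib.parse import urlencode


def build_search_url(query: str, num: int = 50) -> str:
    """Hypothetical helper: build a Google search URL like the one in the question.

    urlencode percent-encodes the double quotes (%22) and turns spaces
    into '+', so the phrase stays quoted in the query string.
    """
    params = {'num': num, 'q': query}
    return 'https://www.google.com/search?' + urlencode(params)


url = build_search_url('"potato is good"')
print(url)  # https://www.google.com/search?num=50&q=%22potato+is+good%22
```

requests can also do this encoding itself if the dictionary is passed as `params=` to `requests.get` instead of being concatenated into the URL.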
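Answer (score: 3):
Could it be returning the results you would get when searching from a mobile device? The user agent in your headers identifies as mobile Chrome on a Nexus 5. I just tried it, and on my iPhone I only get 11 results on the first page. Maybe a different (desktop) user agent, like the one below, would do the trick: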
Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36
import requests
from bs4 import BeautifulSoup
headers = {'user-agent':'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}
url = 'https://www.google.com/search?num=50&q="potato+is+good"'
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')
for idx, val in enumerate(soup.find_all('em'), 1):
    print('{} = {}'.format(idx, val))
1 = <em>potato is good</em>
2 = <em>potato is good</em>
3 = <em>potato is good</em>
4 = <em>potato is good</em>
5 = <em>potato” is good</em>
6 = <em>potato is good</em>
7 = <em>potato-is good</em>
8 = <em>potato is good</em>
9 = <em>Potato is good</em>
10 = <em>potato is good</em>
11 = <em>potato is good</em>
12 = <em>potato is good</em>
13 = <em>potato is good</em>
14 = <em>potato is good</em>
15 = <em>potato is good</em>
16 = <em>potato is good</em>
17 = <em>Potato is good</em>
18 = <em>potato is good</em>
19 = <em>potato is good</em>
20 = <em>potato is good</em>
21 = <em>potato is good</em>
22 = <em>potato is good</em>
23 = <em>potato is good</em>
24 = <em>potato is good</em>
25 = <em>potato is good</em>
26 = <em>potato is good</em>
27 = <em>potato is good</em>
28 = <em>potato is good</em>
29 = <em>potato is good</em>
30 = <em>potato is good</em>
31 = <em>potato is good</em>
32 = <em>potato is good</em>
33 = <em>potato is good</em>
34 = <em>potato is good</em>
35 = <em>potato is good</em>
36 = <em>potato is good</em>
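To compare the two user agents side by side, here is a sketch that fetches the same URL once per user agent and counts the `<em>` tags in each response. It is an assumption-laden illustration, not the answerer's code. It swaps BeautifulSoup for the standard library's `html.parser.HTMLParser` so it has no third-party dependencies, and the fetching function is passed in, so the comparison logic can be exercised without network access. `count_em`, `compare_user_agents` and `Fetch` are hypothetical names. Keep in mind that an `<em>` count is only a proxy: it counts highlighted matches, and a single search result can contain zero or several of them.

```python
from html.parser import HTMLParser
from typing import Callable, Dict


class EmCounter(HTMLParser):
    """Counts <em> start tags, roughly what soup.find_all('em') finds."""

    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so <EM> is counted too.
        if tag == 'em':
            self.count += 1


def count_em(html: str) -> int:
    parser = EmCounter()
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical fetcher signature: fetch(url, headers) -> HTML text.
# It is injected so the comparison can run without network access.
Fetch = Callable[[str, Dict[str, str]], str]


def compare_user_agents(url: str, user_agents: Dict[str, str],
                        fetch: Fetch) -> Dict[str, int]:
    """Return the number of <em> tags seen for each named user agent."""
    return {
        name: count_em(fetch(url, {'User-Agent': ua}))
        for name, ua in user_agents.items()
    }
```

With network access, `fetch` could be `lambda u, h: requests.get(u, headers=h).text`, and `user_agents` could map names such as `'mobile'` and `'desktop'` to the two user agent strings used above.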