Web scraping Google and printing search results

Time: 2018-10-04 21:39:19

Tags: python beautifulsoup

from urllib2 import urlopen as open
from urllib2 import Request as request
from bs4 import BeautifulSoup as soup
import lxml

agent = 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0' 
headers = {
    'User-Agent': agent 
}
url = 'https://www.google.com/search?q=gemini+horoscope'
r = request(url, headers=headers)
p = open(r) 
sauce = soup(p,'lxml')
res = sauce.find('div',{'id':'resultStats'})
print res.read()

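I feel like I'm doing more than I need to here; any help would be greatly appreciated! The program should be simple, it only prints the number of search results, but for some reason it doesn't work >_<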

2 answers:

Answer 0 (score: 0)

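You're very close. Instead of

print res.read()

try

print res.text

find() returns a bs4 Tag rather than a file-like object, so it has no read() method; the element's text is available as .text. It should give you:

In [2]: print res.text
About 36,100,000 results (0.30 seconds)

If you only need the number, you can take it from that string, for example as its second space-separated field.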
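
A slightly more defensive way to get just the number is a regular expression. The following is a minimal sketch in Python 3 syntax and is not part of the original answer: the helper name parse_result_count is made up for illustration, and it assumes the stats text has the English form shown above ("About N results ...").

```python
import re


def parse_result_count(stats_text):
    """Return the result count from a string such as
    'About 36,100,000 results (0.30 seconds)' as an int.

    Returns None when no '<number> result(s)' pattern is found.
    """
    # Digits with optional thousands separators, followed by "result" or "results".
    match = re.search(r'([\d,]+)\s+results?', stats_text)
    if match is None:
        return None
    # Drop the commas before converting to an integer.
    return int(match.group(1).replace(',', ''))


count = parse_result_count('About 36,100,000 results (0.30 seconds)')
print(count)
```

In the script from the question, the argument would be res.text. Returning None instead of raising keeps the caller in control when the text does not match the assumed format.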

Answer 1 (score: 0)

from urllib2 import urlopen as open
from urllib2 import Request as request
from bs4 import BeautifulSoup as soup
import lxml

agent = 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0'
headers = {
    'User-Agent': agent
}
url = 'https://www.google.com/search?q=gemini+horoscope'
r = request(url, headers=headers)
p = open(r)
sauce = soup(p, 'lxml')
res = sauce.find('div', {'id': 'resultStats'})
# Take the second space-separated field, e.g. '36,100,000'
print res.text.split(' ')[1]
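
urllib2 exists only in Python 2. On Python 3, the request part of this script could look roughly like the sketch below. This is an assumed port, not the answerer's code: build_request and fetch_html are hypothetical helper names, the timeout value is arbitrary, and Google may serve different markup or block automated requests, so the resultStats div is not guaranteed to be present in the response.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Same User-Agent string as in the question.
AGENT = ('Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:59.0) '
         'Gecko/20100101 Firefox/59.0')


def build_request(query):
    """Build a search Request carrying a browser-like User-Agent header."""
    # urlencode turns spaces into '+', e.g. 'gemini horoscope' -> 'q=gemini+horoscope'.
    url = 'https://www.google.com/search?' + urlencode({'q': query})
    return Request(url, headers={'User-Agent': AGENT})


def fetch_html(query, timeout=10):
    """Download the results page and return it as text (performs a network call)."""
    with urlopen(build_request(query), timeout=timeout) as response:
        return response.read().decode('utf-8', errors='replace')


# Example usage (not executed here because it needs network access):
#   html = fetch_html('gemini horoscope')
```

The string returned by fetch_html can be handed to BeautifulSoup in place of p above. Checking that find() did not return None before reading .text avoids an AttributeError when the div is missing.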