尝试抓取谷歌搜索结果。这个代码适用于所有其他网站,我试过,但不与谷歌合作。它返回一个空列表。
from BeautifulSoup import BeautifulSoup
import requests
def googlecrawler(search_term):
url="https://www.google.co.in/?gfe_rd=cr&ei=UVSeVZazLozC8gfU3oD4DQ&gws_rd=ssl#q="+search_term
junk_code=requests.get(url)
ok_code=junk_code.text
good_code=BeautifulSoup(ok_code)
best_code=good_code.findAll('h3',{'class':'r'})
print best_code
googlecrawler("healthkart")
它应该返回这样的东西。
<h3 class="r"><a href="/url? sa=t&rct=j&q=&esrc=s&source=web&cd=6&cad=rja&uact=8&ved=0CEIQFjAF&url=http%3A%2F%2Fwww.coupondunia.in%2Fhealthkart&ei=qFmfVc2fFNO0uASti4PwDQ&usg=AFQjCNFHMzqn-rH4Hp-fZK0E4wwxJmevEg&sig2=QgwxMBdbPndyQTSH10dV2Q" onmousedown="return rwt(this,'','','','6','AFQjCNFHMzqn-rH4Hp-fZK0E4wwxJmevEg','QgwxMBdbPndyQTSH10dV2Q','0CEIQFjAF','','',event)" data-href="http://www.coupondunia.in/healthkart">HealthKart Coupons: July 2015 Coupon Codes</a></h3>
答案 0 :(得分:0)
查看good_code
我根本看不到h3
或class "r"
。这就是你的代码返回一个空列表的原因。
您的代码没有问题,而是您正在搜索的内容不存在。
你期待什么回来?
答案 1 :(得分:0)
一种常见的解决方案是在您的请求中添加 user-agent
aka headers
以伪造真实用户访问:
# https://www.whatismybrowser.com/guides/the-latest-user-agent/
headers = {
"User-Agent":
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3538.102 Safari/537.36 Edge/18.19582"
}
所以你的代码看起来像这样:
import requests, lxml
from bs4 import BeautifulSoup
headers = {
"User-Agent":
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3538.102 Safari/537.36 Edge/18.19582"
}
def googlecrawler(search_term):
html = requests.get(f'https://www.google.com/search?q=', headers=headers).text
soup = BeautifulSoup(html, 'lxml')
for container in soup.findAll('div', class_='tF2Cxc'):
title = container.select_one('.DKV0Md').text
link = container.find('a')['href']
print(f'{title}\n{link}')
googlecrawler('site:Facebook.com Dentist gmail.com')
# part of the output:
'''
COVID-19 Office Update Dear... - Canton Dental Associates ...
https://www.facebook.com/permalink.php?id=107567882605441&story_fbid=3205134459515419
Spinelli Dental - General Dentist - Rochester, New York ...
https://www.facebook.com/spinellidental/about/?referrer=services_landing_page
LaboSmile USA Dentist & Dental Office in Delray ... - Facebook
https://www.facebook.com/labosmileusa/
'''
或者,您可以使用来自 SerpApi 的 Google Search Engine Results API 来实现。这是一个付费 API,可免费试用 5,000 次搜索。
要集成的代码:
from serpapi import GoogleSearch
import os, json, re
params = {
"engine": "google",
"q": "site:Facebook.com Dentist gmail.com",
"api_key": os.getenv('API_KEY')
}
search = GoogleSearch(params)
results = search.get_dict()
for result in results['organic_results']:
title = result['title']
link = result['link']
print(f'{title}\n{link}\n')
# part of the output:
'''
Green Valley Dental - About | Facebook
https://www.facebook.com/GVDFamily/about/
My Rivertown Dentist - About | Facebook
https://www.facebook.com/Rivertownfamily/about/
COVID-19 Office Update Dear... - Canton Dental Associates ...
https://www.facebook.com/permalink.php?id=107567882605441&story_fbid=3205134459515419
'''
<块引用>
免责声明,我为 SerpApi 工作。