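I'm trying to search Google for some products, but the language of the results Google returns depends on the proxy. I tried to fix this by setting 'accept-language': 'en-US,en;q=0.9' in the headers, but it still doesn't work.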
import requests
from bs4 import BeautifulSoup

products = ["Majestic Pet Stairs Steps", "Ball Jars Wide Mouth Lids 12/Pack", "LED Duck Color Changing Floating Speaker"]
for product in products:
    headers = {
        'authority': 'www.google.com',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36',
        'accept-language': 'en-US,en;q=0.9'}
    url = 'https://google.com/search?q={}'.format(product)
    PROXY = None
    res = requests.get(url, headers=headers, proxies=PROXY)
    if res.status_code != 200:
        print("bad proxy")
        break
    soup = BeautifulSoup(res.text, "lxml")
    print(soup.title.text)
What I want is to always get the results in English, regardless of the proxy.
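Answer 0 (score: 1)
Google provides an API for search: https://developers.google.com/custom-search/v1/overview
If you send a large number of automated queries by scraping the web pages, they will most likely start serving CAPTCHAs or block you.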
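As a rough sketch of what a call to that API could look like: the Custom Search JSON API is queried over HTTPS with an API key and a search engine ID (cx), and it accepts hl and lr parameters for the interface language and the result language. The credentials are placeholders, and the helper names build_cse_url and fetch_titles are made up for this illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint of the Custom Search JSON API.
CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_cse_url(query, api_key, cx, hl="en", lr="lang_en"):
    """Build the request URL (hypothetical helper; sends nothing)."""
    params = {
        "key": api_key,  # placeholder: your API key
        "cx": cx,        # placeholder: your search engine ID
        "q": query,
        "hl": hl,        # interface language
        "lr": lr,        # restrict results to documents in this language
    }
    return CSE_ENDPOINT + "?" + urlencode(params)


def fetch_titles(query, api_key, cx):
    """Run the query and return the result titles (needs valid credentials)."""
    with urlopen(build_cse_url(query, api_key, cx)) as resp:
        data = json.load(resp)
    return [item["title"] for item in data.get("items", [])]
```

Calling fetch_titles("Majestic Pet Stairs Steps", "YOUR_API_KEY", "YOUR_CX") would then return a list of titles, assuming the key and engine ID are valid.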
Answer 1 (score: 1)
I have a handy library that I use for searching. Here is a snippet from my application:
Install it with pip install google (RFC).
from googlesearch import search

# tag, intitle and SITE come from the surrounding application
results = list(search(str(tag) + ' ' + str(intitle), domains=['stackoverflow.com'], stop=SITE.page_size))
Answer 2 (score: 0)
Have you tried adding the uule=location, hl=en or lr=lang_en parameters to the request URL?
response = requests.get('https://google.com/search?q=FUS RO DAH&hl=en')
Or use a params dict:
params = {
    'q': 'FUS RO DAH',
    'hl': 'en',                    # the language to use for the Google search
    'gl': 'us',                    # the country to use for the Google search
    'lr': 'lang_en',               # one or multiple languages to limit the search to
    'uule': 'w+CAIQICIGQnJhemls',  # Brazil; the encoded location to use for the search
}
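To illustrate where a uule value like the one above could come from, here is a minimal sketch. It assumes the community-documented layout (a fixed prefix, one character encoding the length of the location name, then the Base64 of the name); this is not an official Google specification, and encode_uule is a hypothetical helper name.

```python
import base64

# Fixed prefix seen on uule values such as the one above (assumption:
# this layout is community-documented, not an official Google format).
UULE_PREFIX = "w+CAIQICI"

# One character from this alphabet encodes the byte length of the name.
LENGTH_KEY = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_"
)


def encode_uule(canonical_name):
    """Encode a location name into a uule string (hypothetical helper)."""
    raw = canonical_name.encode("utf-8")
    if len(raw) >= len(LENGTH_KEY):
        raise ValueError("location name too long for this scheme")
    encoded_name = base64.b64encode(raw).decode("ascii")
    return UULE_PREFIX + LENGTH_KEY[len(raw)] + encoded_name


print(encode_uule("Brazil"))  # w+CAIQICIGQnJhemls
```

For "Brazil" this reproduces the value used in the params dict; for other locations the exact canonical name Google expects is an open assumption.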
import requests
from bs4 import BeautifulSoup

headers = {
    'user-agent':
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36',
}

products = ["Majestic Pet Stairs Steps", "Ball Jars Wide Mouth Lids 12/Pack", "LED Duck Color Changing Floating Speaker"]

for product in products:
    params = {
        'q': product,
        'hl': 'en',       # interface language
        'gl': 'us',       # country
        'lr': 'lang_en',  # restrict results to English
    }
    # requests URL-encodes the params and appends them to the URL
    html = requests.get('https://www.google.com/search', headers=headers, params=params)
    soup = BeautifulSoup(html.text, 'html.parser')
    print(soup)
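To check offline which URL such a loop ends up requesting, the query string can be built with the standard library alone. This is only a sketch: build_search_url is a hypothetical helper, and requests should produce an equivalent query string from the same params dict.

```python
from urllib.parse import urlencode

products = [
    "Majestic Pet Stairs Steps",
    "Ball Jars Wide Mouth Lids 12/Pack",
    "LED Duck Color Changing Floating Speaker",
]


def build_search_url(product):
    """Return the search URL with the language parameters applied."""
    params = {
        "q": product,
        "hl": "en",       # interface language
        "gl": "us",       # country
        "lr": "lang_en",  # restrict results to English
    }
    # urlencode escapes spaces as '+' and '/' as '%2F'
    return "https://www.google.com/search?" + urlencode(params)


for product in products:
    print(build_search_url(product))
```

Printing the URLs makes it easy to confirm that hl, gl and lr are actually sent, whichever proxy is used.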
Alternatively, you can use the Google Search Engine Results API from SerpApi to do the same thing. It is a paid API with a free trial of 5,000 searches. Check out the playground.
from serpapi import GoogleSearch

params = {
    "api_key": "YOUR_API_KEY",
    "engine": "google",
    "q": "spotlight 29 casino address",
    "google_domain": "google.com.br",
    "gl": "br",
    "hl": "pt",
    "uule": "w+CAIQICIGQnJhemls",  # can't be used together with location
}

# run the search and parse the JSON response into a dict
search = GoogleSearch(params)
results = search.get_dict()
Disclaimer: I work for SerpApi.