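(I've tried looking, but all the other answers seem to use urllib2.)
I've just started using requests, but I'm still not sure how to send data to a page or request something else from it. For example, I have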
import requests
r = requests.get('http://google.com')
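but I don't know what to do next, for example how to run a Google search using the search bar on the page. I've read the quickstart guide, but I'm not very familiar with HTML POST and the like, so it wasn't much help.
Is there a clean and elegant way to do what I'm asking?
Answer 0 (score: 8)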
import requests
from bs4 import BeautifulSoup

# Headers that mimic a desktop Firefox browser
headers_Get = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
    'DNT': '1',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1'
}

def google(q):
    s = requests.Session()
    q = '+'.join(q.split())
    url = 'https://www.google.com/search?q=' + q + '&ie=utf-8&oe=utf-8'
    r = s.get(url, headers=headers_Get)
    soup = BeautifulSoup(r.text, "html.parser")
    output = []
    for searchWrapper in soup.find_all('h3', {'class': 'r'}):  # this line may change in future based on Google's web page structure
        link = searchWrapper.find('a')
        result = {'text': link.text.strip(), 'url': link["href"]}
        output.append(result)
    return output
This returns a list of Google search results in the format {'text': text, 'url': url}. The URL of the top result is google('search query')[0]['url'].
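Answer 1 (score: 7)
Request overview
A Google search request is a standard HTTP GET command. It includes a collection of parameters relevant to your query. These parameters are included in the request URL as name=value pairs separated by ampersand (&) characters. Parameters include data such as the search query and a unique CSE ID (cx) that identifies the CSE making the HTTP request. The WebSearch or Image Search service returns XML results in response to your HTTP request.
First, you must get your CSE ID (the cx parameter) in the Control Panel of Custom Search Engine. Then see the official Google Developers site for Custom Search.
There are many examples there, like this one: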
http://www.google.com/search?
start=0
&num=10
&q=red+sox
&cr=countryCA
&lr=lang_fr
&client=google-csbe
&output=xml_no_dtd
&cx=00255077836266642015:u-scht7a-8i
It also explains the list of parameters you can use.
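The name=value pairs described above can be assembled programmatically rather than typed by hand. The following is a minimal sketch that reuses the parameter values from the example URL; `build_search_url` is a hypothetical helper name, and `safe=':'` is only an assumption made so that the colon in the cx value stays unescaped, as it appears in the example.

```python
from urllib.parse import urlencode

BASE_URL = 'http://www.google.com/search'

def build_search_url(params):
    """Join name=value pairs with '&' and append them to the base URL."""
    # urlencode escapes each value (a space becomes '+');
    # safe=':' leaves the colon in the cx value unescaped.
    return BASE_URL + '?' + urlencode(params, safe=':')

# Parameter values copied from the example request above
example_params = {
    'start': 0,
    'num': 10,
    'q': 'red sox',
    'cr': 'countryCA',
    'lr': 'lang_fr',
    'client': 'google-csbe',
    'output': 'xml_no_dtd',
    'cx': '00255077836266642015:u-scht7a-8i',
}

search_url = build_search_url(example_params)
print(search_url)
```

The same dict can also be passed as `params=` to `requests.get`, in which case requests does the encoding itself.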
Answer 2 (score: 2)
Input:
import requests

def googleSearch(query):
    with requests.Session() as c:
        url = 'https://www.google.co.in'
        params = {'q': query}
        # requests URL-encodes the params dict into the query string
        urllink = c.get(url, params=params)
        print(urllink.url)

googleSearch('Linkin Park')
Output:
https://www.google.co.in/?q=Linkin+Park
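To check what a generated URL actually carries, it can be split back into its parameters. This is a small sketch using only the standard library; `query_params` is a hypothetical helper name, not part of requests.

```python
from urllib.parse import urlsplit, parse_qs

def query_params(url):
    """Return the query string of `url` as a dict of name -> list of values."""
    return parse_qs(urlsplit(url).query)

decoded = query_params('https://www.google.co.in/?q=Linkin+Park')
print(decoded)
```

The '+' in the URL decodes back to the space in the original query, so `decoded['q']` holds `'Linkin Park'`.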
Answer 3 (score: 0)
In this code, bs4 is used to get all the h3 elements and print their text.
# Import the requests and
# BeautifulSoup (bs4) libraries.
import requests
import bs4

# Default Google search URL and
# our customized search keyword.
text = "c++ linear search program"
url = 'https://google.com/search'

# Fetch the page with requests.get, passing the keyword
# via params so characters such as '+' are URL-encoded,
# and store the response in request_result.
request_result = requests.get(url, params={'q': text})

# Create the soup from the fetched response
soup = bs4.BeautifulSoup(request_result.text, "html.parser")

# Find every h3 element and print its text
headings = soup.find_all("h3")
for heading in headings:
    print(heading.get_text())
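Since the page Google returns can vary, it may help to try the h3 extraction on a fixed HTML string first. The sketch below swaps in the standard library's `html.parser` in place of bs4 so that it runs without extra packages or network access; the `H3Collector` class and the sample markup are made up for illustration and are not taken from a real results page.

```python
from html.parser import HTMLParser

class H3Collector(HTMLParser):
    """Collect the text content of every <h3> element."""

    def __init__(self):
        super().__init__()
        self.in_h3 = False
        self.current = []
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == 'h3':
            self.in_h3 = True
            self.current = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <b>) is still part of the heading
        if self.in_h3:
            self.current.append(data)

    def handle_endtag(self, tag):
        if tag == 'h3' and self.in_h3:
            self.headings.append(''.join(self.current).strip())
            self.in_h3 = False

# Made-up markup standing in for a results page
sample_html = """
<html><body>
<div><a href="https://example.com/a"><h3>First <b>result</b></h3></a></div>
<div><a href="https://example.com/b"><h3>Second result</h3></a></div>
<p>Not a heading</p>
</body></html>
"""

collector = H3Collector()
collector.feed(sample_html)
collector.close()
for heading in collector.headings:
    print(heading)
```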