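I am trying to write a Python script that takes a search term or query, runs a Google search for it, and returns 5 URLs from the results.
I spent a good deal of time trying to get PyGoogle to work. But then I found out that Google no longer supports the SOAP API for search, nor does it issue new license keys. In short, PyGoogle is pretty much dead at this point.
So my question is... what is the most concise/simplest way to do this?
I would like to do this entirely in Python.
Thanks for your help.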
Answer 0 (score: 1)
Use BeautifulSoup and requests to get the links from the Google search results:
import requests
from bs4 import BeautifulSoup
keyword = "Facebook" #enter your keyword here
search = "https://www.google.co.uk/search?sclient=psy-ab&client=ubuntu&hs=k5b&channel=fs&biw=1366&bih=648&noj=1&q=" + keyword
r = requests.get(search)
soup = BeautifulSoup(r.text, "html.parser")
container = soup.find('div',{'id':'search'})
url = container.find("cite").text
print(url)
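The snippet above prints only the first `cite` text and concatenates the keyword into the URL unescaped, so a query containing spaces may not be sent as intended. The following is a minimal sketch of both steps, not a definitive implementation. It uses the standard library's `html.parser` rather than BeautifulSoup so that it runs without extra installs. The names `build_search_url`, `CiteCollector`, and `first_cites` are hypothetical, the sample markup is made up, and Google's real result markup may differ and change over time.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


def build_search_url(keyword):
    # urlencode escapes spaces and special characters in the query string.
    return "https://www.google.co.uk/search?" + urlencode({"q": keyword})


class CiteCollector(HTMLParser):
    """Collects the text of every <cite> element in document order."""

    def __init__(self):
        super().__init__()
        self.cites = []
        self._depth = 0  # greater than 0 while inside a <cite> element

    def handle_starttag(self, tag, attrs):
        if tag == "cite":
            if self._depth == 0:
                self.cites.append("")
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "cite" and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0:
            self.cites[-1] += data


def first_cites(html, limit=5):
    # Return the text of at most `limit` non-empty <cite> elements.
    parser = CiteCollector()
    parser.feed(html)
    parser.close()
    return [c.strip() for c in parser.cites if c.strip()][:limit]


# Hypothetical markup standing in for a downloaded results page.
sample = "<div id='search'>" + "".join(
    "<cite>https://example.com/%d</cite>" % i for i in range(7)
) + "</div>"

print(build_search_url("red sox"))
print(first_cites(sample))
```

In a real run, `sample` would be replaced by `r.text` from the `requests.get` call, assuming the page still exposes result URLs in `cite` elements.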
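Answer 1 (score: 0)
What problem did you have with pygoogle? I know it is no longer supported, but I have used that project on many occasions, and it works fine for the trivial task you describe.
Your question did make me curious, so I went to Google and typed in "python google search". Bam, found this repository. Installed it with pip, and within 5 minutes of browsing their documentation I had what you asked for: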
import google
for url in google.search("red sox", num=5, stop=1):
    print(url)
Maybe try a little harder next time, okay?
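Since the search call in this answer is consumed with a `for` loop, one way to cap the output at 5 URLs regardless of the library's own paging parameters is `itertools.islice`. This is only a sketch: `take` and `fake_search` are hypothetical names, and `fake_search` is a stand-in generator, not the library's API.

```python
from itertools import islice


def take(results, limit=5):
    # Consume at most `limit` items from any iterable of results.
    return list(islice(results, limit))


def fake_search(query):
    # Hypothetical stand-in for a search generator; yields made-up URLs forever.
    n = 0
    while True:
        yield "https://example.com/%s/%d" % (query.replace(" ", "-"), n)
        n += 1


urls = take(fake_search("red sox"))
for url in urls:
    print(url)
```

Because `islice` stops pulling from the generator after the fifth item, no further result pages would be requested than are needed to produce those items.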
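Answer 2 (score: 0)
Here is a link to the xgoogle library, which does the same thing.
I tried something similar to get the top 10 links; it also counts the occurrences of the search word in each linked page. I have added the code snippet for your reference: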
# Python 2 code, as in the original (print statement, raw_input, urllib.urlopen)
import operator
import urllib
# Import the GoogleSearch and SearchError classes from xgoogle/search.py
from xgoogle.search import GoogleSearch, SearchError
my_dict = {}
print "Enter the word to be searched : "
# Read user input
yourword = raw_input()
try:
    # Perform a Google search for the keyword
    gs = GoogleSearch(yourword)
    gs.results_per_page = 80
    # Get the Google search results
    results = gs.get_results()
    source = ''
    # Loop through all results to get each link and its content
    for res in results:
        #print res.url.encode('utf8')
        # This gives the URL
        parsedurl = res.url.encode("utf8")
        myurl = urllib.urlopen(parsedurl)
        # The line above opens the URL; the line below reads that page's content
        source = myurl.read()
        # Count occurrences of the entered keyword in the page
        count = source.count(yourword)
        # Store the result in a dictionary: each URL maps to its word count
        my_dict[parsedurl] = count
except SearchError, e:
    print "Search failed: %s" % e
print my_dict
#sorted_x = sorted(my_dict, key=lambda x: x[1])
# Print the URLs ordered by word count, highest first
for key in sorted(my_dict, key=my_dict.get, reverse=True):
    print(key, my_dict[key])
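The counting and ranking step of this answer can be tried in isolation, without xgoogle or any network access. Below is a minimal Python 3 sketch of that step only; `rank_by_occurrences` is a hypothetical helper and the page texts are made-up sample data standing in for downloaded pages.

```python
def rank_by_occurrences(pages, word):
    # Map each URL to how many times `word` occurs in its text.
    # str.count is case-sensitive and counts non-overlapping matches.
    counts = {url: text.count(word) for url, text in pages.items()}
    # Sort (url, count) pairs by count, highest first.
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Made-up page contents keyed by URL.
pages = {
    "https://example.com/a": "python and more python",
    "https://example.com/b": "no match here",
    "https://example.com/c": "python python python",
}

ranking = rank_by_occurrences(pages, "python")
for url, count in ranking:
    print(url, count)
```

Returning a sorted list of pairs keeps each count next to its URL, which avoids a second dictionary lookup when printing.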