I am trying to parse the g-inner-card elements with class = "_KBh", but for some reason the selector returns an empty list.
linkElems = soup.select('._KBh a')
print(linkElems)
This prints an empty list [].
import webbrowser, sys, pyperclip, requests, bs4

# Build the search term from command-line arguments, or fall back to the clipboard
if len(sys.argv) > 1:
    term = ' '.join(sys.argv[1:])
else:
    term = pyperclip.paste()

res = requests.get("https://www.google.com/search?q=" + term)
try:
    res.raise_for_status()
except Exception as ex:
    print('There was a problem: %s' % (ex), '\nSorry!!')

soup = bs4.BeautifulSoup(res.text, "html.parser")
linkElems = soup.select('._KBh a')
print(linkElems)

# Open up to 3 of the result links in the browser
numOpen = min(3, len(linkElems))
for i in range(numOpen):
    print(linkElems[i].get('href'))
    webbrowser.open('https://google.com/' + linkElems[i].get('href'))
When a command-line argument (the term to search for) is supplied, this snippet is meant to open up to 3 Google search results in 3 separate browser windows. It specifically targets results shown in Google inner cards (g-inner-card).
Answer (score: 0)
If you print res.text, you can see that you are not getting the complete/correct data from the page. This happens because Google blocks Python scripts. To work around it, you can pass a User-Agent so the script looks like a real browser.
Result with the default User-Agent:
>>> URL = 'https://www.google.co.in/search?q=federer'
>>> res = requests.get(URL)
>>> '_KBh' in res.text
False
After adding a custom User-Agent:
>>> headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
>>> res = requests.get(URL, headers=headers)
>>> '_KBh' in res.text
True
Adding the headers to your code produces the following output (the first 3 links you were looking for); a sketch of the modified script follows the links below:
https://www.express.co.uk/sport/tennis/918251/Roger-Federer-Felix-Auger-Aliassime-practice
https://sports.yahoo.com/breaks-lighter-schedules-help-players-improve-says-federer-092343458--ten.html
http://www.news18.com/news/sports/rafael-nadal-stays-atop-atp-rankings-roger-federer-aims-to-overtake-1658665.html
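For reference, here is a minimal sketch of the original script with the headers passed to requests.get. The User-Agent string is just an example of a browser string; any current browser's string should work equally well.
import webbrowser, sys, pyperclip, requests, bs4

# Example browser User-Agent so Google serves the full results page
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}

# Build the search term from command-line arguments, or fall back to the clipboard
if len(sys.argv) > 1:
    term = ' '.join(sys.argv[1:])
else:
    term = pyperclip.paste()

# Pass the headers so the request looks like it comes from a real browser
res = requests.get('https://www.google.com/search?q=' + term, headers=headers)
res.raise_for_status()

soup = bs4.BeautifulSoup(res.text, 'html.parser')
linkElems = soup.select('._KBh a')

# Open up to 3 of the matched result links in the browser
numOpen = min(3, len(linkElems))
for i in range(numOpen):
    webbrowser.open('https://google.com/' + linkElems[i].get('href'))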