Question

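The "I'm Feeling Lucky" project from the ebook "Automate the Boring Stuff with Python" no longer works with the code the author provides.

Specifically, linkElems = soup.select('.r a')

I have already tried the solution provided in: soup.select('.r a') in 'https://www.google.com/#q=vigilante+mic' gives empty list in python BeautifulSoup

and I am currently using the same search format.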

import webbrowser, requests, bs4

def im_feeling_lucky():

    # Make search query look like Google's
    search = '+'.join(input('Search Google: ').split(" "))

    # Pull html from Google
    print('Googling...') # display text while downloading the Google page
    res = requests.get(f'https://google.com/search?q={search}&oq={search}')
    res.raise_for_status()

    # Retrieve top search result link
    soup = bs4.BeautifulSoup(res.text, features='lxml')


    # Open a browser tab for each result.
    linkElems = soup.select('.r')  # Returns empty list
    numOpen = min(5, len(linkElems))
    print('Before for loop')
    for i in range(numOpen):
        webbrowser.open(f'http://google.com{linkElems[i].get("href")}')

The linkElems variable ends up as an empty list [], so the program does nothing.

Answer 1

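I took a different route. I saved the HTML from the request, opened that page, and inspected the elements. It turns out that the page returned to my Python request is different from the one I get when I open it natively in Chrome. I identified the div whose class appears to mark a result and substituted that class for .r; in my case it was .kCrYT.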

#! python3

# lucky.py - Opens several Google Search results.

import requests, sys, webbrowser, bs4

print('Googling...') # display text while the google page is downloading

url= 'http://www.google.com.au/search?q=' + ' '.join(sys.argv[1:])
url = url.replace(' ','+')


res = requests.get(url)
res.raise_for_status()


# Retrieve top search result links.
soup=bs4.BeautifulSoup(res.text, 'html.parser')


# Get all 'a' tags that are direct children of an element with the class 'kCrYT' (the results)
linkElems = soup.select('.kCrYT > a')

# Open a browser tab for each result.
numOpen = min(5, len(linkElems))
for i in range(numOpen):
    webbrowser.open_new_tab('http://google.com.au' + linkElems[i].get('href'))
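The inspection step described in this answer can be partly automated. The following is a minimal sketch, using only the standard library, that counts which classes appear on div tags so that a frequently repeated class (such as .kCrYT here) stands out as a candidate for the result container. The names DivClassCounter, class_counts and sample_html are made up for illustration, and sample_html is a hand-written stand-in for res.text, not real Google output.

```python
from collections import Counter
from html.parser import HTMLParser


class DivClassCounter(HTMLParser):
    """Count how often each CSS class appears on <div> start tags."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag != 'div':
            return
        for name, value in attrs:
            # A class attribute may hold several space-separated class names.
            if name == 'class' and value:
                self.counts.update(value.split())


def class_counts(html):
    """Return a Counter mapping class name -> number of divs using it."""
    parser = DivClassCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Hypothetical stand-in for res.text; a real response is far larger.
sample_html = (
    '<div id="main">'
    '<div class="kCrYT"><a href="/url?q=https://example.com/">One</a></div>'
    '<div class="kCrYT"><a href="/url?q=https://example.org/">Two</a></div>'
    '<div class="footer">About</div>'
    '</div>'
)

if __name__ == '__main__':
    # List classes from most to least frequent.
    for name, count in class_counts(sample_html).most_common():
        print(name, count)
```

In practice you would pass res.text instead of sample_html. Google's class names change over time, so the class you find may well differ from .kCrYT.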

Answer 2

I ran into the same problem while reading that book and found a solution.

Replacing

soup.select('.r a')

with

soup.select('div#main > div > div > div > a')

will solve the problem.

Here is the code that works:

import webbrowser, requests, bs4 , sys

print('Googling...')
res = requests.get('https://google.com/search?q=' + ' '.join(sys.argv[1:]))
res.raise_for_status()

# Name the parser explicitly; otherwise bs4 guesses one and emits a warning.
soup = bs4.BeautifulSoup(res.text, 'html.parser')

linkElems = soup.select('div#main > div > div > div > a')
numOpen = min(5, len(linkElems))
for i in range(numOpen):
    webbrowser.open('http://google.com' + linkElems[i].get("href"))

The code above takes its input from the command-line arguments.
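The answers above prepend the Google host to each href because the links in the script-free results page are relative. As an assumption on my part (it is not stated in the thread), those hrefs often have the form /url?q=<target>&sa=..., in which case the real destination can be pulled out of the q parameter instead of bouncing through Google. A minimal sketch, where extract_target is a hypothetical helper name:

```python
from urllib.parse import parse_qs, urlsplit


def extract_target(href):
    """Return the destination of a Google redirect link.

    Assumes (not guaranteed) that redirect links look like
    '/url?q=<target>&...'. Any other href is returned unchanged.
    """
    parts = urlsplit(href)
    if parts.path == '/url':
        targets = parse_qs(parts.query).get('q')
        if targets:
            return targets[0]
    return href


if __name__ == '__main__':
    print(extract_target('/url?q=https://example.com/page&sa=U&ved=abc'))
    print(extract_target('https://example.org/'))
```

With this helper the loop could call webbrowser.open(extract_target(linkElems[i].get('href'))). If the markup does not follow the assumed pattern, the href comes back untouched and the original prefixing approach still applies.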

Answer 3

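Different websites (Google, for instance) generate different HTML for different User-Agents (the User-Agent is how a website identifies the web browser). Another way to solve the problem is to send a browser User-Agent, so that the HTML you get from the website is the same as what you see with "View page source" in your browser. The following code only prints the list of Google search result URLs, which is not the same as the book you referenced, but it is still useful for showing the point.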

#! python3
# lucky.py - Opens several Google search results.

import requests, sys, webbrowser, bs4
print('Please enter your search term:')
searchTerm = input()
print('Googling...')    # display text while downloading the Google page

# Join the words with '+'; ' '.join(searchTerm) would put a space between every character.
url = 'http://google.com/search?q=' + '+'.join(searchTerm.split())
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

res = requests.get(url, headers=headers)
res.raise_for_status()


# Retrieve top search results links.
soup = bs4.BeautifulSoup(res.content, 'html.parser')   # name the parser explicitly

# Open a browser tab for each result.
linkElems = soup.select('.r > a')   # Used '.r > a' instead of '.r a' because
numOpen = min(5, len(linkElems))    # there are many href after div class="r"
for i in range(numOpen):
  # webbrowser.open('http://google.com' + linkElems[i].get('href'))
  print(linkElems[i].get('href'))
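To make the User-Agent point concrete: by default, requests identifies itself as python-requests/<version>, which is why Google can serve it different markup. Below is a small sketch that builds the search URL and headers in one place, using urlencode so that spaces and special characters in the search term are escaped. build_search_request and BROWSER_UA are illustrative names, and the User-Agent string is simply the one from the answer above; whether Google still returns the .r markup for it is not guaranteed.

```python
from urllib.parse import urlencode

# Copied from the answer above; any browser User-Agent string could be swapped in.
BROWSER_UA = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) '
              'AppleWebKit/537.36 (KHTML, like Gecko) '
              'Chrome/39.0.2171.95 Safari/537.36')


def build_search_request(term, user_agent=BROWSER_UA):
    """Return (url, headers) for a Google search on `term`."""
    # urlencode turns spaces into '+' and escapes characters such as '&'.
    url = 'https://www.google.com/search?' + urlencode({'q': term})
    headers = {'User-Agent': user_agent}
    return url, headers


if __name__ == '__main__':
    url, headers = build_search_request('automate the boring stuff')
    print(url)
    # The pair can then be passed on as: requests.get(url, headers=headers)
```

Keeping URL construction separate from the network call also makes it easy to print the URL and check it in a browser before scraping.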
