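My program was working yesterday. I saved and closed it, but now it no longer works. The first for loop is supposed to append the website links from a Google search, and now the loop body doesn't run at all.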
import requests
import bs4

def Google(word):
    linkelem = []
    strlink = []
    httplink = []
    extractedhttp = []
    brokenlinks = []
    websiteheadlines = []
    websitebody = []
    res2 = requests.get(f'https://google.com/search?q={word}')
    res2.raise_for_status()
    soup2 = bs4.BeautifulSoup(res2.text, 'html.parser')
    #print(soup2)
    # Collect every <a> element inside the result containers
    for div in soup2.find_all("div", {"class": "jfp3ef"}):
        for link in div.select("a"):
            linkelem.append(link)
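I need the links to be added to the list "linkelem".
This is the part that doesn't work. There is a lot more code, but all of it depends on this first part working, and I can add the rest if needed. I tried putting print statements inside the for loop, but nothing was printed. After that I don't know what to do.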
Answer 0 (score: 1)
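The most obvious cause, as Havenard suggested, is that the class name has changed. It could also be that your request doesn't send a user-agent header to make it look like a real user visit. See a list of user agents.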
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3538.102 Safari/537.36 Edge/18.19582"
}
You can always check whether the class exists with an if statement:
# scrapes all titles from page result
# try to remove one letter from CSS selector and it will print "Nothing has been found."
if soup.select('.DKV0Md'):
    print('Found elements:')
    for result in soup.select('.DKV0Md'):
        print(result.text)
else:
    print('Nothing has been found.')

# output:
'''
Found elements:
Minecraft Official Site | Minecraft
Minecraft - Wikipedia
Minecraft - Apps on Google Play
Minecraft - YouTube
'''
Code:
import requests
import lxml  # not used directly; BeautifulSoup needs it installed for the 'lxml' parser
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3538.102 Safari/537.36 Edge/18.19582"
}

params = {'q': 'Minecraft'}

html = requests.get('https://www.google.com/search', headers=headers, params=params).text
soup = BeautifulSoup(html, 'lxml')

if soup.select('.DKV0Md'):
    print('Found elements:')
    for result in soup.select('.DKV0Md'):
        print(result.text)
else:
    print('Nothing has been found.')
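To tie this back to the goal in the question (collecting links into a list), here is a minimal sketch of the same existence check wrapped around link extraction. The function name `extract_links`, the `div.result` selector, and the sample markup are all hypothetical and only illustrate the pattern; Google's real class names differ and change over time.

```python
from bs4 import BeautifulSoup


def extract_links(html, container_selector):
    """Return the href of every <a> inside elements matching container_selector."""
    soup = BeautifulSoup(html, "html.parser")
    containers = soup.select(container_selector)
    if not containers:
        # An empty result means the selector no longer matches the page,
        # which is why a loop over it silently does nothing.
        print(f"No elements match {container_selector!r}; the class may have changed.")
        return []
    links = []
    for container in containers:
        # a[href] skips anchors that have no href attribute
        for anchor in container.select("a[href]"):
            links.append(anchor["href"])
    return links


# Hypothetical markup standing in for a search results page.
SAMPLE_HTML = """
<div class="result"><a href="https://example.com/a">A</a></div>
<div class="result"><a href="https://example.com/b">B</a><a>no href</a></div>
"""

linkelem = extract_links(SAMPLE_HTML, "div.result")
print(linkelem)

# A stale class name (like the one in the question) yields an empty list.
stale = extract_links(SAMPLE_HTML, "div.jfp3ef")
print(stale)
```

Keeping the parsing in a function that takes an HTML string makes it possible to test the selector against a saved copy of the page, without sending a new request each time.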
Alternatively, you can use the Google Search Engine Results API from SerpApi. It is a paid API with a free trial of 5,000 searches.
Code to integrate:
from serpapi import GoogleSearch
import os

params = {
    "api_key": os.getenv("API_KEY"),
    "engine": "google",
    "q": "Minecraft",
    "hl": "en",
}

search = GoogleSearch(params)
results = search.get_dict()

# try/except fits this case: if the response has no 'organic_results' key,
# a KeyError is raised and 'Nothing has been found.' is printed.
try:
    organic_results = results['organic_results']
except KeyError:
    print('Nothing has been found.')
else:
    print('Found elements:')
    for result in organic_results:
        print(result['title'])
Disclaimer: I work for SerpApi.