Hi everyone, I'm a beginner, but I want to improve by working on a project I came up with myself.
The project is a web crawler that goes through a list of pages and crawls each one. The task is to save all URLs found on the pages in that list. I'm using the BeautifulSoup lib. Then I have a second list where I manually enter terms or URLs. In my program I check whether a term is in that list: if it is, I want to print the link from the first list, and if not, the program should just continue, but I added print("no") so I could debug it.
My code worked fine until now, but I'm getting an error. I guess it's in the loops at the bottom of the code.
from bs4 import BeautifulSoup
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
links = []
terms = ['amazon', 'youtube']
main_url = ['https://www.amazon.de','https://www.youtube.com/']
##parses all the urls in the main_url list
for url in main_url:
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, "html.parser")
    ## adds all the links to the links list
    for link in soup.find_all('a'):
        links.append(link.get('href'))

## check to see if the term is on the website
for link in links:
    for term in terms:
        if term in link:
            print(link)
        else:
            print("no")
The error:
Traceback (most recent call last):
File "main.py", line 21, in <module>
if term in link:
TypeError: argument of type 'NoneType' is not iterable
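The traceback suggests the problem is not the loop structure itself: soup.find_all('a') also matches `<a>` tags that have no `href` attribute, and for those `link.get('href')` returns None, so `None` ends up in the `links` list and `term in link` raises the TypeError. A minimal sketch of the fix, skipping missing hrefs, using a small inline HTML snippet here instead of the live sites:

```python
from bs4 import BeautifulSoup

# Stand-in for response.text: one <a> with an href, one without.
html = '<a href="https://www.amazon.de/gp/bestsellers">shop</a><a name="top">anchor</a>'
soup = BeautifulSoup(html, "html.parser")

links = []
for link in soup.find_all('a'):
    href = link.get('href')  # None when the <a> tag has no href attribute
    if href is not None:     # skip tags without an href
        links.append(href)

terms = ['amazon', 'youtube']
for link in links:
    for term in terms:
        if term in link:
            print(link)
        else:
            print("no")
```

An equivalent one-liner for the collection step would be `links = [a['href'] for a in soup.find_all('a', href=True)]`, since `href=True` tells find_all to return only tags that actually carry the attribute.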