Hi everyone, I'm a beginner, but I want to improve by working on a project I came up with myself.
The project is a web crawler that goes through a list of pages and crawls each one. The task is to save all URLs found on the pages in that list. I'm using the BeautifulSoup lib. Then I have a second list where I manually enter terms or URLs. In my program I check whether a term is in that list: if it is, I want to print the link from the first list, and if not, the program should just continue, but I added print("no") so I could debug it.
My code worked fine until now, but I'm getting an error. I guess it's in the loops at the bottom of the code.
from bs4 import BeautifulSoup
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
links = []
terms = ['amazon', 'youtube']
main_url = ['https://www.amazon.de','https://www.youtube.com/']
##parses all the urls in the main_url list
for url in main_url:
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, "html.parser")
    ## adds all the links to the links list
    for link in soup.find_all('a'):
        links.append(link.get('href'))

## check to see if the term is on the website
for link in links:
    for term in terms:
        if term in link:
            print(link)
        else:
            print("no")
The error:
Traceback (most recent call last):
File "main.py", line 21, in <module>
if term in link:
TypeError: argument of type 'NoneType' is not iterable
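The traceback suggests the problem is not the loop structure itself: soup.find_all('a') also matches `<a>` tags that have no `href` attribute, and for those `link.get('href')` returns None, so `None` ends up in the `links` list and `term in link` raises the TypeError. A minimal sketch of the fix, skipping missing hrefs, using a small inline HTML snippet here instead of the live sites:

```python
from bs4 import BeautifulSoup

# Stand-in for response.text: one <a> with an href, one without.
html = '<a href="https://www.amazon.de/gp/bestsellers">shop</a><a name="top">anchor</a>'
soup = BeautifulSoup(html, "html.parser")

links = []
for link in soup.find_all('a'):
    href = link.get('href')  # None when the <a> tag has no href attribute
    if href is not None:     # skip tags without an href
        links.append(href)

terms = ['amazon', 'youtube']
for link in links:
    for term in terms:
        if term in link:
            print(link)
        else:
            print("no")
```

An equivalent one-liner for the collection step would be `links = [a['href'] for a in soup.find_all('a', href=True)]`, since `href=True` tells find_all to return only tags that actually carry the attribute.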