Question

Does anyone know why this error occurs?

MissingSchema: Invalid URL '/type/gymnasien/': No schema supplied. Perhaps you meant http:///type/gymnasien/?

Here is my code:

import requests
from bs4 import BeautifulSoup as soup
def get_emails(_links:list, _r = [0, 10]):
    for i in range(*_r):
        new_d = soup(requests.get(_links[i]).text, 'html.parser').find_all('a', {'class':'my_modal_open'})
        if new_d:
            yield new_d[-1]['title']

d = soup(requests.get('http://www.schulliste.eu/type/gymnasien/').text, 'html.parser')
results = [i['href'] for i in d.find_all('a')][52:-9]
print(list(get_emails(results)))

Answer 1

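So, from what I understand of your code, you are collecting links to a bunch of schools, then following those links with the get_emails() function and scraping each school's contact email. If you look at what is in the results list that you pass to get_emails(), you will see that it contains some relative links internal to the site, which requests does not know how to handle:

>>> print(results[1])
/type/gymnasien/

These are probably not even links you want to follow, so what you can do is filter them out of the list of scraped links before passing it to get_emails():

results_filtered = [link for link in results if link.startswith('http://')]

You can then use these results downstream, and requests should no longer complain with MissingSchema. The final code looks like this: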

import requests
from bs4 import BeautifulSoup as soup
def get_emails(_links:list, _r = [0, 10]):
    for i in range(*_r):
        new_d = soup(requests.get(_links[i]).text, 'html.parser').find_all('a', {'class':'my_modal_open'})
        if new_d:
            yield new_d[-1]['title']

d = soup(requests.get('http://www.schulliste.eu/type/gymnasien/').text, 'html.parser')
results = [i['href'] for i in d.find_all('a')][52:-9]
results = [link for link in results if link.startswith('http://')]
print(list(get_emails(results)))
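
If you ever do want to follow the relative links rather than drop them, one option is to resolve each href against the page URL with urljoin from the standard library. The following is only a minimal sketch: the helper name absolutize and the sample hrefs are made up for illustration, and BASE_URL is simply the page address from the question. It also accepts https links, which a plain startswith('http://') check would discard.

```python
from urllib.parse import urljoin, urlparse

# Assumption: the page the links were scraped from, taken from the question's code.
BASE_URL = 'http://www.schulliste.eu/type/gymnasien/'


def absolutize(links, base=BASE_URL):
    """Resolve each href against the base URL and keep only http(s) results.

    Hypothetical helper, not part of requests or BeautifulSoup.
    """
    resolved = []
    for href in links:
        url = urljoin(base, href)  # absolute hrefs pass through unchanged
        if urlparse(url).scheme in ('http', 'https'):
            resolved.append(url)
    return resolved


# Hypothetical sample hrefs for illustration; the real list comes from the page.
sample = ['/type/gymnasien/', 'http://example.com/school', 'mailto:info@example.com']
print(absolutize(sample))
```

The returned list could then be passed to get_emails() in place of the filtered results. Whether that is useful depends on the site: as noted above, the relative links here are probably not ones you want to visit.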

Python scraping error MissingSchema: Invalid URL
