Question

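I don't want any duplicate email addresses. With this code I get the error TypeError: unhashable type: 'list', so I think allLinks = set() is wrong and I have to use a tuple instead of a list, right?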

This is my code:

import requests
from bs4 import BeautifulSoup as soup


def get_emails(_links: list):
    for i in range(len(_links)):
        new_d = soup(requests.get(_links[i]).text, 'html.parser').find_all('a', {'class': 'my_modal_open'})
        if new_d:
            yield new_d[-1]['title']


start = 20
while True:
    d = soup(requests.get('http://www.schulliste.eu/type/gymnasien/?bundesland=&start={page_id}'.format(page_id=start)).text, 'html.parser')
    results = [i['href'] for i in d.find_all('a')][52:-9]
    results = [link for link in results if link.startswith('http://')]

    next_page = d.find('div', {'class': 'paging'}, 'weiter')

    if next_page:
        start += 20
    else:
        break

    allLinks = set()

    if results not in allLinks:
        print(list(get_emails(results)))
        allLinks.add(results)
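A small sketch of where the TypeError comes from, using made-up links in place of one page of results (no network access). A list cannot be hashed, so even the membership test "results not in allLinks" fails before add() is reached. Converting to a tuple avoids the error, but then each whole page becomes a single set entry, so individual duplicate links are not removed.

```python
# Made-up links standing in for one page of `results` (hypothetical data).
links = ["http://a.example/1", "http://a.example/2"]
seen = set()

error_message = ""
try:
    _ = links in seen  # hashing the list fails, even for a membership test
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # unhashable type: 'list'

# A tuple is hashable, so this succeeds, but the whole page is stored as ONE
# entry: duplicate links inside it, or across pages, are not removed.
seen.add(tuple(links))
seen.add(tuple(links + ["http://a.example/1"]))  # different tuple -> new entry
print(len(seen))  # 2
```

So switching to tuples silences the error but does not give per-link (or per-email) deduplication.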

Answer 1

You are trying to add the entire results list to the set as a single entry.

What you want is to add each element as a separate entry in the set.

The problem is this line:

allLinks.add(results)

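It adds the whole results list to the set as one element, which doesn't work because a list is mutable and therefore unhashable. Use this instead: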

allLinks.update(results)

This updates the set with the elements of the list, so each element becomes a separate entry in the set.
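A minimal sketch of the difference, assuming two made-up pages of links. One caveat worth knowing: the check "if results not in allLinks" also hashes the whole list and raises the same TypeError, so the membership test has to be done per element as well.

```python
# Hypothetical link lists standing in for two pages of `results`.
page1 = ["http://a.example/1", "http://a.example/2", "http://a.example/1"]
page2 = ["http://a.example/2", "http://a.example/3"]

all_links = set()

# all_links.add(page1) would raise TypeError; update() adds each element.
all_links.update(page1)
print(sorted(all_links))  # ['http://a.example/1', 'http://a.example/2']

# Check membership per link, not per list: `page2 in all_links` would fail.
new_links = [link for link in page2 if link not in all_links]
print(new_links)  # ['http://a.example/3']
all_links.update(new_links)
```

Filtering per link before update() lets you process only the links that have not been seen on an earlier page.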

Answer 2

I can use this, but I still get duplicate emails.

allLinks = []

if results not in allLinks:
    print(list(get_emails(results)))
    allLinks.append(results)

Does anyone know why?
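One likely reason, judging from the code above: the check compares whole results lists, so the same email found via two different pages is never detected, and in the question's code allLinks = set() sits inside the while loop, so it is reset on every page. A sketch of deduplicating at the email level with a set created once, assuming hypothetical per-page email lists in place of get_emails(results) and the network requests:

```python
# Hypothetical stand-in for list(get_emails(results)) on successive pages.
pages = [
    ["info@school-a.example", "info@school-b.example"],
    ["info@school-b.example", "info@school-c.example"],
]

seen_emails = set()   # created ONCE, before the paging loop
unique_emails = []    # keeps first-seen order for printing

for page_emails in pages:  # stands in for the `while True` paging loop
    for email in page_emails:
        if email not in seen_emails:  # strings are hashable, so this works
            seen_emails.add(email)
            unique_emails.append(email)

print(unique_emails)
# ['info@school-a.example', 'info@school-b.example', 'info@school-c.example']
```

Keeping both a set (for fast lookups) and a list (for output order) is a design choice; a plain set is enough if order does not matter.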

Python scraping: removing duplicates
