BeautifulSoup: get all distinct attribute values in divs of a given class

Asked: 2018-12-16 19:58:17

Tags: python html beautifulsoup

Suppose my HTML file has divs like these:

<div class="message" title="user1"> <span> Hey </span> </div>
<div class="message" title="user1"> <span> It's me </span> </div>
<div class="message" title="user2"> <span> Hi </span> </div>
<div class="message" title="user3"> <span> Ola </span> </div>

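How can I get a list of all the users who sent a message?

If I use the find method, I only get the first user; if I use find_all, I get user1 twice.

Is there some way to do this without having to remove the duplicates from the list that find_all creates?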

2 answers:

Answer 0 (score: 1)

These are the only two approaches I can think of:

# Approach 1: build the list by hand, skipping titles that were already seen
import bs4

r = '''<div class="message" title="user1"> <span> Hey </span> </div>
<div class="message" title="user1"> <span> It's me </span> </div>
<div class="message" title="user2"> <span> Hi </span> </div>
<div class="message" title="user3"> <span> Ola </span> </div>'''

soup = bs4.BeautifulSoup(r,'html.parser')
messages = soup.find_all('div', {'class':'message'})

users_list = []

for user in messages:
    user_id = user.get('title')
    # Append only the first occurrence, so the document order is kept
    if user_id not in users_list:
        users_list.append(user_id)

# Approach 2: deduplicate with a set (the original order is not guaranteed)
import bs4

r = '''<div class="message" title="user1"> <span> Hey </span> </div>
<div class="message" title="user1"> <span> It's me </span> </div>
<div class="message" title="user2"> <span> Hi </span> </div>
<div class="message" title="user3"> <span> Ola </span> </div>'''

soup = bs4.BeautifulSoup(r,'html.parser')
messages = soup.find_all('div', {'class':'message'})

users_list = list({user.get('title') for user in messages})
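
If the order in which users first appear matters, the set-based approach may shuffle it. A minimal sketch of an order-preserving variant, assuming Python 3.7+ (where dict keys keep insertion order); the variable name `html` is only illustrative:

```python
import bs4

html = '''<div class="message" title="user1"> <span> Hey </span> </div>
<div class="message" title="user1"> <span> It's me </span> </div>
<div class="message" title="user2"> <span> Hi </span> </div>
<div class="message" title="user3"> <span> Ola </span> </div>'''

soup = bs4.BeautifulSoup(html, 'html.parser')
messages = soup.find_all('div', class_='message')

# dict.fromkeys drops repeated keys but keeps the first-seen order,
# so this deduplicates without sorting or shuffling the users.
users_list = list(dict.fromkeys(
    div['title'] for div in messages if div.has_attr('title')
))

print(users_list)  # ['user1', 'user2', 'user3']
```

This still goes through find_all, but the deduplication happens in a single expression rather than in a separate clean-up pass.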

Answer 1 (score: 1)

You can use a custom finder function:

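# Assumes `soup` was already created as in the previous answer
seen_users = set()

def users(tag):
    username = tag.get('title')
    # BeautifulSoup exposes `class` as a list of class names
    if username and 'message' in tag.get('class', []):
        seen_users.add(username)
        return True
    return False

# `tags` still contains every matching div; the unique names are in seen_users
tags = soup.find_all(users)
print(seen_users)  # e.g. {'user1', 'user2', 'user3'} (set order may vary)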
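
With the finder above, find_all still returns all four divs; only the side-effect set is unique. If you want find_all itself to return just one tag per user, a possible variation is sketched below. The names `first_tag_by_user` and `first_message_per_user` are hypothetical, and the sketch assumes the first message of each user is the one worth keeping:

```python
import bs4

html = '''<div class="message" title="user1"> <span> Hey </span> </div>
<div class="message" title="user1"> <span> It's me </span> </div>
<div class="message" title="user2"> <span> Hi </span> </div>
<div class="message" title="user3"> <span> Ola </span> </div>'''

soup = bs4.BeautifulSoup(html, 'html.parser')

# Maps each username to the first message tag seen for that user.
first_tag_by_user = {}

def first_message_per_user(tag):
    username = tag.get('title')
    # Ignore tags without a title or without the "message" class.
    if not username or 'message' not in tag.get('class', []):
        return False
    # setdefault stores the tag only the first time a username appears;
    # the identity check then accepts that tag and rejects later ones.
    return first_tag_by_user.setdefault(username, tag) is tag

tags = soup.find_all(first_message_per_user)
users_list = [tag['title'] for tag in tags]

print(users_list)  # ['user1', 'user2', 'user3']
```

Storing the first tag rather than only the name keeps the function safe to call more than once on the same tag, since the answer for a given tag never changes.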