<div class="profile-row clearfix"><div class="profile-row-header">Member Since</div><div class="profile-information">January 2010</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">AIGA Chapter</div><div class="profile-information">Alaska</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">Title</div><div class="profile-information">Owner</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">Company</div><div class="profile-information">Mad Dog Graphx</div></div>
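I'm using Beautiful Soup and have got to this point in the HTML. Now I want to search it and extract values such as January 2010, Alaska, Owner and Mad Dog Graphx. All of these values have the same class, but each one is preceded by a different label, such as "Member Since" or "AIGA Chapter". How can I search for "Member Since" and get January 2010 back, and do the same for the other three fields?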
Answer:
>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup('''<div class="profile-row clearfix"><div class="profile-row-header">Member Since</div><div class="profile-information">January 2010</div></div>
... <div class="profile-row clearfix"><div class="profile-row-header">AIGA Chapter</div><div class="profile-information">Alaska</div></div>
... <div class="profile-row clearfix"><div class="profile-row-header">Title</div><div class="profile-information">Owner</div></div>
... <div class="profile-row clearfix"><div class="profile-row-header">Company</div><div class="profile-information">Mad Dog Graphx</div></div>
... ''')
>>> for row in soup.findAll('div', {'class':'profile-row clearfix'}):
...     field, value = row.findAll(text=True)
...     print field, value
...
Member Since January 2010
AIGA Chapter Alaska
Title Owner
Company Mad Dog Graphx
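The session above targets the old BeautifulSoup 3 package under Python 2 (note the print statement). Here is a minimal sketch of the same loop for Python 3. It assumes the bs4 package is installed and uses the standard-library "html.parser" backend. The pairs list is just an illustrative name for collecting the results.

```python
from bs4 import BeautifulSoup  # assumption: the bs4 package is installed

HTML = """\
<div class="profile-row clearfix"><div class="profile-row-header">Member Since</div><div class="profile-information">January 2010</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">AIGA Chapter</div><div class="profile-information">Alaska</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">Title</div><div class="profile-information">Owner</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">Company</div><div class="profile-information">Mad Dog Graphx</div></div>
"""

soup = BeautifulSoup(HTML, "html.parser")

pairs = []
# class_="profile-row" matches any div whose class list contains "profile-row".
for row in soup.find_all("div", class_="profile-row"):
    # Each row holds exactly two text nodes: the header label and its value.
    field, value = row.find_all(string=True)
    pairs.append((str(field), str(value)))
    print(field, value)
```

This should print the same four lines as the session above. Matching on the single class "profile-row" instead of the exact string "profile-row clearfix" also keeps working if the order of the classes in the attribute ever changes.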
You can of course do whatever you like with field and value, for example build a dict from them or store them in a database.
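Collecting the pairs into a dict lets you look a field up by its label, which is what the question asks for. This is a sketch that assumes Python 3 and the bs4 package. The name profile is illustrative.

```python
from bs4 import BeautifulSoup  # assumption: the bs4 package is installed

HTML = """\
<div class="profile-row clearfix"><div class="profile-row-header">Member Since</div><div class="profile-information">January 2010</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">AIGA Chapter</div><div class="profile-information">Alaska</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">Title</div><div class="profile-information">Owner</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">Company</div><div class="profile-information">Mad Dog Graphx</div></div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Map each header label to its value, e.g. "Member Since" -> "January 2010".
profile = {}
for row in soup.find_all("div", class_="profile-row"):
    field, value = row.find_all(string=True)
    # str() turns the NavigableString objects into plain strings.
    profile[str(field)] = str(value)

print(profile["Member Since"])  # January 2010
```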
If the "profile-row clearfix" div contains other divs or other text nodes, you will need to do something like field = row.find('div', {'class':'profile-row-header'}).findAll(text=True).
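Here is a sketch of that more defensive approach, assuming Python 3 and bs4. It selects the header and information divs by class instead of relying on each row containing exactly two text nodes. It also adds a hypothetical helper, lookup(), that goes straight from a label such as "Member Since" to its value. The sample HTML is deliberately indented, so each row also contains whitespace-only text nodes. Those extra nodes would make the two-value unpacking above fail.

```python
from bs4 import BeautifulSoup  # assumption: the bs4 package is installed

# Indented markup: each row now also contains whitespace-only text nodes.
HTML = """
<div class="profile-row clearfix">
  <div class="profile-row-header">Member Since</div>
  <div class="profile-information">January 2010</div>
</div>
<div class="profile-row clearfix">
  <div class="profile-row-header">Company</div>
  <div class="profile-information">Mad Dog Graphx</div>
</div>
"""


def extract_profile(soup):
    """Build a {label: value} dict from every profile row (hypothetical helper)."""
    profile = {}
    for row in soup.find_all("div", class_="profile-row"):
        header = row.find("div", class_="profile-row-header")
        info = row.find("div", class_="profile-information")
        if header is None or info is None:
            continue  # skip rows that lack either part
        # get_text(strip=True) drops surrounding whitespace from the text.
        profile[header.get_text(strip=True)] = info.get_text(strip=True)
    return profile


def lookup(soup, label):
    """Return the value next to the header whose text equals label, or None."""
    header = soup.find("div", class_="profile-row-header", string=label)
    if header is None:
        return None
    # find_next_sibling skips the whitespace text nodes between the two divs.
    info = header.find_next_sibling("div", class_="profile-information")
    return info.get_text(strip=True) if info is not None else None


soup = BeautifulSoup(HTML, "html.parser")
print(extract_profile(soup))
print(lookup(soup, "Member Since"))  # January 2010
```

The dict version is handy when you want every field at once. lookup() fits the "search for Member Since and get January 2010" case directly.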