Question

<div class="profile-row clearfix"><div class="profile-row-header">Member Since</div><div class="profile-information">January 2010</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">AIGA Chapter</div><div class="profile-information">Alaska</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">Title</div><div class="profile-information">Owner</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">Company</div><div class="profile-information">Mad Dog Graphx</div></div>

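I've used Beautiful Soup to get to this point in the HTML. I now want to search the code and extract the data, such as January 2010, Alaska, Owner, and Mad Dog Graphx. All of these values have the same class, but each is preceded by a different label, such as "Member Since", "AIGA Chapter", and so on. How can I search for Member Since and get January 2010 back, and do the same for the other three fields?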

Answer 1

>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup('''<div class="profile-row clearfix"><div class="profile-row-header">Member Since</div><div class="profile-information">January 2010</div></div>
... <div class="profile-row clearfix"><div class="profile-row-header">AIGA Chapter</div><div class="profile-information">Alaska</div></div>
... <div class="profile-row clearfix"><div class="profile-row-header">Title</div><div class="profile-information">Owner</div></div>
... <div class="profile-row clearfix"><div class="profile-row-header">Company</div><div class="profile-information">Mad Dog Graphx</div></div>
... ''')
>>> for row in soup.findAll('div', {'class':'profile-row clearfix'}):
...  field, value = row.findAll(text = True)
...  print field, value
... 
Member Since January 2010
AIGA Chapter Alaska
Title Owner
Company Mad Dog Graphx

You can of course do whatever you like with field and value, such as building a dict from them or storing them in a database.
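For example, collecting the pairs into a dict could look roughly like the sketch below. Note that this is not the code from the session above: it assumes the newer bs4 package on Python 3 (the session uses BeautifulSoup 3 on Python 2), and the names HTML and profile are just illustrative.

```python
from bs4 import BeautifulSoup

# Same markup as in the question.
HTML = '''<div class="profile-row clearfix"><div class="profile-row-header">Member Since</div><div class="profile-information">January 2010</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">AIGA Chapter</div><div class="profile-information">Alaska</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">Title</div><div class="profile-information">Owner</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">Company</div><div class="profile-information">Mad Dog Graphx</div></div>
'''

# "html.parser" is the parser bundled with Python, so no extra install is needed.
soup = BeautifulSoup(HTML, "html.parser")

profile = {}
# class_="profile-row" matches any div whose class list contains "profile-row";
# it does not match "profile-row-header", which is a different class name.
for row in soup.find_all("div", class_="profile-row"):
    # Assumes each row holds exactly two non-empty text nodes: label, then value.
    field, value = row.stripped_strings
    profile[field] = value

print(profile["Member Since"])
```

After that, profile["AIGA Chapter"], profile["Title"] and profile["Company"] give you the other three fields.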

If the "profile-row clearfix" divs contain other divs or additional text nodes, you will need to do something like field = row.find('div', {'class':'profile-row-header'}).findAll(text=True) instead.
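If you only want to look up one label at a time (search for Member Since, get January 2010), selecting by class is less fragile than unpacking text nodes. Here is a minimal sketch, again assuming bs4 on Python 3 rather than the BeautifulSoup 3 used above; profile_value is a made-up helper name, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Same markup as in the question.
HTML = '''<div class="profile-row clearfix"><div class="profile-row-header">Member Since</div><div class="profile-information">January 2010</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">AIGA Chapter</div><div class="profile-information">Alaska</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">Title</div><div class="profile-information">Owner</div></div>
<div class="profile-row clearfix"><div class="profile-row-header">Company</div><div class="profile-information">Mad Dog Graphx</div></div>
'''


def profile_value(soup, label):
    """Return the text of the profile-information div that follows the
    profile-row-header div whose text equals label, or None if not found.

    Hypothetical helper written for this example.
    """
    for header in soup.find_all("div", class_="profile-row-header"):
        if header.get_text(strip=True) == label:
            # The value lives in the sibling div right after the header.
            info = header.find_next_sibling("div", class_="profile-information")
            if info is None:
                return None
            # Join nested text nodes with a space and trim whitespace.
            return info.get_text(" ", strip=True)
    return None


soup = BeautifulSoup(HTML, "html.parser")
print(profile_value(soup, "Member Since"))
print(profile_value(soup, "Company"))
```

Because the lookup goes through the two class names, extra divs or stray text inside a row should not break it, and a label that is not on the page just returns None.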

Parsing HTML with Python and Beautiful Soup
