Question

After fetching the page source, I have:

[<div amy="sister" tommy="brother" julie="link1">E11</div>]
[<div amy="sister" tommy="brother" julie="link2_cat">E12</div>]
[<div amy="sister" tommy="brother" julie="link3_cat">E13</div>]

I want to extract the divs whose julie attribute contains "_cat". How can I do that with find_all(attrs)?

I tried

soup.find_all('div',{"julie":re.compile("_cat")})

but it doesn't work.

Answer 1

import re
import bs4

html = '''<div amy="sister" tommy="brother" julie="link1">E11</div>
<div amy="sister" tommy="brother" julie="link2_cat">E12</div>
<div amy="sister" tommy="brother" julie="link3_cat">E13</div>'''
soup = bs4.BeautifulSoup(html, 'lxml')

soup.find_all('div',{"julie":re.compile("_cat")})

Out:

[<div amy="sister" julie="link2_cat" tommy="brother">E12</div>,
 <div amy="sister" julie="link3_cat" tommy="brother">E13</div>]

You should call find_all() on the soup object, not on a list of tags.
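A minimal sketch of that difference, assuming the failure came from calling find_all() on the list returned by an earlier find_all() (the question does not show that step, so this is a guess). It uses the built-in html.parser instead of lxml so it runs without extra packages, and the variable names are only illustrative.

```python
import re
import bs4

html = '''<div amy="sister" tommy="brother" julie="link1">E11</div>
<div amy="sister" tommy="brother" julie="link2_cat">E12</div>
<div amy="sister" tommy="brother" julie="link3_cat">E13</div>'''
soup = bs4.BeautifulSoup(html, 'html.parser')

pattern = re.compile("_cat")

# Works: find_all() is called on the soup object itself.
matches = soup.find_all('div', {"julie": pattern})

# find_all() returns a list-like ResultSet, which has no find_all() of its own.
divs = soup.find_all('div')
try:
    divs.find_all('div', {"julie": pattern})
    list_call_failed = False
except AttributeError:
    list_call_failed = True

# If all you have is a list of tags, filter the tags themselves instead.
from_list = [tag for tag in divs if pattern.search(tag.get("julie", ""))]
```

Both matches and from_list end up holding the E12 and E13 divs; only the call on the list raises.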

Answer 2

If you want the julie attribute values themselves, treat each matched tag as a dictionary:

In [5]: [tag["julie"] for tag in soup.find_all('div',{"julie":re.compile("_cat")})]
Out[5]: ['link2_cat', 'link3_cat']
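As a small illustration of the dictionary-style access, here is a sketch on a single tag. The one-div document is made up for the example, and html.parser is used only to avoid the lxml dependency.

```python
import bs4

soup = bs4.BeautifulSoup('<div amy="sister" julie="link2_cat">E12</div>', 'html.parser')
tag = soup.div

value = tag["julie"]        # subscript access; raises KeyError if the attribute is absent
missing = tag.get("tommy")  # .get() returns None here because this div has no tommy attribute
all_attrs = tag.attrs       # plain dict holding every attribute on the tag
```

Using .get() is the safer choice when some of the matched tags might lack the attribute.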

There is also a more concise way to match the desired elements, using CSS selectors:

In [6]: [tag["julie"] for tag in soup.select('div[julie$=_cat]')]
Out[6]: ['link2_cat', 'link3_cat']

The $= selector means "ends with".
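Since the question asks for values that contain "_cat", the substring operator *= may be the closer fit. The sketch below compares the two; the fourth div (julie="link4_cat_old") is a hypothetical addition that is not in the original markup, included only so that the results differ. It assumes a Beautiful Soup version whose select() supports attribute selectors, which is the case in current releases.

```python
import bs4

html = '''<div amy="sister" tommy="brother" julie="link1">E11</div>
<div amy="sister" tommy="brother" julie="link2_cat">E12</div>
<div amy="sister" tommy="brother" julie="link3_cat">E13</div>
<div amy="sister" tommy="brother" julie="link4_cat_old">E14</div>'''  # last div is hypothetical
soup = bs4.BeautifulSoup(html, 'html.parser')

# $= matches attribute values that end with the given string.
ends_with = [tag["julie"] for tag in soup.select('div[julie$="_cat"]')]

# *= matches attribute values that contain the given string anywhere.
contains = [tag["julie"] for tag in soup.select('div[julie*="_cat"]')]
```

Here ends_with is ['link2_cat', 'link3_cat'], while contains also picks up 'link4_cat_old'.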
