Question

BeautifulSoup is handy for parsing HTML in Python, but the result of the code below confuses me.

from bs4 import BeautifulSoup
tr ="""
<table>
    <tr class="passed" id="row1"><td>t1</td></tr>
    <tr class="failed" id="row2"><td>t2</td></tr>
</table>
"""
table = BeautifulSoup(tr,"html.parser")
for row in table.findAll("tr"):
    print row["class"]
    print row["id"]

Result:

[u'passed']
row1
[u'failed']
row2

Why is the class attribute returned as a list, while id is returned as a plain value?

I am using beautifulsoup4-4.5.0 with Python 2.7.

Answer 1

Because an element can have multiple classes.

Consider this example:

from bs4 import BeautifulSoup

tr ="""
<table>
    <tr class="passed a b c" id="row1"><td>t1</td></tr>
    <tr class="failed" id="row2"><td>t2</td></tr>
</table>
"""
table = BeautifulSoup(tr,"html.parser")
for row in table.findAll("tr"):
    print row["class"]
    print row["id"]

['passed', 'a', 'b', 'c']
row1
['failed']
row2
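
If a single string is more convenient than a list, the class names can be joined back together. The following is a minimal sketch, written for Python 3 rather than the Python 2.7 used in the question; the helper name class_string is hypothetical and not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

html = """
<table>
    <tr class="passed a b c" id="row1"><td>t1</td></tr>
    <tr class="failed" id="row2"><td>t2</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")


def class_string(tag):
    """Hypothetical helper: join a tag's class list into one string."""
    # tag.get("class", []) gives the list of class names,
    # or an empty list when the tag has no class attribute.
    return " ".join(tag.get("class", []))


# Map each row id to its class attribute as a single string.
rows = {row["id"]: class_string(row) for row in soup.find_all("tr")}
print(rows)
```

Joining with a single space normalizes any extra whitespace that the original attribute value may have contained.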

Answer 2

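class is a special multi-valued attribute in BeautifulSoup:

HTML 4 defines a few attributes that can have multiple values. HTML 5 removes a couple of them, but defines a few more. The most common multi-valued attribute is class (that is, a tag can have more than one CSS class).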

Sometimes this is problematic, for example when you want to apply a regular expression to the class attribute value as a whole:

BeautifulSoup returns empty list when searching by compound class names
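
One way to match against the whole class value is to pass a function to find_all and rebuild the string inside it. This is only a sketch for Python 3; the pattern and the function name has_matching_class are made up for illustration.

```python
import re

from bs4 import BeautifulSoup

html = """
<table>
    <tr class="passed a b c" id="row1"><td>t1</td></tr>
    <tr class="failed" id="row2"><td>t2</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Illustrative pattern: the class value must start with "passed a".
pattern = re.compile(r"^passed\s+a\b")


def has_matching_class(tag):
    """Hypothetical filter: test the regex against the joined class value."""
    joined = " ".join(tag.get("class", []))
    return tag.name == "tr" and pattern.search(joined) is not None


# find_all calls the filter once per tag and keeps those returning True.
matches = [tag["id"] for tag in soup.find_all(has_matching_class)]
print(matches)
```

Because the filter receives the whole tag, it sees every class name at once, so the regular expression can span several of them.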

You can turn this behavior off by tweaking the tree builder, but I would not recommend doing it.
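
For reference, newer BeautifulSoup releases (4.8.0 and later, so not the 4.5.0 from the question) accept a multi_valued_attributes argument in the constructor that disables the list behavior without touching the tree builder directly. A short sketch for Python 3, assuming such a release is installed:

```python
from bs4 import BeautifulSoup

html = """
<table>
    <tr class="passed a b c" id="row1"><td>t1</td></tr>
    <tr class="failed" id="row2"><td>t2</td></tr>
</table>
"""

# Assumption: bs4 >= 4.8.0. Passing None disables multi-valued handling,
# so class comes back as one plain string, just like id.
soup = BeautifulSoup(html, "html.parser", multi_valued_attributes=None)

first_row = soup.find("tr")
print(repr(first_row["class"]))
print(repr(first_row["id"]))
```

With this setting, searching by a single class name no longer matches rows that carry several classes, which is one reason to leave the default alone.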

BeautifulSoup returns a list for the attribute "class" but a plain value for other attributes
