Python, Beautiful Soup: get all class names

Time: 2017-05-03 05:26:01

Tags: python html class beautifulsoup

Given some HTML, say:

<div class="class1">
    <span class="class2">some text</span>
    <span class="class3">some text</span>
    <span class="class4">some text</span>
</div>

How can I retrieve all the class names? That is: ['class1','class2','class3','class4']

I tried:

soup.find_all(class_=True)

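But that returns the whole tags, so I would then have to run a regular expression over their string form to pull out the class names.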

1 answer:

Answer 0: (score: 3)

To read attributes, you can treat each Tag instance found as a dictionary. Note that the value of the class attribute will be a list, because class is a special "multi-valued" attribute:

classes = []
for element in soup.find_all(class_=True):
    classes.extend(element["class"])

Or, as a list comprehension:

classes = [value 
           for element in soup.find_all(class_=True) 
           for value in element["class"]]
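
To see why the loop uses extend rather than append, here is a minimal sketch with a tag that carries two classes. The markup and the names html and p are made up for illustration and are not from the question:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the <p> tag has two classes in a single attribute.
html = '<div class="box"><p class="note warning">text</p></div>'
soup = BeautifulSoup(html, "html.parser")

p = soup.find("p")
print(p["class"])      # a list with one entry per class name
print(p.get("class"))  # same value; .get() returns None when the attribute is absent

classes = []
for element in soup.find_all(class_=True):
    # Each element["class"] is itself a list, so extend() keeps the result flat.
    classes.extend(element["class"])
print(classes)
```

With append, the result would be a list of lists (one inner list per tag) instead of a flat list of names.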

Demo:

In [1]: from bs4 import BeautifulSoup

In [2]: data = """
   ...: <div class="class1">
   ...:     <span class="class2">some text</span>
   ...:     <span class="class3">some text</span>
   ...:     <span class="class4">some text</span>
   ...: </div>"""

In [3]: soup = BeautifulSoup(data, "html.parser")

In [4]: classes = [value
   ...:            for element in soup.find_all(class_=True)
   ...:            for value in element["class"]]

In [5]: print(classes)
['class1', 'class2', 'class3', 'class4']
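
If the same class appears on several tags, the list above will contain it more than once. The question does not say whether duplicates matter, so the following is only a sketch under that assumption. It uses dict.fromkeys to drop repeats while keeping first-seen order, and the markup and variable names are hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical markup in which "cell" is used by two tags.
html = """
<div class="row">
    <span class="cell">a</span>
    <span class="cell wide">b</span>
</div>"""
soup = BeautifulSoup(html, "html.parser")

# Flat list of every class value, in document order (duplicates included).
all_classes = [value
               for element in soup.find_all(class_=True)
               for value in element["class"]]

# dict keys are unique and keep insertion order, so this removes repeats
# without reordering the names.
unique_classes = list(dict.fromkeys(all_classes))

print(all_classes)
print(unique_classes)
```

A set would also remove duplicates, but it does not preserve the order in which the classes appear in the document.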