Getting a specific attribute with BeautifulSoup

Posted: 2019-01-25 06:53:13

Tags: python html beautifulsoup

I want to use BeautifulSoup to extract an attribute from an HTML tag. How can I do that?

For example:

<div class="search-pagination-top clearfix  mtop ">
                                            <div class="row"><div class="col-l-4 mtop pagination-number" tabindex="0"
aria-label="Page 1 of 15 "><div>Page <b>1</b> of <b>15</b> </div></div>

How do I get the text of the "aria-label" attribute?

I tried using select(), but it didn't help.

2 answers:

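Answer 0 (score: 0)

You can read an attribute value the same way you read a dictionary entry: index the tag with the key aria-label.

For example: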

from bs4 import BeautifulSoup

html = """<div class="search-pagination-top clearfix  mtop ">
                                            <div class="row"><div class="col-l-4 mtop pagination-number" tabindex="0"
aria-label="Page 1 of 15 "><div>Page <b>1</b> of <b>15</b> </div></div>
"""

soup = BeautifulSoup(html, "html.parser")
# Index the Tag like a dict to read one of its attributes
print(soup.find("div", class_="col-l-4 mtop pagination-number")["aria-label"])

Output:

Page 1 of 15 
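Indexing with ["aria-label"] raises KeyError when the attribute is absent. A minimal sketch of the safer alternatives, assuming a trimmed copy of the question's markup with its tags closed; "data-missing" is a hypothetical attribute name used only to show the missing case:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup (outer div omitted, tags closed)
html = """<div class="row"><div class="col-l-4 mtop pagination-number" tabindex="0"
aria-label="Page 1 of 15 "><div>Page <b>1</b> of <b>15</b> </div></div></div>"""

soup = BeautifulSoup(html, "html.parser")

# class_ matches any single class of a multi-class element
tag = soup.find("div", class_="pagination-number")

# Subscripting works like a dict: raises KeyError if the attribute is absent
label = tag["aria-label"]

# .get() returns None, or a supplied default, instead of raising.
# "data-missing" is a hypothetical attribute that is not in the markup.
missing = tag.get("data-missing")
fallback = tag.get("data-missing", "n/a")

# .attrs is the plain dict behind the tag; multi-valued "class" comes back as a list
print(tag.attrs)
print(label.strip())  # Page 1 of 15
```

Use [] when the attribute must be present and a missing one should fail loudly; use .get() when it is optional.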

Answer 1 (score: 0)

from bs4 import BeautifulSoup

html_doc = """
<div class="search-pagination-top clearfix  mtop ">
                                            <div class="row"><div class="col-l-4 mtop pagination-number" tabindex="0"
aria-label="Page 1 of 15 "><div>Page <b>1</b> of <b>15</b> </div></div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

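# soup.div is the outer div and .div its first child div ("row").
# .text returns the visible text, not the aria-label attribute;
# the two happen to match in this markup.
print(soup.div.div.text.strip())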

Output:

Page 1 of 15
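The question says select() didn't help. One possible cause (an assumption, since the attempted code isn't shown) is that select() returns a list of tags rather than a single tag. A minimal sketch that reads the attribute through a CSS attribute selector, plus the equivalent find() call:

```python
from bs4 import BeautifulSoup

html_doc = """
<div class="search-pagination-top clearfix  mtop ">
<div class="row"><div class="col-l-4 mtop pagination-number" tabindex="0"
aria-label="Page 1 of 15 "><div>Page <b>1</b> of <b>15</b> </div></div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# "div[aria-label]" is a CSS attribute selector: any <div> that has the
# attribute, whatever its value. select_one() returns the first match or None.
tag = soup.select_one("div[aria-label]")
label = tag["aria-label"].strip()

# select() returns a list of Tags, so read the attribute from each element;
# indexing the list itself with "aria-label" would raise TypeError.
labels = [el["aria-label"].strip() for el in soup.select("[aria-label]")]

# Without CSS: attrs={"aria-label": True} matches the attribute with any value
same_tag = soup.find(attrs={"aria-label": True})

print(label)   # Page 1 of 15
print(labels)  # ['Page 1 of 15']
```

Selecting on the attribute itself keeps the lookup working even if the class names change.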