BeautifulSoup cannot get the content of a tag with a hidden attribute

Date: 2017-03-09 01:55:17

Tags: python beautifulsoup

<a id="ember1601" role="button" href="/carsearch/book?piid=AQAQAQRRg2INmYAyjZmAMwmKOGATj2qoYBQANIAVCeAZgB6fUEsAED&amp;totalPriceShown=71.66&amp;searchKey=-575257062&amp;offerQualifiers=GreatDeal" data-book-button="book-EY-EC-Car" target="_self" class="ember-view btn btn-secondary btn-action"><span class="btn-label">
    <span aria-hidden="true">
        <span class="visuallyhidden">
            Reserve Item 1, Economy from Economy Rent a Car Rental Company at $72 total
    </span>Reserve
    </span>

</span>
</a>

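Hi, I'm new to Python. I can't get the price (the 72) from the text under <span class="visuallyhidden">. Also, how can I get the href link from the <a> tag on the first line? Please help, thanks. By the way, I'm using the BeautifulSoup library; if another library would work better, please let me know. Thanks.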

2 answers:

Answer 0 (score: 1)

In [9]: soup = BeautifulSoup(html, 'lxml') # html is the code you posted

In [10]: soup.find("span", class_="visuallyhidden").text
Out[10]: '\n            Reserve Item 1, Economy from Economy Rent a Car Rental Company at $72 total\n    '

In [11]: soup.a["href"]
Out[11]: '/carsearch/book?piid=AQAQAQRRg2INmYAyjZmAMwmKOGATj2qoYBQANIAVCeAZgB6fUEsAED&totalPriceShown=71.66&searchKey=-575257062&offerQualifiers=GreatDeal'

If you need to extract part of the text from that string, use a regular expression:

In [12]: text = soup.find("span", class_="visuallyhidden").text

In [13]: import re

In [15]: re.search(r'\$\d+', text).group()
Out[15]: '$72'
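
The steps above can be put together in one short script. This is only a minimal sketch: the markup is a shortened copy of the snippet from the question (the piid value is a placeholder), the variable names are made up for illustration, and "html.parser" is used instead of 'lxml' purely so that no extra parser has to be installed.

```python
import re

from bs4 import BeautifulSoup

# Shortened copy of the markup from the question; the piid value is a placeholder.
html = """
<a id="ember1601" role="button" href="/carsearch/book?piid=ABC&amp;totalPriceShown=71.66" class="ember-view btn">
<span class="btn-label">
    <span aria-hidden="true">
        <span class="visuallyhidden">
            Reserve Item 1, Economy from Economy Rent a Car Rental Company at $72 total
    </span>Reserve
    </span>
</span>
</a>
"""

soup = BeautifulSoup(html, "html.parser")

# Text of the visually hidden span, with surrounding whitespace removed.
hidden = soup.find("span", class_="visuallyhidden")
label = hidden.get_text(strip=True)

# Capture the digits after the dollar sign; match is None if no price is present.
match = re.search(r"\$(\d+(?:\.\d+)?)", label)
price = match.group(1) if match else None

# Attribute access on the first <a> tag in the document.
link = soup.a["href"]

print(label)
print(price)
print(link)
```

Checking match for None before calling group() avoids an AttributeError on listings whose hidden text carries no price.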

Answer 1 (score: 0)

BeautifulSoup can find tags by class name:

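from bs4 import BeautifulSoup

bs_obj = BeautifulSoup(html, "html.parser")  # html is the markup from the question; naming a parser avoids a warning
tag = bs_obj.find("span", class_="visuallyhidden")  # "class" is a reserved word in Python, so bs uses the keyword "class_"
s = tag.string  # the string inside the span
...
# you can then extract "$72" from s with a regular expression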

In addition, bs lets you access a tag's attributes with the "[]" operator. For example:

print(bs_obj.a['href'])  # index the <a> tag; the span found above has no href and would raise KeyError
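
Once the href has been read, the unrounded price is also available from its query string, since the link in the question carries totalPriceShown=71.66. The sketch below assumes the href string is the one shown in the question and uses only the standard library. The names query and total_price are illustrative.

```python
from urllib.parse import parse_qs, urlsplit

# The href value from the question, after BeautifulSoup has decoded "&amp;" to "&".
href = ("/carsearch/book?piid=AQAQAQRRg2INmYAyjZmAMwmKOGATj2qoYBQANIAVCeAZgB6fUEsAED"
        "&totalPriceShown=71.66&searchKey=-575257062&offerQualifiers=GreatDeal")

# Split off the query part and parse it into a dict of lists.
query = parse_qs(urlsplit(href).query)

# Each key maps to a list of values; take the first one and convert it.
total_price = float(query["totalPriceShown"][0])

print(total_price)
print(query["offerQualifiers"][0])
```

Whether totalPriceShown is present on every listing is an assumption; query.get("totalPriceShown") would be the safer lookup if it may be missing.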

You can find some simple examples in the online BeautifulSoup documentation.