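I have an HTML page loaded as a soup 'a'. On that page, I am interested in finding the hrefs under the tags whose text contains "AFT" (case-insensitive). I did this: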
>>> rows = a.findAll('span', attrs={'class': 'views-field views-field-title'})
The output is:
[<span class="views-field views-field-title"><span class="field-content">
<a href="/index.php/en/publications/communiques-presse/20201030-next-issuance-btfs" hreflang="en">30 October 2020: AFT’s next issuance of BTFs: Monday 02 November 2020 </a>
</span></span>, <span class="views-field views-field-title"><span class="field-content">
<a href="/index.php/en/publications/communiques-presse/20201030-next-issuance-oats" hreflang="en">30 October 2020: BFT’s next issuance of long-term OATs: Thursday 05 November 2020</a>
</span></span>, <span class="views-field views-field-title"><span class="field-content">
<a href="/index.php/en/publications/communiques-presse/20201026-issuance-btfs" hreflang="en">26 October 2020: AFT's issuance: 5.289 billion euros of BTFs</a>
</span></span>, <span class="views-field views-field-title"><span class="field-content">
<a href="/index.php/en/publications/communiques-presse/20201023-next-issuance-btfs" hreflang="en">23 October 2020: AFT’s next issuance of BTFs: Monday 26 October 2020 </a>
</span></span>, <span class="views-field views-field-title"><span class="field-content">
<a href="/index.php/en/publications/communiques-presse/20201019-issuance-btfs" hreflang="en">19 October 2020: AFT's issuance: 5.489 billion euros of BTFs</a>
</span></span>, <span class="views-field views-field-title"><span class="field-content">
<a href="/index.php/en/publications/communiques-presse/20201016-next-issuance-btfs" hreflang="en">16 October 2020: AFT’s next issuance of BTFs: Monday 19 October 2020 </a>
</span></span>, <span class="views-field views-field-title"><span class="field-content">
<a href="/index.php/en/publications/communiques-presse/20201015-next-issuance-inflation-indexed-oats" hreflang="en">15 October 2020: AFT’s issuance: 1.000 billion euros of inflation-indexed OATs</a>
</span></span>, <span class="views-field views-field-title"><span class="field-content">
<a href="/index.php/en/publications/communiques-presse/20201015-issuance-oats" hreflang="en">15 October 2020: AFT’s issuance: 7.240 billion euros of medium-term OATs</a>
</span></span>, <span class="views-field views-field-title"><span class="field-content">
<a href="/index.php/en/publications/communiques-presse/20201012-issuance-btfs" hreflang="en">12 October 2020: AFT's issuance: 5.288 billion euros of BTFs</a>
</span></span>, <span class="views-field views-field-title"><span class="field-content">
<a href="/index.php/en/publications/communiques-presse/20201009-next-issuance-indexed-oats" hreflang="en">09 October 2020: AFT’s next issuance of inflation-indexed OATs: Thursday 15 October 2020</a>
</span></span>, <span class="views-field views-field-title"><span class="field-content">
<a href="/index.php/en/publications/communiques-presse/20201009-next-issuance-btfs" hreflang="en">09 October 2020: AFT’s next issuance of BTFs: Monday 12 October 2020 </a>
</span></span>, <span class="views-field views-field-title"><span class="field-content">
<a href="/index.php/en/publications/communiques-presse/20201009-next-issuance-oats" hreflang="en">09 October 2020: AFT’s next issuance of medium-term OATs: Thursday 15 October 2020</a>
</span></span>]
So from the above I want all the hrefs except the one inside this element (the second element of the list), because it does not contain "AFT":
<span class="views-field views-field-title"><span class="field-content">
<a href="/index.php/en/publications/communiques-presse/20201030-next-issuance-oats" hreflang="en">30 October 2020: BFT’s next issuance of long-term OATs: Thursday 05 November 2020</a>
</span></span>
Can someone help me extract the hrefs as a list from rows, or can the hrefs be extracted from a directly?
Thanks.
Answer 0 (score: 1)
href = [row.find('a').get('href') for row in rows if 'AFT' in row.text]
print(href)
Output:
['/index.php/en/publications/communiques-presse/20201030-next-issuance-btfs',
'/index.php/en/publications/communiques-presse/20201026-issuance-btfs',
'/index.php/en/publications/communiques-presse/20201023-next-issuance-btfs',
'/index.php/en/publications/communiques-presse/20201019-issuance-btfs',
'/index.php/en/publications/communiques-presse/20201016-next-issuance-btfs',
'/index.php/en/publications/communiques-presse/20201015-next-issuance-inflation-indexed-oats',
'/index.php/en/publications/communiques-presse/20201015-issuance-oats',
'/index.php/en/publications/communiques-presse/20201012-issuance-btfs',
'/index.php/en/publications/communiques-presse/20201009-next-issuance-indexed-oats',
'/index.php/en/publications/communiques-presse/20201009-next-issuance-btfs',
'/index.php/en/publications/communiques-presse/20201009-next-issuance-oats']
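The question asks for a case-insensitive match, while `'AFT' in row.text` is case-sensitive. A minimal sketch of one way to handle that, assuming a whole-word regular expression is acceptable (so "aft" matches but "after" does not). The shortened sample HTML, the `AFT_WORD` pattern, and the `aft_hrefs` helper are made up for illustration and are not part of the original answer:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical three-row sample modeled on the question's output (hrefs shortened).
HTML = """
<span class="views-field views-field-title"><span class="field-content">
<a href="/en/20201030-next-issuance-btfs" hreflang="en">30 October 2020: AFT's next issuance of BTFs</a>
</span></span>
<span class="views-field views-field-title"><span class="field-content">
<a href="/en/20201030-next-issuance-oats" hreflang="en">30 October 2020: BFT's next issuance of long-term OATs</a>
</span></span>
<span class="views-field views-field-title"><span class="field-content">
<a href="/en/20201026-issuance-btfs" hreflang="en">26 October 2020: aft's issuance of BTFs</a>
</span></span>
"""

# Whole-word, case-insensitive pattern: matches "AFT" and "aft", not "BFT" or "after".
AFT_WORD = re.compile(r"\bAFT\b", re.IGNORECASE)


def aft_hrefs(soup):
    """Return the href of every title row whose text mentions AFT."""
    rows = soup.find_all("span", attrs={"class": "views-field views-field-title"})
    hrefs = []
    for row in rows:
        link = row.find("a", href=True)  # None if the row has no link
        if link is not None and AFT_WORD.search(row.get_text()):
            hrefs.append(link["href"])
    return hrefs


hrefs = aft_hrefs(BeautifulSoup(HTML, "html.parser"))
print(hrefs)
# prints ['/en/20201030-next-issuance-btfs', '/en/20201026-issuance-btfs']
```

Checking that `link` is not None also avoids the AttributeError the one-liner would raise on a row that has no `<a>` tag.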
Answer 1 (score: 0)
You can write a custom finder function to suit your needs. One way to write it is:
def aft_tag(tag):
    return tag.get('href') and 'AFT' in tag.text

for tag in soup.find_all(aft_tag):
    print(tag.get('href'))
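As for extracting the hrefs straight from the soup: a possible sketch, assuming each link's text is a single string (as it is in the question's output), is to pass `href=True` and a compiled pattern as the `string` argument of `find_all`. The tiny HTML sample and the names `links` and `found` are invented for illustration:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample: one matching link, one non-matching link, one anchor without href.
html = (
    '<a href="/x">26 October 2020: AFT\'s issuance</a>'
    '<a href="/y">30 October 2020: BFT\'s issuance</a>'
    '<a>AFT mentioned, but no href</a>'
)
soup = BeautifulSoup(html, "html.parser")

# href=True keeps only <a> tags that have an href attribute;
# string=... keeps only tags whose .string matches the pattern (case-insensitive here).
links = soup.find_all("a", href=True, string=re.compile(r"\bAFT\b", re.IGNORECASE))
found = [link["href"] for link in links]
print(found)
```

Note that `string` is compared against `tag.string`, which is None when a tag has more than one child, so this would miss links whose text is split across nested tags. In that case a finder function based on `tag.get_text()` is the safer choice.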
Answer 2 (score: 0)
To find the href of each tag whose text contains AFT, you can use the CSS selector :contains(<my text>):
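from bs4 import BeautifulSoup

# `html_snippet` is assumed to hold the page's HTML as a string
soup = BeautifulSoup(html_snippet, "html.parser")
# Select the class `views-field views-field-title` and `a` which contains the text `AFT`
# Note: newer soupsieve releases deprecate `:contains` in favor of `:-soup-contains`
for tag in soup.select(".views-field.views-field-title a:contains(AFT)"):
    print(tag['href'])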