How to scrape a year from this HTML code using Beautiful Soup

Time: 2019-11-27 08:16:09

Tags: python-3.x beautifulsoup

<div class="d-flex flex-column flex-sm-row justify-content-sm-start align-items-sm-center justify-content-start align-items-center card box-shadow RankItem">
<div class="d-flex flex-column justify-content-center align-items-center LeftSection">
<div class="rank RankNumber"><span>#</span>10</div>
<div class="score">SCORE 7.597</div>
<span class="ChgUp" style="display:none !important;"><i aria-hidden="" class="fas fa-arrow-circle-up" title="up"></i></span>
<span class="ChgDown" style="display:none !important;"><i aria-hidden="" class="fas fa-arrow-circle-down" title="down"></i></span>
<span class="d-flex flex-row align-items-center ChgNeutral" style="display:none !important;">
<i aria-hidden="" class="fa-stack fa-2x" title="no change">
<i class="fas fa-circle fa-stack-2x"></i>
<i class="fal fa-arrows-h fa-stack-1x fa-inverse"></i>
</i>
</span>
<span class="d-flex flex-row align-items-center">
<i aria-hidden="" class="fa-stack fa-2x" title="no change">
<i class="fas fa-circle fa-stack-2x"></i>
<i class="fal fa-arrows-h fa-stack-1x fa-inverse"></i>
</i>

2019 Rank 10                                            </span>
</div>

I want to use Beautiful Soup to scrape "2019" from this page source. I only want the number 2019. Can anyone please help?

1 answer:

Answer 0 (score: 1)

Here is my answer, after checking your previous question and working out for myself what you are trying to get from this link: https://www.vault.com/best-companies-to-work-for/law/top-100-law-firms-rankings/year/2020

from bs4 import BeautifulSoup

html = """
<span class="d-flex flex-row align-items-center">
                                                    <i class="fa-stack fa-2x" aria-hidden="" title="no change">
                                                        <i class="fas fa-circle fa-stack-2x"></i>
                                                        <i class="fal fa-arrows-h fa-stack-1x fa-inverse"></i>
                                                    </i>
2019 Rank 10                                            </span>
"""

soup = BeautifulSoup(html, 'html.parser')

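# find_all is the current name of the legacy findAll alias.
# Passing the full class string matches only spans whose class attribute is exactly this value.
for span in soup.find_all('span', attrs={'class': 'd-flex flex-row align-items-center'}):
    # The stripped text is "2019 Rank 10"; its first four characters are the year.
    print(span.get_text().strip()[0:4])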

Output:

2019
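
Slicing the first four characters works only while the text starts with the year. As a hedged alternative, the sketch below uses a regular expression that looks for four digits followed by the word "Rank", so it does not depend on the exact class string or on leading whitespace. The helper names extract_year and years_from_html, and the assumption that the year is always written as "<year> Rank <n>", are mine and do not come from the page or the answer above.

```python
import re

# Assumption: the year always appears as four digits followed by "Rank",
# as in the "2019 Rank 10" text from the question.
YEAR_BEFORE_RANK = re.compile(r"\b(\d{4})\s+Rank\b")


def extract_year(text):
    """Return the year as an int, or None if the text has no '<year> Rank' part."""
    match = YEAR_BEFORE_RANK.search(text)
    return int(match.group(1)) if match else None


def years_from_html(html):
    """Collect the year from every <span> in the HTML that contains one."""
    # Imported here so that extract_year stays usable without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    years = []
    for span in soup.find_all("span"):
        year = extract_year(span.get_text())
        if year is not None:
            years.append(year)
    return years
```

Calling years_from_html(html) with the html string from the answer should give a list containing the integer 2019 rather than the string "2019", which may be more convenient if the value is used in later calculations.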