BeautifulSoup .find() captures too much text (how do I narrow it down?)

Time: 2018-10-12 22:00:46

Tags: python-3.x beautifulsoup

I'd like to know how to target the "Switch" text in the HTML below:

    <div class="product_title">
                <a href="/game/pc/into-the-breach" class="hover_none">
                            <h1>Into the Breach</h1>
                        </a>
                        <span class="platform">
                            <a href="/game/pc">
                                                    PC
                                                </a>
                        </span>
        </div>
<div class="product_data">
    <ul class="summary_details">
                        <li class="summary_detail publisher" >
                <span class="label">Publisher:</span>
                <span class="data">
                                        <a href="/company/subset-games"  >
                                                    Subset Games
                                                </a>
                                    </span>
            </li>
                                    <li class="summary_detail release_data">
                <span class="label">Release Date:</span>
                <span class="data" >Feb 27, 2018</span>
            </li>
                                                                                <li class="summary_detail product_platforms">
                        <span class="label">Also On:</span>
                        <span class="data">
                                    <a href="/game/switch/into-the-breach" class="hover_none">Switch</a>                                                </span>
                    </li>
                                                    </ul>
</div>

So far I'm using the code below, which also captures the "Also On:" text (along with a lot of whitespace):

self.playable_on_systems_label.setText(self.html_soup.find("span", class_='platform').text.strip() + ', ' + self.html_soup.find("li", class_='summary_detail product_platforms').text.strip())

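How do I capture only the "Switch" text (in this case)?

FYI: the first half of the statement (which captures "PC") works fine, since that text is not part of the "Also On" item.

Thanks in advance.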

2 Answers:

Answer 0: (score: 0)

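Your query fetches the whole li element with class="summary_detail product_platforms", and its .text includes everything from "Also On:" through "Switch". Target the link inside it instead: try something like .find('a', href=re.compile("^.+switch.+$")) (this needs import re), or use the CSS selector .select("a[href*=switch]"), which returns a list of matches.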
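To make the difference concrete, here is a minimal sketch of both suggestions next to the original query. It assumes a trimmed copy of the question's markup (only the "Also On" list item is kept), and the variable names are just for illustration:

```python
import re

from bs4 import BeautifulSoup

# Trimmed copy of the question's markup (assumption: only the relevant <li>).
html = '''
<ul class="summary_details">
  <li class="summary_detail product_platforms">
    <span class="label">Also On:</span>
    <span class="data">
      <a href="/game/switch/into-the-breach" class="hover_none">Switch</a>
    </span>
  </li>
</ul>
'''

soup = BeautifulSoup(html, "html.parser")

# Original query: .text on the <li> concatenates the text of every descendant,
# so the label and the whitespace between the spans come along with "Switch".
li_text = soup.find("li", class_="summary_detail product_platforms").text.strip()

# Option 1: find the <a> whose href matches a regular expression.
by_regex = soup.find("a", href=re.compile(r"^.+switch.+$")).text.strip()

# Option 2: CSS attribute selector, "href contains the substring switch".
# select() returns a list, so take the first match.
by_css = soup.select("a[href*=switch]")[0].text.strip()

print(repr(li_text))  # still contains "Also On:" plus whitespace
print(by_regex)
print(by_css)
```

Both options hard-code "switch", so they only fit pages where that is the platform you are after.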

Answer 1: (score: 0)

You can use BeautifulSoup's select() function to navigate to the "Switch" text. See this code:

from bs4 import BeautifulSoup

html = '''<div class="product_title">
<a class="hover_none" href="/game/pc/into-the-breach">
<h1>Into the Breach</h1>
</a>
<span class="platform">
<a href="/game/pc">
                                                    PC
                                                </a>
</span>
</div>
<div class="product_data">
<ul class="summary_details">
<li class="summary_detail publisher">
<span class="label">Publisher:</span>
<span class="data">
<a href="/company/subset-games">
                                                    Subset Games
                                                </a>
</span>
</li>
<li class="summary_detail release_data">
<span class="label">Release Date:</span>
<span class="data">Feb 27, 2018</span>
</li>
<li class="summary_detail product_platforms">
<span class="label">Also On:</span>
<span class="data">
<a class="hover_none" href="/game/switch/into-the-breach">Switch</a> </span>
</li>
</ul>
</div>'''


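soup = BeautifulSoup(html, 'html.parser')
# Descendant selector: any .hover_none element inside the product_platforms <li>.
# select() returns a list, so [0] takes the first match.
text = soup.select('.summary_detail.product_platforms .hover_none')[0].text.strip()
print(text)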

Output:

Switch
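If the goal is the combined string from the question ("PC" plus whatever is listed under "Also On"), one possible sketch is to scope the selector to the links inside the product_platforms item and join the results. This assumes a shortened version of the question's markup, and that a page may list several platforms there; the names main_platform, also_on and platforms are made up for this example:

```python
from bs4 import BeautifulSoup

# Shortened version of the question's markup (assumption: same classes/nesting).
html = '''
<span class="platform"><a href="/game/pc"> PC </a></span>
<li class="summary_detail product_platforms">
  <span class="label">Also On:</span>
  <span class="data">
    <a href="/game/switch/into-the-breach" class="hover_none">Switch</a>
  </span>
</li>
'''

soup = BeautifulSoup(html, "html.parser")

# First half of the original statement, unchanged.
main_platform = soup.find("span", class_="platform").text.strip()

# Only the links inside the "data" span, so the "Also On:" label is skipped.
also_on = [
    a.get_text(strip=True)
    for a in soup.select("li.product_platforms span.data a")
]

platforms = ", ".join([main_platform] + also_on)
print(platforms)  # PC, Switch
```

The resulting string is what you would then pass to setText() in the original statement. Because it does not mention "switch" anywhere, it should keep working for games listed on other platforms, as long as the page keeps the same class names.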