Question

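In BeautifulSoup4, how do I search for tags whose text contains a specific string? For example, when searching for 'skyrim', I want to print the contents (e.g. the game title) of every tag that contains the string 'skyrim'.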

I tried using

    if 'skyrim' in tag.string:

but it never prints anything.

Full definition:

def search(self):
    steam_results = self.soup.find_all('span', class_='title')

    itr = 1
    for tag in steam_results:
        if self.title in tag.string:  # <--- Not working
            print(str(itr) + ': ' + tag.string + '\n')
            itr = itr + 1

Example of steam_results:

>>> steam_results
[<span class="title">The Elder Scrolls V: Skyrim Special Edition</span>,
 <span class="title">Skyrim Script Extender (SKSE)</span>, 
 <span class="title">Enderal</span>, ...]

Expected result:

The Elder Scrolls V: Skyrim Special Edition
Skyrim Script Extender (SKSE)

Actual result: nothing is printed.

Answer 1

The problem is the substring check: it is case-sensitive. If you check for 'skyrim', you get no results, because none of the titles contain 'skyrim'; they contain 'Skyrim' instead. So lowercase the text before comparing, like this:

steam_results = soup.find_all('span', class_='title')
for steam in steam_results:
    if 'skyrim' in steam.getText().lower():
        print(steam.getText())

Output (for the sample results shown in the question):

The Elder Scrolls V: Skyrim Special Edition
Skyrim Script Extender (SKSE)
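
One further detail, offered as a hedged sketch rather than as part of the original answer: tag.string is None whenever a tag has more than one child, so an expression like `self.title in tag.string` can raise a TypeError on such tags. The helper below (the name contains_ci is hypothetical) performs the case-insensitive comparison on plain strings and treats None as "no match". With BeautifulSoup you would pass it tag.string or tag.get_text().

```python
def contains_ci(text, query):
    """Return True if query occurs in text, ignoring case.

    text may be None (as tag.string is for a tag with several children),
    in which case there is no match.
    """
    if text is None:
        return False
    # casefold() is a more aggressive lower(), intended for caseless matching.
    return query.casefold() in text.casefold()


# Plain strings standing in for the text of the <span class="title"> tags.
titles = [
    "The Elder Scrolls V: Skyrim Special Edition",
    "Skyrim Script Extender (SKSE)",
    "Enderal",
    None,  # what tag.string gives for a tag with several children
]

matches = [t for t in titles if contains_ci(t, "skyrim")]
for number, title in enumerate(matches, start=1):
    print(f"{number}: {title}")
```

Using enumerate(..., start=1) also replaces the manual itr counter from the question.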

Answer 2

You can use soup.find_all(string=re.compile("your_string_here")) to get the matching text nodes, then use .parent to get the enclosing tag.

from bs4 import BeautifulSoup
import re
html="""
<p id="1">Hi there</p>
<p id="2">hello<p>
<p id="2">hello there<p>
"""
soup=BeautifulSoup(html,'html.parser')
print([tag.parent for tag in soup.find_all(string=re.compile("there"))])

Output

[<p id="1">Hi there</p>, <p id="2">hello there<p>\n</p></p>]
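
As a hedged variation on the same idea, applied to the question's markup: when string= is combined with a tag name, find_all returns the tags themselves (those whose .string matches), so .parent is not needed. The sketch below assumes the span/title structure from the question; re.IGNORECASE makes the match case-insensitive and re.escape guards against regex metacharacters in the search term. The query variable and the sample HTML are illustrative only.

```python
import re

from bs4 import BeautifulSoup

# Sample markup modelled on the steam_results shown in the question.
html = """
<span class="title">The Elder Scrolls V: Skyrim Special Edition</span>
<span class="title">Skyrim Script Extender (SKSE)</span>
<span class="title">Enderal</span>
"""

soup = BeautifulSoup(html, "html.parser")

query = "skyrim"  # hypothetical search term
pattern = re.compile(re.escape(query), re.IGNORECASE)

# With a tag name given, string= filters on each tag's .string,
# so the result is a list of <span> tags rather than text nodes.
matches = soup.find_all("span", class_="title", string=pattern)
found_titles = [tag.get_text() for tag in matches]

for number, title in enumerate(found_titles, start=1):
    print(f"{number}: {title}")
```

Note that this form only matches tags whose .string is not None, that is, tags with a single text child, which holds for the title spans in the question.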
