Question

Here is a sample of the HTML I want to parse:

<html>
<body>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> Example BLAB BLAB BLAB </td>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
</body>
</html>

I'm using Beautiful Soup to parse the HTML by selecting style8, as shown here (where html holds the result of my HTTP request):

html = result.read()
soup = BeautifulSoup(html)

content = soup.select('.style8')

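In this example, content is a list of 4 tags. I want to check the .text of each item in the list (the text of each style8 element), and if it contains Example, append that to a variable. If the loop gets through the whole list and Example is nowhere in it, Not Present should be appended to the variable instead.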

So far I have the following:

foo = []

for i, tag in enumerate(content):
    if content[i].text == 'Example':
        foo.append('Example')
        break
    else:
        continue

If Example occurs, it gets appended to foo, but if Example is not anywhere in the list, Not Present is never appended.

Any way of doing this would be welcome, and a better way of searching the whole result to check whether a string is present would be great.

Answer 1

You can use find_all() to find all td elements with class='style8', and build the foo list with a list comprehension:

from bs4 import BeautifulSoup


html = """<html>
<body>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> Example BLAB BLAB BLAB </td>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
</body>
</html>"""

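# Name the parser explicitly to avoid bs4's "no parser specified" warning.
soup = BeautifulSoup(html, 'html.parser')

# One entry per matching <td>: "Example" if the cell text contains it.
foo = ["Example" if "Example" in node.text else "Not Present"
       for node in soup.find_all('td', {'class': 'style8'})]
print(foo)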

Prints:

['Example', 'Not Present', 'Not Present', 'Not Present']
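
Note that this gives one entry per cell. If a single result for the whole page is what's wanted, as the question describes, a minimal sketch using the built-in any() could look like the following. The helper name example_status and the sample texts list are made up for illustration; with Beautiful Soup you would pass the .text of each tag instead.

```python
def example_status(texts, needle="Example"):
    """Return [needle] if any text contains needle, else ['Not Present']."""
    # any() stops at the first match, much like a loop with break.
    if any(needle in text for text in texts):
        return [needle]
    return ["Not Present"]


# Stand-ins for the .text of each .style8 cell in the sample HTML.
# With bs4 this could be: texts = [tag.text for tag in soup.select('.style8')]
texts = [" Example BLAB BLAB BLAB ", " BLAB BLAB BLAB ",
         " BLAB BLAB BLAB ", " BLAB BLAB BLAB "]

foo = example_status(texts)
print(foo)
```

An empty list of texts also yields ['Not Present'], since any() of an empty iterable is False.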

Answer 2

If you just want to check whether it was found, you can use a simple boolean flag, like this:

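# `content` is the list returned by soup.select('.style8') in the question.
foo = []
found = False
for tag in content:
    # Substring test: the cell text is ' Example BLAB BLAB BLAB ', so == never matches.
    if 'Example' in tag.text:
        found = True
        foo.append('Example')
        break
if not found:
    foo.append('Not Present')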

If I've understood what you want, this may be a simple way to do it, though alecxe's solution looks great.
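
For what it's worth, Python's for ... else can stand in for the flag: the else branch runs only when the loop ends without hitting break. A rough sketch, assuming the cell texts are already extracted into a list (the function name find_example and the sample strings are hypothetical). It uses in rather than ==, since each cell's text carries surrounding whitespace and extra words:

```python
def find_example(texts, needle="Example"):
    """Append needle on the first match, or 'Not Present' if none match."""
    foo = []
    for text in texts:
        if needle in text:
            foo.append(needle)
            break
    else:
        # Reached only when the loop finished without a break.
        foo.append("Not Present")
    return foo


# Stand-ins for tag.text values; with bs4 these would come from
# soup.select('.style8').
with_match = find_example([" BLAB BLAB BLAB ", " Example BLAB BLAB BLAB "])
without_match = find_example([" BLAB BLAB BLAB ", " BLAB BLAB BLAB "])
print(with_match, without_match)
```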

Python Beautiful Soup select text
