Question

Part of the full HTML looks like this:

<td class="col2">
<a class="reserve" data-target="#myModal" data-toggle="modal"     
href="example.com" rel="nofollow"></a></td>

I found it using:

soup.find_all('td', class_='col2')

But I don't want to extract the whole element, only the outer tag:

<td class="col2"></td>

Is this possible with BeautifulSoup? I know I could do it with string manipulation, but I'm curious.

Answer 1

You can set the tag's .string attribute to an empty string (''):

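from bs4 import BeautifulSoup

html = """
<td class="col2">
<a class="reserve" data-target="#myModal" data-toggle="modal"
href="example.com" rel="nofollow"></a></td>
"""
# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(html, 'html.parser')
x = soup.find_all('td', class_='col2')[0]
# Replacing .string discards everything inside the tag, including the <a>
x.string = ''
print(x)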

Output:

<td class="col2"></td>

Edit

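The documentation describes this behavior:

If you set a tag's .string attribute, the tag's contents are replaced with the string you give.

Be careful: if the tag contained other tags, they and all their contents will be destroyed.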
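Because assigning to .string destroys the children in the parsed tree, a non-destructive variant may be useful when the rest of the document is still needed. The following is a minimal sketch, not part of the original answer, assuming it is acceptable to work on a copy: copy.copy() produces a detached copy of the tag, and clear() empties only that copy. The shortened HTML string and the name outer are illustrative.

```python
import copy

from bs4 import BeautifulSoup

# Shortened version of the HTML from the question (illustrative)
html = '<td class="col2"><a class="reserve" href="example.com" rel="nofollow"></a></td>'
soup = BeautifulSoup(html, "html.parser")
td = soup.find("td", class_="col2")

# copy.copy() returns a copy of the tag that is not attached to the tree
outer = copy.copy(td)
# clear() removes every child (tags and text) from the copy only
outer.clear()

print(outer)                      # the empty outer tag
print(td.find("a") is not None)   # the original tree still contains the <a>
```

The original td keeps its link, so later lookups against soup continue to work.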

Answer 2

You can use the extract() function to remove all elements inside td.col2:

data = '''
<td class="col2">
<a class="reserve" data-target="#myModal" data-toggle="modal"
href="example.com" rel="nofollow"></a></td>'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'lxml')

for td in soup.select('td.col2'):
    for t in td.select('*'):
        t.extract()
    print(td)

Prints:

<td class="col2">
</td>
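The line break in that output is the whitespace text node between the opening td tag and the link: select('*') matches only tags, so text nodes are left in place. If the goal is exactly `<td class="col2"></td>`, one possible alternative (a sketch, not part of the original answer) is Tag.clear(), which removes text nodes as well. The html.parser choice and the results list are assumptions made here so the example runs without lxml.

```python
from bs4 import BeautifulSoup

data = '''
<td class="col2">
<a class="reserve" data-target="#myModal" data-toggle="modal"
href="example.com" rel="nofollow"></a></td>'''

# html.parser is used here (an assumption) to avoid the lxml dependency
soup = BeautifulSoup(data, "html.parser")

results = []
for td in soup.find_all("td", class_="col2"):
    # clear() drops all children, both tags and whitespace text nodes
    td.clear()
    results.append(str(td))
    print(td)
```

Like extract(), this modifies the parsed tree in place, so it suits cases where the inner content is no longer needed.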

Printing only the outer tag of an HTML snippet with BeautifulSoup
