Question

I am trying to stop BeautifulSoup from adding a line break after a tag when the next tag contains the text "Utility".

<html>
    <dl>
        <dt>RandomText</dt>  <!-- Line Break -->
        <dt>RandomText</dt>  <!-- Don't insert Line Break -->
        <dt>Utility: NonStaticText</dt>  <!-- Line Break  -->
    </dl>
</html>

Right now I have:

soup.unwrap('head')

for dt in soup.findAll('dt'):
    dt.insert_after('\n')

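This is a small thing, but how do I do it? The text "Utility:" occurs often, but whatever follows "Utility:" is different in each case and is contained inside the tag. I am using BS4.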

Update:

I found that:

for dt in soup.find_all('dt'):
    if not dt.find(string = re.compile('Utility')):
        dt.insert_before('\n')

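seems to work, more or less. What I really need is to look at the next tag in the tree, check whether it contains the string 'Utility', and decide based on that. Ideally...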

dt.insert_before('\n')

should be:

dt.insert_after('\n')

Update 2:

Here is my solution:

for dt in soup.find_all('dt'):
    next_tag = dt.find_next('dt')

    try:  # THROWS 'AttributeError' IF NOT FOUND ...
        if not next_tag.text.startswith('Utility'):
            dt.insert_after('\n')

    except AttributeError as e:
        pass
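
For reference, here is a minimal self-contained sketch of the same idea with an explicit None check in place of the try/except. The function name add_line_breaks, the marker parameter, and the choice of "html.parser" are assumptions made for illustration, and unlike the loop above it also adds a line break after the last dt (which has no following dt), matching the comments in the sample HTML.

```python
from bs4 import BeautifulSoup

# Sample markup from the question, written on one line so that every
# "\n" in the result comes from insert_after().
HTML = (
    "<html><dl>"
    "<dt>RandomText</dt>"
    "<dt>RandomText</dt>"
    "<dt>Utility: NonStaticText</dt>"
    "</dl></html>"
)


def add_line_breaks(html, marker="Utility"):
    """Insert a newline after each <dt> unless the next <dt> starts with marker."""
    soup = BeautifulSoup(html, "html.parser")
    for dt in soup.find_all("dt"):
        next_dt = dt.find_next("dt")
        # find_next() returns None when there is no following <dt>;
        # in that case this sketch still adds the line break.
        if next_dt is None or not next_dt.get_text().startswith(marker):
            dt.insert_after("\n")
    return str(soup)


result = add_line_breaks(HTML)
print(result)
```

Checking for None directly keeps an unrelated AttributeError inside the loop from being silently swallowed.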

Answer 1

You can use the find_next method to get the next tag, for example:

for dt in soup.find_all('dt'):
    next_tag = dt.find_next()
    # find_next() returns None when nothing follows, e.g. for the last dt
    if next_tag is None or not next_tag.text.startswith('Utility:'):
        dt.insert_after('\n')

Note that if you do not pass any argument to find_next, it will match whatever tag comes next.
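
To illustrate that note, here is a small sketch comparing find_next() with and without a tag name. The markup is a made-up variation on the question's sample (a nested b tag was added for illustration), and "html.parser" is an assumed parser choice.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first <dt> contains a nested <b> tag.
html = "<dl><dt><b>Random</b>Text</dt><dt>Utility: x</dt></dl>"
soup = BeautifulSoup(html, "html.parser")
first_dt = soup.find("dt")

# With no arguments, find_next() matches the very next tag in document
# order, which here is the <b> nested inside the first <dt>.
any_next = first_dt.find_next()

# With a tag name, only later <dt> tags are considered.
next_dt = first_dt.find_next("dt")

# find_next_sibling() only looks at tags on the same level of the tree.
next_sibling_dt = first_dt.find_next_sibling("dt")

print(any_next.name)        # b
print(next_dt.get_text())   # Utility: x
```

If the dt tags can contain other tags, passing 'dt' to find_next (or using find_next_sibling) is probably the safer choice.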
