Question

I have a Python method (thanks to this snippet) that uses BeautifulSoup and Django's urlize to take some HTML and wrap <a> tags around only the links that are not already formatted:

from django.utils.html import urlize
from bs4 import BeautifulSoup

def html_urlize(self, text):
    soup = BeautifulSoup(text, "html.parser")

    print(soup)

    textNodes = soup.findAll(text=True)
    for textNode in textNodes:
        if textNode.parent and getattr(textNode.parent, 'name') == 'a':
            continue  # skip already formatted links
        urlizedText = urlize(textNode)
        textNode.replaceWith(urlizedText)

    print(soup)

    return str(soup)

Sample input text (as output by the first print statement):

this is a formatted link <a href="http://google.ca">http://google.ca</a>, this one is unformatted and should become formatted: http://google.ca

The resulting return text (as output by the second print statement):

this is a formatted link <a href="http://google.ca">http://google.ca</a>, this one is unformatted and should become formatted: &lt;a href="http://google.ca"&gt;http://google.ca&lt;/a&gt;

As you can see, it is formatting the link, but it does so with escaped HTML, so when I print it in a template with {{ my.html|safe }} it does not render as HTML.

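So how can I get the tags added by urlize to come out unescaped and render properly as HTML? I suspect this has something to do with my using it as a method rather than as a template filter. I can't really find any documentation on this method; it doesn't appear under django.utils.html.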

Edit: It seems the escaping actually happens on this line: textNode.replaceWith(urlizedText).

Answer 1

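You can turn your urlizedText string into a new BeautifulSoup object, so that it is treated as markup rather than as plain text (plain text gets escaped, as you would expect):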

from django.utils.html import urlize
from bs4 import BeautifulSoup

def html_urlize(self, text):
    soup = BeautifulSoup(text, "html.parser")

    print(soup)

    textNodes = soup.findAll(text=True)
    for textNode in textNodes:
        if textNode.parent and getattr(textNode.parent, 'name') == 'a':
            continue  # skip already formatted links
        urlizedText = urlize(textNode)
        textNode.replaceWith(BeautifulSoup(urlizedText, "html.parser"))

    print(soup)

    return str(soup)
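
To see the difference between the two replaceWith calls in isolation, here is a minimal sketch that needs only bs4. It assumes a reasonably recent BeautifulSoup 4 (the `string=` argument and the `replace_with` spelling), and the link markup is written by hand instead of coming from urlize so that the example runs without Django.

```python
from bs4 import BeautifulSoup

LINK = 'see <a href="http://google.ca">http://google.ca</a>'

# Case 1: pass a plain str. bs4 stores it as a text node, so the angle
# brackets are escaped when the tree is serialized.
soup_text = BeautifulSoup("see http://google.ca", "html.parser")
soup_text.find(string=True).replace_with(LINK)
escaped = str(soup_text)

# Case 2: parse the string first. The parsed fragment is inserted as real
# nodes, so the <a> tag survives serialization unescaped.
soup_tags = BeautifulSoup("see http://google.ca", "html.parser")
soup_tags.find(string=True).replace_with(BeautifulSoup(LINK, "html.parser"))
unescaped = str(soup_tags)

print(escaped)
print(unescaped)
```

The first print shows the escaped form from the question, the second shows a real <a> element.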

Answer 2

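The problem seems to be the point where you use BeautifulSoup to replace a text node with another text node that contains HTML, which then gets turned into entities.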

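One way to achieve what you want is to build a new string from the output of urlize (which doesn't seem to care whether a link is already formatted).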

from django.utils.html import urlize
from bs4 import BeautifulSoup

def html_urlize(self, text):
    soup = BeautifulSoup(text, "html.parser")

    finalFragments = []
    textNodes = soup.findAll(text=True)
    for textNode in textNodes:
        if getattr(textNode.parent, 'name') == 'a':
            finalFragments.append(str(textNode.parent))
        else:
            finalFragments.append(urlize(textNode))

    return str("".join(finalFragments))

However, if you only want to render it in a template, you can simply apply urlize to the input string as a template filter:

{{input_string|urlize}}
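
One point to check about the fragment-joining function earlier in this answer: as far as I can tell, it keeps only text nodes and existing <a> elements, so any other markup in the input (for example <p> or <b>) does not appear in the result. The sketch below illustrates this under stated assumptions: `fake_urlize` is a hypothetical stand-in for django.utils.html.urlize that returns its input unchanged, and `join_fragments` is an illustrative name, not part of either library.

```python
from bs4 import BeautifulSoup


def fake_urlize(text):
    # Hypothetical stand-in for django.utils.html.urlize so the sketch runs
    # without Django; it returns the text unchanged.
    return str(text)


def join_fragments(html):
    soup = BeautifulSoup(html, "html.parser")
    fragments = []
    for node in soup.find_all(string=True):
        if getattr(node.parent, "name", None) == "a":
            # Text inside an existing link: keep the whole <a> element.
            fragments.append(str(node.parent))
        else:
            # Any other text: only the text itself is kept, not its tags.
            fragments.append(fake_urlize(node))
    return "".join(fragments)


result = join_fragments('<p>see <b>this</b> <a href="http://google.ca">link</a></p>')
print(result)
```

If the surrounding markup has to be preserved, replacing nodes in place, as in Answer 1, avoids this.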

BeautifulSoup replaceWith() method adding escaped html, want it unescaped