Question

我正在制作一个脚本来重构模板增强的html。

我希望beautifulsoup可以逐字打印{{ }}中的任何代码，而无需将>或其他符号转换为html实体。

需要将包含许多模板的文件拆分为多个文件，每个文件都有一个模板。

规范：splitTemplates templates.html templateDir必须：

阅读templates.html
为每个<template name="xxx">contents {{ > directive }}</template>，将此模板写入文件templateDir/xxx

代码段：

soup = bs4.BeautifulSoup(open(filename,"r"))
for t in soup.children:
    if t.name=="template":
        newfname = dirname+"/"+t["name"]+ext
        f = open(newfname,"w")
        # note f.write(t) fails as t is not a string -- prettify works
        f.write(t.prettify())
        f.close()

不受欢迎的行为：

{{ > directive }}变为{{ > directive }}但需要保留为{{ > directive }}

更新：f.write(t.prettify(formatter=None))有时保留>，有时会将其更改为>。这似乎是最明显的变化，不确定为什么它会改变一些>而不改变其他import HTMLParser U = HTMLParser.HTMLParser().unescape soup = bs4.BeautifulSoup(open(filename,"r")) for t in soup.children: if t.name=="template": newfname = dirname+"/"+t["name"]+ext f = open(newfname,"w") f.write(U(t.prettify(formatter=None))) f.close()。

解决：感谢Hai Vu和https://stackoverflow.com/a/663128/103081

{{1}}

另请参阅：https://gist.github.com/DrPaulBrewer/9104465

Answer 1

您需要取消HTML内容。为此，请使用HTMLParser模块。这不是我最初的想法。我把它从：

How can I change '>' to '>' and '>' to '>'?

用把手做美丽的汤玩得很好

1 个答案: