Question

When I try to insert the following HTML into an element:

<div class="frontpageclass"><h3 id="feature_title">The Title</h3>... </div>

bs4 replaces it like this:

<div class="frontpageclass">&lt;h3 id="feature_title"&gt;The Title &lt;/h3&gt;... &lt;div&gt;</div>

I am using .string, but it still messes up the formatting.

from bs4 import BeautifulSoup

with open(html_frontpage) as fp:
    soup = BeautifulSoup(fp, "html.parser")

found_data = soup.find(class_='front-page__feature-image')
# databasedata is the HTML string to insert
found_data.string = databasedata

If I try found_data.string.replace_with instead, I get a NoneType error. found_data is of type Tag.

similar issue but they are using div, not class

Answer 1

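Setting .string on an element causes the value to be HTML-encoded when the document is written out, which is the correct behavior (.text is read-only in bs4 and cannot be assigned). This ensures that the text you inserted appears 1:1 when the document is displayed in a browser.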
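To see the escaping in isolation, here is a minimal sketch. The one-line document is made up for illustration and only reuses the class name from the question. It also shows why found_data.string.replace_with can fail: .string is None whenever the tag does not have exactly one child.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the front page file from the question
soup = BeautifulSoup('<div class="front-page__feature-image"></div>', "html.parser")
target = soup.find(class_="front-page__feature-image")

# An empty tag has no single text child, so .string is None here;
# calling .replace_with on it would raise an AttributeError
string_before = target.string

# Assigning to .string stores the value as one text node, not as markup
target.string = '<h3 id="feature_title">The Title</h3>'

# The tree contains no <h3> tag, only text
has_h3 = target.find("h3") is not None

# On output, the < and > characters in the text node are escaped
serialized = str(target)
print(serialized)
```

The printed markup contains &lt;h3 ...&gt; rather than a real h3 element, which matches what the question describes.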

If you want to insert actual HTML, you need to insert new nodes into the tree.

from bs4 import BeautifulSoup

# always define a file encoding when working with text files
with open(html_frontpage, encoding='utf8') as fp:
    soup = BeautifulSoup(fp, "html.parser")

target = soup.find(class_= 'front-page__feature-image')

# empty out the target element if needed
target.clear()

# create a temporary document from your HTML
content = '<div class="frontpageclass"><h3 id="feature_title">The Title</h3>...</div>'
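temp = BeautifulSoup(content, "html.parser")

# html.parser does not wrap the fragment in <html>/<body>, so the nodes we
# want to insert are the top-level children of `temp`; copy them into a list,
# because inserting a node into `target` removes it from `temp`
nodes_to_insert = list(temp.contents)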

# insert them, in source order
for i, node in enumerate(nodes_to_insert):
    target.insert(i, node)
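The same steps can be wrapped in a small helper. This is only a sketch: the function name replace_inner_html and the one-line sample document are assumptions made for illustration, not part of bs4 or of the original code.

```python
from bs4 import BeautifulSoup


def replace_inner_html(target, html):
    """Replace the children of `target` with the nodes parsed from `html`.

    Hypothetical helper, not a bs4 API.
    """
    target.clear()
    fragment = BeautifulSoup(html, "html.parser")
    # Copy the child list first: appending a node to `target`
    # detaches it from `fragment`, which would disturb the iteration
    for node in list(fragment.contents):
        target.append(node)


# Made-up document that reuses the class name from the question
soup = BeautifulSoup(
    '<section class="front-page__feature-image">old</section>', "html.parser"
)
target = soup.find(class_="front-page__feature-image")
replace_inner_html(
    target,
    '<div class="frontpageclass"><h3 id="feature_title">The Title</h3></div>',
)

result = str(soup)
print(result)
```

Because the new content is parsed into real nodes, soup.find(id="feature_title") finds the h3 afterwards and the output contains no &lt; or &gt; entities.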

Answer 2

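As for the messed-up formatting, &lt; and &gt; simply correspond to "<" and ">". Just replace all of them.

For example, assuming BeautifulSoup has inserted the HTML markup with the messed-up formatting into a variable soup1:

a = str(soup1).replace('&lt;', '<').replace('&gt;', '>')
print(a)

In real code, &lt; and &gt; must be written inside quotes, with no spaces between the characters. (On this page, &lt; may be displayed the same as <.)

The variable a should then have the correct formatting.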
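A small sketch of this approach, with a made-up soup1, shows what it does and does not change. The replacement works on the serialized string only, so soup1 itself still holds plain text; the string has to be parsed again if a tree with real tags is needed. It would also unescape any &lt; or &gt; that was meant to stay as literal text.

```python
from bs4 import BeautifulSoup

# Hypothetical soup1 in the state described in the question
soup1 = BeautifulSoup('<div class="frontpageclass"></div>', "html.parser")
soup1.div.string = '<h3 id="feature_title">The Title</h3>'

# Undo the escaping on the serialized text
a = str(soup1).replace('&lt;', '<').replace('&gt;', '>')
print(a)

# soup1 is unchanged: it still contains a text node, not an <h3> tag
still_text_only = soup1.find("h3") is None

# Parsing the repaired string again yields a tree with a real <h3>
soup2 = BeautifulSoup(a, "html.parser")
```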

Inserting HTML into an element using BeautifulSoup

2 answers: