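Is it possible to set a tag's contents from markup (similar to setting innerHTML in JavaScript)?
For the sake of example, let's say I want to add 10 <a> elements to a <div>, but have them separated by commas: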
soup = BeautifulSoup(<<some document here>>)
a_tags = ["<a>1</a>", "<a>2</a>", ...] # list of strings
div = soup.new_tag("div")
a_str = ",".join(a_tags)
Using div.append(a_str) escapes < and > into &lt; and &gt;, so I end up with
<div> &lt;a&gt; 1 &lt;/a&gt; ... </div>
BeautifulSoup(a_str) wraps this in <html>, and pulling the tree back out of it strikes me as an inelegant hack.
What should I do?
Answer 0 (score: 6)
You need to create a BeautifulSoup object from the HTML string containing the links:
from bs4 import BeautifulSoup
soup = BeautifulSoup()
div = soup.new_tag('div')
a_tags = ["<a>1</a>", "<a>2</a>", "<a>3</a>", "<a>4</a>", "<a>5</a>"]
a_str = ",".join(a_tags)
div.append(BeautifulSoup(a_str, 'html.parser'))
soup.append(div)
print(soup)
Prints:
<div><a>1</a>,<a>2</a>,<a>3</a>,<a>4</a>,<a>5</a></div>
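The same idea can be wrapped in a small helper that behaves roughly like assigning innerHTML. This is only a minimal sketch: `set_inner_html` is a hypothetical name, not part of bs4, and it assumes the 'html.parser' builder, which does not wrap fragments in <html> or <body>.

```python
from bs4 import BeautifulSoup


def set_inner_html(tag, markup):
    """Replace the children of `tag` with the nodes parsed from `markup`.

    Hypothetical helper for illustration; bs4 does not ship this function.
    """
    tag.clear()  # drop whatever the tag contained before
    fragment = BeautifulSoup(markup, "html.parser")
    # Copy the list first: appending a node moves it out of `fragment`.
    for child in list(fragment.contents):
        tag.append(child)
    return tag


soup = BeautifulSoup("", "html.parser")
div = soup.new_tag("div")
div.string = "old content"
set_inner_html(div, "<a>1</a>,<a>2</a>")
print(div)  # <div><a>1</a>,<a>2</a></div>
```

Because the markup is parsed before it is attached, the links end up as real Tag objects rather than escaped text, so div.find_all("a") can find them afterwards.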
Alternative solution:
Create a Tag for each link and append it to the div. Also, append a comma after each link except the last:
from bs4 import BeautifulSoup
soup = BeautifulSoup()
div = soup.new_tag('div')
for x in range(1, 6):
    link = soup.new_tag('a')
    link.string = str(x)
    div.append(link)
    # do not append comma after the last element (x == 5)
    if x != 5:
        div.append(",")
soup.append(div)
print(soup)
Prints:
<div><a>1</a>,<a>2</a>,<a>3</a>,<a>4</a>,<a>5</a></div>
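If the number of links is not known in advance, one option is to put the comma before every link except the first, which avoids hard-coding the last index. The sketch below assumes a hypothetical helper name, `comma_separated_links`, and made-up labels; neither comes from the question.

```python
from bs4 import BeautifulSoup


def comma_separated_links(soup, labels):
    """Build a <div> holding one <a> per label, separated by commas.

    Hypothetical helper for illustration only.
    """
    div = soup.new_tag("div")
    for i, label in enumerate(labels):
        if i > 0:
            div.append(",")  # separator goes before every link but the first
        link = soup.new_tag("a")
        link.string = str(label)
        div.append(link)
    return div


soup = BeautifulSoup("", "html.parser")
div = comma_separated_links(soup, ["x", "y", "z"])
print(div)  # <div><a>x</a>,<a>y</a>,<a>z</a></div>
```

This works for any iterable of labels, including an empty one, in which case the result is an empty <div>.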