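I want to replace the text of every node inside the tree without changing its structure. The code below only changes the text of some elements (the ones with no children) and leaves the other text nodes untouched. itertext() is not enough either, because it only returns plain strings, not something I can assign back into the tree. I found an approach here, but detecting the whole structure with it is rather cumbersome.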
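def replace_text(tree):  # tree: Element from lxml
    for tag in tree.iter():
        # Only childless elements are touched, and only their .text:
        # text inside elements that have children, and .tail text, are skipped.
        if not len(tag):
            if tag.text is not None:
                tag.text = 'z1'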
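For context, here is a small sketch of where the text actually lives. lxml and the standard library's xml.etree.ElementTree share the same model: an element's .text is the text before its first child, and its .tail is the text after its closing tag. The helper name text_slots and the sample markup (including the hypothetical "tail text") are illustrative only; the sketch uses the standard library so it runs without lxml installed.

```python
import xml.etree.ElementTree as ET

def text_slots(root):
    """List every non-blank text slot as (tag, 'text' or 'tail', stripped value)."""
    slots = []
    for el in root.iter():
        if el.text and el.text.strip():
            slots.append((el.tag, "text", el.text.strip()))
        if el.tail and el.tail.strip():
            slots.append((el.tag, "tail", el.tail.strip()))
    return slots

root = ET.fromstring("<div>a1<p>a6 a7<br/>tail text<span>a8</span></p></div>")
print(text_slots(root))
# [('div', 'text', 'a1'), ('p', 'text', 'a6 a7'), ('br', 'tail', 'tail text'), ('span', 'text', 'a8')]

# The childless-only check from the loop above reaches just one of these slots.
print([el.tag for el in root.iter() if not len(el) and el.text])
# ['span']
```

This is why a1, a6 a7 and any tail text are missed: they belong to elements that have children, or to the tail of a child.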
From:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Title</title>
</head>
<body>
<div>
a1
<div>
a2
<div>
a3
<p>
a4
a5
</p>
<p>
a6
a7
<br>
<span>a8</span>
</p>
</div>
</div>
</div>
</body>
</html>
Expected:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Title</title>
</head>
<body>
<div>
z1
<div>
z2
<div>
z3
<p>
z4
z5
</p>
<p>
z6
z7
<br>
<span>z8</span>
</p>
</div>
</div>
</div>
</body>
</html>
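One possible approach, sketched under assumptions: visit every element once and rewrite both its .text and its .tail, which together cover every text node without touching the structure. The question does not state the actual replacement rule, so a_to_z below is a hypothetical rule (a1 → z1, …, a8 → z8) chosen only to reproduce the sample; replace it with the real one. The test input is an XHTML-style version of the sample so the standard library can parse it.

```python
import re
import xml.etree.ElementTree as ET

def replace_text(tree, func):
    """Apply func to every text node under tree, in place, keeping the structure.

    Works on lxml and xml.etree elements alike: both keep the text before an
    element's first child in .text and the text after its end tag in .tail.
    """
    for el in tree.iter():
        # Comments and processing instructions have non-string tags; leave their
        # content alone, but still rewrite the document text that follows them.
        if isinstance(el.tag, str) and el.text:
            el.text = func(el.text)
        if el.tail:
            el.tail = func(el.tail)
    return tree

def a_to_z(s):
    # Hypothetical rule matching the sample: a1 -> z1, ..., a8 -> z8.
    return re.sub(r"\ba(\d+)\b", r"z\1", s)

body = ET.fromstring(
    "<body><div>a1<div>a2<div>a3<p>a4 a5</p>"
    "<p>a6 a7<br/><span>a8</span></p></div></div></div></body>"
)
replace_text(body, a_to_z)
print(ET.tostring(body, encoding="unicode"))
```

With lxml the same function should apply unchanged to the root returned by lxml.html.fromstring(...) or lxml.html.parse(...).getroot(). Whitespace-only text nodes are passed to func as well; a_to_z leaves them as they are, but a rule that rewrites arbitrary text may want to skip them.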