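How to replace an element in lxml with a string

Time: 2018-07-12 12:57:28

Tags: python lxml

I'm trying to figure out how to replace an element with a string using lxml and Python.

In my experiments, I have the following code: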

from lxml import etree as et

docstring = '<p>The value is permitted only when that includes <xref linkend="my linkend" browsertext="something here" filename="A_link.fm"/>, otherwise the value is reserved.</p>'

topicroot = et.XML(docstring)
topicroot2 = et.ElementTree(topicroot) 
xref = topicroot2.xpath('//*/xref')
xref_attribute = xref[0].attrib['browsertext']

print(xref_attribute)

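The result is: "something here"

This is the browsertext attribute I'm looking for in this small sample. What I can't figure out is how to replace the entire element with the attribute text I've captured here.

(I do realize that my real input could contain multiple xrefs, so I would need to construct a loop to go through them properly.)

What's the best way to go about doing this?

For those wondering, I have to do this because the links actually point to files that don't exist, since our build system is different.

Thanks!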

1 answer:

Answer 0: (score: 1)

Try this (Python 3):

from lxml import etree as et

docstring = '<p>The value is permitted only when that includes <xref linkend="my linkend" browsertext="something here" filename="A_link.fm"/>, otherwise the value is reserved.</p>'

# Parse the string; topicroot is the root <p> element.
topicroot = et.XML(docstring)
topicroot2 = et.ElementTree(topicroot)

# Get the text nodes that are direct children of the root element.
# This is a list of strings: the text before and after the xref.
topicroot2_text = topicroot2.xpath("text()")

# Get the xref element.
xref = topicroot2.xpath('//*/xref')[0]
xref_attribute = xref.attrib['browsertext']

# Save a reference to the p element and remove the xref from it.
# In lxml, remove() also drops the tail text of the removed element.
parent = xref.getparent()
parent.remove(xref)

# Set the text of the p element by combining the list of strings with the
# extracted attribute value.
new_text = [topicroot2_text[0], xref_attribute, topicroot2_text[1]]
parent.text = "".join(new_text)

print(et.tostring(topicroot2))

Output:

b'<p>The value is permitted only when that includes something here, otherwise the value is reserved.</p>'
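
The code above handles exactly one xref sitting directly between two pieces of text. Since the question mentions that there may be several xrefs, here is a rough sketch of how the same idea could be generalized with a loop. The function name replace_xrefs_with_text and its attr parameter are made up for this illustration and are not part of lxml. The sketch also assumes that an xref without the attribute should simply contribute an empty string.

```python
from lxml import etree as et


def replace_xrefs_with_text(root, attr="browsertext"):
    """Replace every <xref> below root with the value of its `attr` attribute.

    Hypothetical helper for illustration; not an lxml API.
    """
    # Build the list first, because the tree is modified inside the loop.
    for xref in list(root.iter("xref")):
        parent = xref.getparent()
        if parent is None:
            # The root itself is an xref; there is nothing to merge it into.
            continue
        # Text that takes the element's place: attribute value plus its tail.
        replacement = xref.get(attr, "") + (xref.tail or "")
        previous = xref.getprevious()
        if previous is not None:
            # Text following a sibling is stored in that sibling's .tail.
            previous.tail = (previous.tail or "") + replacement
        else:
            # No earlier sibling: the text belongs to the parent's .text.
            parent.text = (parent.text or "") + replacement
        # lxml's remove() discards the element's tail along with the element,
        # which is why the tail was copied into `replacement` beforehand.
        parent.remove(xref)
    return root


doc = et.XML(
    '<p>See <xref browsertext="first"/> and <b>also</b> '
    '<xref browsertext="second"/>.</p>'
)
replace_xrefs_with_text(doc)
result = et.tostring(doc, encoding="unicode")
print(result)
# <p>See first and <b>also</b> second.</p>
```

Appending to the previous sibling's tail (or to the parent's text when there is no previous sibling) keeps the surrounding text and any other child elements in place, so the approach does not depend on the xref being the only child.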