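The title is self-explanatory. Before marking this as a duplicate, please note that I have already checked this answer and it did not work for me: I do not even get correctly formatted output on sys.stdout. I have the following XML (test.xml):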
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://www...">
  <soap:Body>
    <SubmitTransaction xmlns="http://www.">
      <Authentication>
      </Authentication>
      <Transaction>
        <DataFields>
        </DataFields>
      </Transaction>
    </SubmitTransaction>
  </soap:Body>
</soap:Envelope>
And the following code:

from lxml import etree

# remove_blank_text drops whitespace that the parser considers ignorable
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse("test.xml", parser)


def get_data_fields():
    # Return the first element whose tag contains 'DataFields'
    for node in tree.iter():
        if 'DataFields' in node.tag:
            return node


a = get_data_fields()

# Build the new <Field_1> element with its three children
field = etree.Element('Field_1')
child_1 = etree.Element('FieldName')
child_2 = etree.Element('FieldValue')
child_3 = etree.Element('FieldIndex')
child_1.text = 'dateTime'
child_2.text = '2016-07-29T12:00:00'
child_3.text = '1'
for i in [child_1, child_2, child_3]:
    field.append(i)
a.append(field)

s = etree.tostring(tree, pretty_print=True)
print(s.decode('utf-8'))

Output:
<soap:Envelope xmlns:soap="http://www...">
  <soap:Body>
    <SubmitTransaction xmlns="http://www.">
      <Authentication>
      </Authentication>
      <Transaction>
        <DataFields>
          <Field_1><FieldName>dateTime</FieldName><FieldValue>2016-07-29T12:00:00</FieldValue><FieldIndex>1</FieldIndex></Field_1></DataFields>
      </Transaction>
    </SubmitTransaction>
  </soap:Body>
</soap:Envelope>
Expected:
<soap:Envelope xmlns:soap="http://www...">
  <soap:Body>
    <SubmitTransaction xmlns="http://www.">
      <Authentication>
      </Authentication>
      <Transaction>
        <DataFields>
          <Field_1>
            <FieldName>dateTime</FieldName>
            <FieldValue>2016-07-29T12:00:00</FieldValue>
            <FieldIndex>1</FieldIndex>
          </Field_1>
        </DataFields>
      </Transaction>
    </SubmitTransaction>
  </soap:Body>
</soap:Envelope>
I really don't understand why the new field I added is not formatted, because if I print only field, everything looks fine:

s = etree.tostring(field, pretty_print=True)
print(s.decode('utf-8'))
#<Field_1 xmlns="http://www." xmlns:soap="http://www...">
#  <FieldName>dateTime</FieldName>
#  <FieldValue>2016-07-29T12:00:00</FieldValue>
#  <FieldIndex>1</FieldIndex>
#</Field_1>

Note: I am using Python 3.4, which is why I have to call .decode('utf-8'); otherwise I just get a bytes literal.
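Answer 0 (score: 2)

It works if you add this line after a = get_data_fields():

a.text = None

lxml cannot always determine which whitespace is ignorable, so in some cases you have to remove it manually. Here, the whitespace inside the empty <DataFields> element appears to survive parsing as that element's text, and the pretty printer does not re-indent an element that already contains text.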
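To make the fix easier to try, here is a minimal sketch that reproduces the situation without a file on disk. The inline XML is a trimmed stand-in for test.xml, the namespace URIs (http://example.com/...) are placeholders rather than the asker's real ones, and the variable names are only illustrative.

```python
from lxml import etree

# Trimmed stand-in for test.xml; the namespace URIs are placeholders (assumption).
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://example.com/soap">
  <soap:Body>
    <SubmitTransaction xmlns="http://example.com/tx">
      <Transaction>
        <DataFields>
        </DataFields>
      </Transaction>
    </SubmitTransaction>
  </soap:Body>
</soap:Envelope>"""

parser = etree.XMLParser(remove_blank_text=True)
root = etree.fromstring(XML, parser)

# '{*}' matches the element in any namespace.
data_fields = next(root.iter('{*}DataFields'))

# Add the new field and its children, as in the question.
field = etree.SubElement(data_fields, 'Field_1')
for tag, text in [('FieldName', 'dateTime'),
                  ('FieldValue', '2016-07-29T12:00:00'),
                  ('FieldIndex', '1')]:
    etree.SubElement(field, tag).text = text

# Inspect what the parser left behind inside <DataFields>.
print(repr(data_fields.text))

# The fix from the answer: drop the leftover text so the element
# contains only child elements and can be indented.
data_fields.text = None

after = etree.tostring(root, pretty_print=True).decode('utf-8')
print(after)
```

Printing repr(data_fields.text) before clearing it is a quick way to check whether leftover whitespace is what blocks the indentation in your own document.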
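See http://lxml.de/FAQ.html#why-doesn-t-the-pretty-print-option-reformat-my-xml-output:

If you want to be sure all blank text is removed from an XML document (or just more blank text than the parser does by itself), you have to use either a DTD to tell the parser which whitespace it can safely ignore, or remove the ignorable whitespace manually after parsing, e.g. by setting all tail text to None.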
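The manual clean-up described in the FAQ passage can be sketched roughly as follows. This is a more conservative variant than clearing every tail unconditionally: it only drops text and tail values that consist purely of whitespace, so real content is kept. The helper name strip_ignorable_whitespace and the sample document are hypothetical, not part of lxml.

```python
from lxml import etree


def strip_ignorable_whitespace(root):
    """Hypothetical helper: clear whitespace-only text and tail values."""
    # tag=etree.Element restricts iteration to elements (skips comments and PIs).
    for element in root.iter(tag=etree.Element):
        if element.text is not None and not element.text.strip():
            element.text = None
        if element.tail is not None and not element.tail.strip():
            element.tail = None


# Parsed without remove_blank_text, so all whitespace is still present.
root = etree.fromstring("<a>\n  <b>\n  </b>\n  <c>keep me</c>\n</a>")
strip_ignorable_whitespace(root)

# Elements added afterwards are indented along with the rest.
child = etree.SubElement(root[0], "d")
child.text = "x"

out = etree.tostring(root, pretty_print=True).decode("utf-8")
print(out)
```

Running the helper once right after parsing means later a.text = None style fixes are not needed for each element you append to, at the cost of also removing whitespace that might be significant in mixed-content documents.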