我有一个不清楚的xml并使用python lxml模块处理它。我希望在处理之前用\n
替换内容中的所有space
,如何为所有元素的文本执行此操作。
修改 我的xml示例:
<root>
<a> dsdfs\n dsf\n sdf\n</a>
<bds>
<d>sdf\n\n\n\n\n\n</d>
<d>sdf\n\n\nsdf\nsdf\n\n</d>
</bds>
....
....
....
....
</root>
当我打印ittertext时,我不想在输出中得到它:
root = #get root element
for i in root.ittertext():
print i
dsdfs dsf sdf
dsdfs dsf sdf
sdf nsdf sdf
答案 0 :(得分:1)
下面的代码会将xml解析为字符串,然后用\n
替换space
,然后写入新的xml文件。您可以在两者之间进行其他处理,具体取决于您想要做什么。
from lxml import etree
tree = etree.parse('some.xml')
root = tree.getroot()
# Get the whole XML content as string
xml_in_str = etree.tostring(root)
# Replace all \n with space
new_xml_data = xml_in_str.replace(r'\n', ' ')
# Do the processing with the new_xml_data string which is formatted
# Maybe also write to a new XML file, without the \n
with open('newxml.xml', 'w') as f:
f.write(new_xml_data)
some.xml
看起来像:
<root>
<a> dsdfs\n dsf\n sdf\n</a>
<bds>
<d>sdf\n\n\n\n\n\n</d>
<d>sdf\n\n\nsdf\nsdf\n\n</d>
</bds>
<bds>
<d>sdf\n\n\n\n\n\n</d>
<d>sdf\n\n\nsdf\nsdf\n\n</d>
</bds>
<bds>
<d>sdf\n\n\n\n\n\n</d>
<d>sdf\n\n\nsdf\nsdf\n\n</d>
</bds>
</root>
newxml.xml
看起来像:
<root>
<a> dsdfs dsf sdf </a>
<bds>
<d>sdf </d>
<d>sdf sdf sdf </d>
</bds>
<bds>
<d>sdf </d>
<d>sdf sdf sdf </d>
</bds>
<bds>
<d>sdf </d>
<d>sdf sdf sdf </d>
</bds>
</root>
答案 1 :(得分:-1)
您尝试过的代码究竟是什么?字符串对于初学者来说是不可变的,并且没有&#34; replaceall&#34; Python中的方法
for i in root_elem.itertext():
j = i.replace('\n',' ')
print(j+'\n') # or some fp.write call to a new file