I have this text file, 20150731100543_1.txt:
GI-eSTB-MIB-NPH::eSTBGeneralErrorCode.0 = INTEGER: 0
GI-eSTB-MIB-NPH::eSTBGeneralConnectedState.0 = INTEGER: true(1)
GI-eSTB-MIB-NPH::eSTBGeneralPlatformID.0 = INTEGER: 2075
GI-eSTB-MIB-NPH::eSTBMoCAfrequency.0 = INTEGER: 0
GI-eSTB-MIB-NPH::eSTBMoCAMACAddress.0 = STRING: 0:0:0:0:0:0
GI-eSTB-MIB-NPH::eSTBMoCANumberOfNodes.0 = INTEGER: 0
I want to convert it to XML like the following (20150731100543_1.xml):
<?xml version="1.0" encoding="UTF-8"?>
<doc>
<GI-eSTB-MIB-NPH>
<eSTBGeneralErrorCode.0>
INTEGER: 0
</eSTBGeneralErrorCode.0>
</GI-eSTB-MIB-NPH>
<GI-eSTB-MIB-NPH>
<eSTBGeneralConnectedState.0>
INTEGER: true(1)
</eSTBGeneralConnectedState.0>
</GI-eSTB-MIB-NPH>
<GI-eSTB-MIB-NPH>
<eSTBGeneralPlatformID.0>
INTEGER: 2075
</eSTBGeneralPlatformID.0>
</GI-eSTB-MIB-NPH>
<GI-eSTB-MIB-NPH>
<eSTBMoCAfrequency.0>
INTEGER: 0
</eSTBMoCAfrequency.0>
</GI-eSTB-MIB-NPH>
<GI-eSTB-MIB-NPH>
<eSTBMoCAMACAddress.0>
STRING: 0:0:0:0:0:0
</eSTBMoCAMACAddress.0>
</GI-eSTB-MIB-NPH>
<GI-eSTB-MIB-NPH>
<eSTBMoCANumberOfNodes.0>
INTEGER: 0
</eSTBMoCANumberOfNodes.0>
</GI-eSTB-MIB-NPH>
</doc>
I am able to get this done using the following script:
import sys
import time
import commands
from xml.etree.ElementTree import Element, SubElement
from xml.etree import ElementTree
from xml.dom import minidom

def prettify(elem):
    """Return a pretty-printed XML string for the Element.
    """
    rough_string = ElementTree.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent=" ", newl="\n", encoding="UTF-8")

if len(sys.argv) != 2:
    print "\nUsage: python script.py <IP>\n";
    exit(0)

filename_xml = '20150731100543_1.xml'  #filename_xml = temp + ".xml"
print "xml filename is: %s\n" % filename_xml
xml = open(filename_xml, 'w+')
top = Element('doc')
with open('20150731100543_1.txt') as f:
    for line in f:
        b = line.split(':')
        child = SubElement(top, b[0])
        c = line.split()
        d = c[0].split(':')
        property = SubElement(child, d[2])
        property.text = c[2] + " " + c[3]
xml.write(prettify(top))
xml.close()
I have three questions here:
So, if possible, the XML should look like this:
<?xml version="1.0" encoding="UTF-8"?>
<doc>
<GI-eSTB-MIB-NPH>
<eSTBGeneralErrorCode.0>INTEGER: 0</eSTBGeneralErrorCode.0>
<eSTBGeneralConnectedState.0>INTEGER: true(1)</eSTBGeneralConnectedState.0>
<eSTBGeneralPlatformID.0>INTEGER: 2075</eSTBGeneralPlatformID.0>
<eSTBMoCAfrequency.0>INTEGER: 0</eSTBMoCAfrequency.0>
<eSTBMoCAMACAddress.0>STRING: 0:0:0:0:0:0</eSTBMoCAMACAddress.0>
<eSTBMoCANumberOfNodes.0>INTEGER: 0</eSTBMoCANumberOfNodes.0>
</GI-eSTB-MIB-NPH>
</doc>
I want this format because it would greatly reduce the number of lines in the XML.
The last and least important question is:
Is there a better way to get the substrings of each line than the way I have done it with split()?
with open('20150731100543_1.txt') as f:
    for line in f:
        b = line.split(':')
        child = SubElement(top, b[0])
        c = line.split()
        d = c[0].split(':')
        property = SubElement(child, d[2])
        property.text = c[2] + " " + c[3]
Please forgive me for such a lengthy post.
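Answer 0 (score: 1)
1 & 2: I use etree.tostring and I don't have any of these problems.
3: The multiple split operations can be replaced with a regular expression.
This should work fine: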
from lxml import etree
import re

filename_xml = '20150731100543_1.xml'
root = etree.Element('doc')
node = etree.SubElement(root, 'GI-eSTB-MIB-NPH')

with open('20150731100543_1.txt') as f:
    text = f.read()

# get tag and value from each row
for tag, value in re.findall(r'GI-eSTB-MIB-NPH::(.*) = (.*$)', text, re.MULTILINE):
    # create child node
    etree.SubElement(node, tag).text = value

# tostring() returns bytes when an encoding is given, so write in binary mode
xml = etree.tostring(root, pretty_print=True, encoding='utf-8', xml_declaration=True)
with open(filename_xml, 'wb') as f:
    f.write(xml)
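If lxml is not available, or the file may contain more than one MIB name, the same idea can be sketched with the standard library alone. This is only a rough sketch, not a drop-in replacement: the function name build_tree and the pattern LINE_RE are made up for illustration, and it assumes every line has the form "<MIB>::<object> = <TYPE>: <value>" as in the sample file.

```python
import re
import xml.etree.ElementTree as ET

# Assumed line layout: "<MIB>::<object> = <TYPE>: <value>"
LINE_RE = re.compile(r'^(?P<mib>[^:\s]+)::(?P<name>\S+) = (?P<value>.*)$')

def build_tree(lines):
    """Group every object under a single element per MIB name (hypothetical helper)."""
    root = ET.Element('doc')
    groups = {}  # MIB name -> its element under <doc>
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip blank or unexpected lines
        mib = match.group('mib')
        if mib not in groups:
            groups[mib] = ET.SubElement(root, mib)
        ET.SubElement(groups[mib], match.group('name')).text = match.group('value')
    return root

sample = [
    'GI-eSTB-MIB-NPH::eSTBGeneralErrorCode.0 = INTEGER: 0',
    'GI-eSTB-MIB-NPH::eSTBMoCAMACAddress.0 = STRING: 0:0:0:0:0:0',
]
root = build_tree(sample)
print(ET.tostring(root, encoding='unicode'))
```

Passing an open file object instead of the sample list should work the same way, since the function only iterates over lines. Note that ElementTree's tostring does not pretty-print, so the result would still need minidom (as in the question) or lxml for indentation.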