Question

I have this text file, 20150731100543_1.txt:

GI-eSTB-MIB-NPH::eSTBGeneralErrorCode.0 = INTEGER: 0
GI-eSTB-MIB-NPH::eSTBGeneralConnectedState.0 = INTEGER: true(1)
GI-eSTB-MIB-NPH::eSTBGeneralPlatformID.0 = INTEGER: 2075
GI-eSTB-MIB-NPH::eSTBMoCAfrequency.0 = INTEGER: 0
GI-eSTB-MIB-NPH::eSTBMoCAMACAddress.0 = STRING: 0:0:0:0:0:0
GI-eSTB-MIB-NPH::eSTBMoCANumberOfNodes.0 = INTEGER: 0

I want to convert it to XML like this (20150731100543_1.xml):

<?xml version="1.0" encoding="UTF-8"?>
<doc>
    <GI-eSTB-MIB-NPH>
        <eSTBGeneralErrorCode.0>
            INTEGER: 0
        </eSTBGeneralErrorCode.0>
    </GI-eSTB-MIB-NPH>
    <GI-eSTB-MIB-NPH>
        <eSTBGeneralConnectedState.0>
            INTEGER: true(1)
        </eSTBGeneralConnectedState.0>
    </GI-eSTB-MIB-NPH>
    <GI-eSTB-MIB-NPH>
        <eSTBGeneralPlatformID.0>
            INTEGER: 2075
        </eSTBGeneralPlatformID.0>
    </GI-eSTB-MIB-NPH>
    <GI-eSTB-MIB-NPH>
        <eSTBMoCAfrequency.0>
            INTEGER: 0
        </eSTBMoCAfrequency.0>
    </GI-eSTB-MIB-NPH>
    <GI-eSTB-MIB-NPH>
        <eSTBMoCAMACAddress.0>
            STRING: 0:0:0:0:0:0
        </eSTBMoCAMACAddress.0>
    </GI-eSTB-MIB-NPH>
    <GI-eSTB-MIB-NPH>
        <eSTBMoCANumberOfNodes.0>
            INTEGER: 0
        </eSTBMoCANumberOfNodes.0>
    </GI-eSTB-MIB-NPH>
</doc>

I am able to get this done using the following script:

import sys
import time
import commands
from xml.etree.ElementTree import Element, SubElement
from xml.etree import ElementTree
from xml.dom import minidom

def prettify(elem):
    """Return a pretty-printed XML string for the Element.
    """
    rough_string = ElementTree.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="    ", newl="\n", encoding="UTF-8")

if len(sys.argv) != 2:
    print "\nUsage: python script.py <IP>\n";
    exit(0)
filename_xml = '20150731100543_1.xml'#filename_xml = temp + ".xml"
print "xml filename is: %s\n" % filename_xml
xml = open(filename_xml, 'w+')

top = Element('doc')

with open('20150731100543_1.txt') as f:
    for line in f:
        b = line.split(':')
        child = SubElement(top, b[0])

        c = line.split()
        d = c[0].split(':')
        property =  SubElement(child, d[2])

        property.text = c[2] + " " + c[3]

xml.write(prettify(top))

xml.close()

I have three questions here:

Is there any way (using toprettyxml() or something else) to change the generated XML so that the opening tag, the text, and the closing tag of each element are on the same line?
Also, can I have the <GI-eSTB-MIB-NPH> tag only once, at the start and at the end, instead of repeating it around every element? (All the elements belong under this same tag.)

So if possible the format of xml should be like:

<?xml version="1.0" encoding="UTF-8"?>
<doc>
    <GI-eSTB-MIB-NPH>
        <eSTBGeneralErrorCode.0>INTEGER: 0</eSTBGeneralErrorCode.0>
        <eSTBGeneralConnectedState.0>INTEGER: true(1)</eSTBGeneralConnectedState.0>
        <eSTBGeneralPlatformID.0>INTEGER: 2075</eSTBGeneralPlatformID.0>
        <eSTBMoCAfrequency.0>INTEGER: 0</eSTBMoCAfrequency.0>
        <eSTBMoCAMACAddress.0>STRING: 0:0:0:0:0:0</eSTBMoCAMACAddress.0>
        <eSTBMoCANumberOfNodes.0>INTEGER: 0</eSTBMoCANumberOfNodes.0>
    </GI-eSTB-MIB-NPH>
</doc>

I want this format because it greatly reduces the number of lines in the XML.

The last and least important question is:

Is there a better way to get the substrings of each line than the way I have done it with split()?

with open('20150731100543_1.txt') as f:
    for line in f:
        b = line.split(':')
        child = SubElement(top, b[0])

        c = line.split()
        d = c[0].split(':')
        property =  SubElement(child, d[2])

        property.text = c[2] + " " + c[3]

Please forgive me for such a lengthy post.

Answer 1

1 & 2: I use etree.tostring from lxml and do not run into either of these problems.

3: A regular expression can replace the multiple split operations.
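As a rough illustration of point 3, one pattern can capture the MIB name, the object name, and the value in a single step. This is only a sketch: `LINE_RE` and `parse_line` are made-up names, and the pattern assumes every line has the shape `<MIB>::<object> = <TYPE>: <value>` as in the sample file.

```python
import re

# Assumed line shape: "<MIB>::<object> = <TYPE>: <value>"
LINE_RE = re.compile(r'^(?P<mib>[^:]+)::(?P<name>\S+) = (?P<value>.*)$')


def parse_line(line):
    """Return (mib, name, value), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group('mib'), match.group('name'), match.group('value')


print(parse_line('GI-eSTB-MIB-NPH::eSTBMoCAMACAddress.0 = STRING: 0:0:0:0:0:0'))
```

Unlike the chained split() calls, this keeps colons inside the value (such as the MAC address) intact and lets blank or malformed lines be skipped instead of raising an IndexError.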

The following script should work:

from lxml import etree
import re

filename_xml = '20150731100543_1.xml'

root = etree.Element('doc')
node = etree.SubElement(root, 'GI-eSTB-MIB-NPH')
f = open('20150731100543_1.txt')
text = f.read()
f.close()

# get tag and value from each row
for tag, value in re.findall('GI-eSTB-MIB-NPH::(.*) = (.*$)', text, re.MULTILINE):
   # create child node
   etree.SubElement(node, tag).text = value

xml = etree.tostring(root, pretty_print = True, encoding = 'utf-8', xml_declaration=True)

# tostring() returns a byte string when an encoding is given, so write in binary mode
with open(filename_xml, 'wb') as f:
    f.write(xml)
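If lxml is not available, something similar seems achievable with the standard library alone. The sketch below assumes Python 3; `build_xml` and `indent` are illustrative helpers, not part of the answer above. It creates one parent element per MIB name and adds whitespace only between elements, so each leaf's text stays on the same line as its tags. On Python 3.9 and later, the built-in `xml.etree.ElementTree.indent()` could replace the hand-written helper.

```python
import re
import xml.etree.ElementTree as ET

# Assumed line shape: "<MIB>::<object> = <TYPE>: <value>"
LINE_RE = re.compile(r'^([^:\n]+)::(\S+) = (.*)$', re.MULTILINE)


def indent(elem, level=0, space='    '):
    """Add whitespace between elements in place; leaf text is left untouched."""
    pad = '\n' + space * level
    if len(elem):                       # only elements that have children
        elem.text = pad + space         # newline before the first child
        for child in elem:
            indent(child, level + 1, space)
            child.tail = pad + space    # newline before the next sibling
        child.tail = pad                # last child: newline before the closing tag


def build_xml(text):
    """Build <doc> with one element per MIB name and one child per object."""
    root = ET.Element('doc')
    groups = {}  # MIB name -> its single parent element
    for mib, name, value in LINE_RE.findall(text):
        parent = groups.get(mib)
        if parent is None:
            parent = groups[mib] = ET.SubElement(root, mib)
        ET.SubElement(parent, name).text = value
    indent(root)
    return ET.tostring(root, encoding='unicode')


sample = (
    'GI-eSTB-MIB-NPH::eSTBGeneralErrorCode.0 = INTEGER: 0\n'
    'GI-eSTB-MIB-NPH::eSTBMoCAMACAddress.0 = STRING: 0:0:0:0:0:0\n'
)
print(build_xml(sample))
```

Passing `encoding='utf-8'` to `ET.tostring` instead should return bytes that include the XML declaration, which can then be written to a file opened in binary mode.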
