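Parsing text into XML format

Time: 2015-09-22 03:19:07

Tags: python xml

I'm not sure whether this is a JSON file or some other kind of data structure, but I need to parse this format into XML using Python.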

(Table){
   classA= "false"
   classB= "150538"
   classC= "AE_T_C"
   classD= "510150 DLCX DEPOSITION"
   classE= "233344"
   classF= "516"
   classG= "150131"
   classH= "CARJOB"
   classI= "23001367"
   classJ= "960"
   classK= "16"
   classL= "true"
   classM= "P_GENERIC_HARMONY.2"
 }
 (Table){
   LetterA= "true"
   LetterB= "15"
   LetterC= "x"
   LetterD= "Carbon"
   LetterE= "44"
   LetterF= "test"
   LetterG= "Dump"
   LetterH= "NA"
   LetterI= "2"
   LetterJ= "9"
   LetterK= "1"
   LetterL= "done"
   LetterM= "test"
 }
 .
 .
 .

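This is my current script, which parses the file as JSON. I don't think the data is actually JSON, though, and I'm still confused about how to parse it: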

import json
import urllib
import dicttoxml

filename = 'c:/myFile'
file = open(filename,"r") 
lines = file.read() 

content = lines
obj = json.loads(content)
print(obj)


xml = dicttoxml.dicttoxml(obj)
print(xml)

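Is there a way to parse this kind of file, or any suggestions?

Thanks in advance.

1 Answer:

Answer 0: (score: 1)

Here is a quick little script. It depends on the xmltodict module, which helps convert a dict into XML: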

import xmltodict

mydict = {}
# I added the contents to a file named 'afile.txt'
with open("afile.txt", "r") as f:
    for line in f:
        # ignore lines containing a brace and lines without a key/value pair
        # (blank lines or the "." filler lines would otherwise raise IndexError)
        if "{" not in line and "}" not in line and "=" in line:
            # split on the first "=" only; strip whitespace from the key
            # and the trailing newline from the value
            key, value = line.split("=", 1)
            mydict[key.strip()] = value.rstrip("\n")
# define xml root tag
root = {
    'body': mydict
}
# unparse the dict to xml
print(xmltodict.unparse(root, pretty=True))

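Running this with the (initial) contents you provided prints the following (the element order follows the dict's ordering, so it can differ between Python versions):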

<?xml version="1.0" encoding="utf-8"?>
<body>
    <classL> "true"</classL>
    <classM> "P_GENERIC_HARMONY.2"</classM>
    <classJ> "960"</classJ>
    <classK> "16"</classK>
    <classH> "CARJOB"</classH>
    <classI> "23001367"</classI>
    <classF> "516"</classF>
    <classG> "150131"</classG>
    <classD> "510150 DLCX DEPOSITION"</classD>
    <classE> "233344"</classE>
    <classB> "150538"</classB>
    <classC> "AE_T_C"</classC>
    <classA> "false"</classA>
</body>
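As you can see, the values still carry their surrounding quotes and a leading space. One possible refinement, as a rough sketch rather than part of the script above: assuming every data line has the form `name= "value"`, a regular expression from the standard `re` module can pull out the key and the unquoted value. The names `PAIR_RE` and `parse_pairs` are made up for this illustration.

```python
import re

# Matches lines such as:   classA= "false"
# Group 1 is the key, group 2 is the value without the surrounding quotes.
PAIR_RE = re.compile(r'^\s*(\w+)\s*=\s*"(.*)"\s*$')


def parse_pairs(lines):
    """Return a dict of key/value pairs, skipping lines that do not match."""
    pairs = {}
    for line in lines:
        match = PAIR_RE.match(line)
        if match:
            pairs[match.group(1)] = match.group(2)
    return pairs


# A few lines in the same shape as the data in the question.
sample = [
    '(Table){',
    '   classA= "false"',
    '   classD= "510150 DLCX DEPOSITION"',
    ' }',
    ' .',
]
print(parse_pairs(sample))
```

Lines that do not look like a key/value pair (the braces and the `.` filler lines) are simply skipped, so no separate bracket check is needed.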

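The xmltodict script above does the job. If you know the contents of each table in advance, you can also define a list of labels for them and make the XML file look more organized: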

# define the appropriate labels:
TableValues = ['Class', 'Letter']

# and create the dictionary based on these tags:
# this uses a dictionary comprehension in a dictionary
# comprehension. Comprehensive stuff.
new_root = {
    'body': {
        label: {
            key: value
            for key, value in mydict.items()
            if label.lower() in key.lower()
        }
        for label in TableValues
    }
}

print(xmltodict.unparse(new_root, pretty=True))

Running this with the additional contents you provided produces a more structured result:

<?xml version="1.0" encoding="utf-8"?>
<body>
    <Class>
        <classL>"true"</classL>
        <classM>"P_GENERIC_HARMONY.2"</classM>
        <classJ>"960"</classJ>
        <classK>"16"</classK>
        <classH>"CARJOB"</classH>
        <classI>"23001367"</classI>
        <classF>"516"</classF>
        <classG>"150131"</classG>
        <classD>"510150 DLCX DEPOSITION"</classD>
        <classE>"233344"</classE>
        <classB>"150538"</classB>
        <classC>"AE_T_C"</classC>
        <classA>"false"</classA>
    </Class>
    <Letter>
        <LetterG>"Dump"</LetterG>
        <LetterF>"test"</LetterF>
        <LetterE>"44"</LetterE>
        <LetterD>"Carbon"</LetterD>
        <LetterC>"x"</LetterC>
        <LetterB>"15"</LetterB>
        <LetterA>"true"</LetterA>
        <LetterM>"test"</LetterM>
        <LetterL>"done"</LetterL>
        <LetterK>"1"</LetterK>
        <LetterJ>"9"</LetterJ>
        <LetterI>"2"</LetterI>
        <LetterH>"NA"</LetterH>
    </Letter>
</body>
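If you do not know the key names in advance, another option is to group by block instead of by name prefix. The following is only a sketch under some assumptions: each block starts on a line ending in `{` and ends on a line containing only `}`, every key is a valid XML element name, and Python 3 is used. It swaps xmltodict for the standard library's `xml.etree.ElementTree`, and the names `parse_tables` and `tables_to_xml` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Matches lines such as:   classA= "false"   (value captured without quotes)
PAIR_RE = re.compile(r'^\s*(\w+)\s*=\s*"(.*)"\s*$')


def parse_tables(text):
    """Split the text into one dict per `(Table){ ... }` block."""
    tables = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.endswith("{"):      # start of a block
            current = {}
        elif stripped == "}":           # end of a block
            if current is not None:
                tables.append(current)
            current = None
        elif current is not None:       # inside a block: look for key= "value"
            match = PAIR_RE.match(line)
            if match:
                current[match.group(1)] = match.group(2)
    return tables


def tables_to_xml(tables):
    """Build <body><Table><key>value</key>...</Table>...</body> as a string."""
    body = ET.Element("body")
    for table in tables:
        table_el = ET.SubElement(body, "Table")
        for key, value in table.items():
            ET.SubElement(table_el, key).text = value
    return ET.tostring(body, encoding="unicode")


# Shortened sample in the same shape as the data in the question.
sample = '''(Table){
   classA= "false"
   classB= "150538"
 }
 (Table){
   LetterA= "true"
   LetterB= "15"
 }
 .
'''
print(tables_to_xml(parse_tables(sample)))
```

Each block becomes its own `<Table>` element, so keys that repeat across blocks do not overwrite each other the way they would in a single flat dict.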