我有两个xml文件,并且有一些公共字段。我尝试阅读它们,但奇怪的是,当我更改呼叫顺序时,结果始终不一致。 第一个xml如下所示:
<?xml version="1.0" encoding="UTF-8"?>
<ms:record xmlns:ms="http://www.example.org/ns1">
<ms:basic>
<ms:name>zhangsan</ms:name>
<ms:sex>male</ms:sex>
<ms:age>30</ms:age>
</ms:basic>
<ms:group>Dev</ms:group>
<ms:hobbies>
<ms:hobby>TV</ms:hobby>
<ms:hobby>Music</ms:hobby>
<ms:hobby>Play</ms:hobby>
</ms:hobbies>
</ms:record>
第二个xml如下:
<?xml version="1.0" encoding="UTF-8"?>
<ms:record xmlns:ms="http://www.example.org/ns2">
<ms:name>zhangsan</ms:name>
<ms:group>Dev</ms:group>
<ms:hobbies>
<ms:hobby>TV</ms:hobby>
<ms:hobby>Music</ms:hobby>
<ms:hobby>Play</ms:hobby>
</ms:hobbies>
<ms:histories>
<ms:history>
<ms:start>2012</ms:start>
<ms:job>SW Engineer</ms:job>
</ms:history>
<ms:history>
<ms:start>2015</ms:start>
<ms:job>Senior SW Engineer</ms:job>
</ms:history>
</ms:histories>
</ms:record>
代码逻辑很简单,从xml文件读取并返回dict:
from xml.etree.ElementTree import parse, fromstring
def read_xml(target_dir):
"""
read xml file
"""
try:
parser = parse(target_dir)
return parser
except Exception, ex:
print ex
def read_xml1(upload):
result = {}
NAMESPACE1 = "http://www.example.org/ns1"
parser = read_xml(upload)
info_list = [
["name", ".//ms:name"],
["sex", ".//ms:sex"],
["age", ".//ms:age"],
["hobbies", "ms:hobbies//ms:hobby"],
["group", "ms:group"],
]
function_map = {"hobbies": parser.findall}
for info in info_list:
key, value = info[0], info[1]
try:
parser_function = function_map.get(key, parser.find)
ele = parser_function(value,
{"ms": NAMESPACE1})
if ele is not None:
if type(ele) == list:
result[key] = [entry.text.strip() for entry in ele]
else:
result[key] = ele.text.strip()
except Exception, ex:
print ex
return result
def read_xml2(upload):
result = {}
NAMESPACE2 = "http://www.example.org/ns2"
parser = read_xml(upload)
history_sub = [
["start", "ms:start"],
["job", "ms:job"],
]
info_list = [
["hobbies", "ms:hobbies//ms:hobby", None],
["name", "ms:name", None],
["histories", ".//ms:history", history_sub],
["group", "ms:group", None],
]
function_map = {"hobbies": parser.findall,
"histories": parser.findall}
for info in info_list:
key, value, mapping = info[0], info[1], info[2]
try:
parser_function = function_map.get(key, parser.find)
ele = parser_function(value,
{"ms": NAMESPACE2})
if ele is not None:
if type(ele) == list:
if mapping:
sub_list = []
for entry in ele:
temp = {}
for sub_entry in mapping:
sub_key, sub_value = sub_entry[0], sub_entry[1]
sub_ele = entry.find(sub_value,
{"ms": NAMESPACE2})
temp[sub_key] = sub_ele.text.strip() if sub_ele is not None else ""
sub_list.append(temp)
result[key] = sub_list
else:
result[key] = [entry.text.strip() for entry in ele]
else:
result[key] = ele.text.strip()
except Exception, ex:
print ex
return result
当我一一称呼他们时,结果与预期的一样
if __name__ == '__main__': print read_xml1(r"C:\11.xml")
结果如下:
{'age': '30', 'hobbies': ['TV', 'Music', 'Play'], 'group': 'Dev', 'name': 'zhangsan', 'sex': 'male'}
if __name__ == '__main__': print read_xml2(r"C:\22.xml")
结果如下:
{'histories': [{'start': '2012', 'job': 'SW Engineer'}, {'start': '2015', 'job': 'Senior SW Engineer'}], 'group': 'Dev', 'name': 'zhangsan', 'hobbies': ['TV', 'Music', 'Play']}
但是当我一起调用它们时,结果很奇怪:
if __name__ == '__main__': print read_xml1(r"C:\11.xml") print read_xml2(r"C:\22.xml")
结果似乎有些冲突,11.xml中的键不在22.xml中:
{'age': '30', 'hobbies': ['TV', 'Music', 'Play'], 'group': 'Dev', 'name': 'zhangsan', 'sex': 'male'} {'histories': [{'start': '2012', 'job': 'SW Engineer'}, {'start': '2015', 'job': 'Senior SW Engineer'}], 'name': 'zhangsan', 'hobbies': []}
然后我尝试更改通话顺序,结果有所不同:
if __name__ == '__main__': print read_xml2(r"C:\22.xml") print read_xml1(r"C:\11.xml")
{'histories': [{'start': '2012', 'job': 'SW Engineer'}, {'start': '2015', 'job': 'Senior SW Engineer'}], 'group': 'Dev', 'name': 'zhangsan', 'hobbies': ['TV', 'Music', 'Play']} {'age': '30', 'hobbies': [], 'name': 'zhangsan', 'sex': 'male'}
我不知道该怎么解释。看起来有些东西是共享和重写的。