[Python 3.4] [Windows 7]
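If there is a simple way to get the whole .xml file as a single string (like a .txt file), that would be enough, but to describe the problem precisely:
This is my first time working with .xml files. I have an .xml file that mainly contains dicts (with further dicts nested inside). Now I want to take certain specific keys and values from those dicts and write them to a .txt file, so having them as a dict (or anything similar) in Python would be enough.
An example:
This is the XML file (library.xml):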
<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
<key>Version<\key><integer>1</integer>
<key>Tracks</key>
<dict>
<key>0001</key>
<dict>
<key>Name</key><string>spam</string>
<key>Detail</key><string>spam spam</string>
</dict>
<key>0002</key>
<dict>
<key>Name</key><string>ham</string>
<key>Detail</key><string>ham ham</string>
</dict>
</dict>
</dict>
</plist>
I did some research and think I can use the xml.etree.ElementTree module. So if I try this:
tree = ET.parse('library.xml')
root = tree.getroot()
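I only get this message:
(unicode error) 'unicodeescape' codec can't decode bytes...
What I want is something of this form (or a dict, it doesn't matter):
[['Name: spam', 'Detail: spam spam'], ['Name: ham', 'Detail: ham ham']]
Edit: the XML code was incorrect, sorry.
Edit: added the last paragraph.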
Answer 0: (score: 1)
The Python standard library contains a module for reading plist files: plistlib. With it you can solve your problem with one import and one command:
import plistlib
print plistlib.readPlist('library.xml')
Output:
{'Tracks': {'0001': {'Detail': 'spam spam', 'Name': 'spam'},
'0002': {'Detail': 'ham ham', 'Name': 'ham'}},
'Version': 1}
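The snippet above uses Python 2 syntax. On Python 3, readPlist is deprecated since 3.4 and was removed in 3.9; plistlib.load (for a file opened in binary mode) and plistlib.loads (for bytes) take its place. The following is a minimal sketch, assuming the corrected XML (</key> instead of <\key>); the inline data and the rows variable are illustrative and not part of the original answer.

```python
import plistlib

# Inline copy of the corrected sample. With a real file, something like
#     with open('library.xml', 'rb') as fp:
#         library = plistlib.load(fp)
# should give the same result.
data = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
<key>Version</key><integer>1</integer>
<key>Tracks</key>
<dict>
<key>0001</key>
<dict>
<key>Name</key><string>spam</string>
<key>Detail</key><string>spam spam</string>
</dict>
<key>0002</key>
<dict>
<key>Name</key><string>ham</string>
<key>Detail</key><string>ham ham</string>
</dict>
</dict>
</dict>
</plist>
"""

library = plistlib.loads(data)

# Build the list-of-lists shape asked for in the question,
# ordering tracks by their key and fields explicitly.
rows = [
    ["{}: {}".format(field, track[field]) for field in ("Name", "Detail")]
    for _, track in sorted(library["Tracks"].items())
]
print(rows)
```

Sorting the track keys and naming the fields explicitly keeps the order stable regardless of how the dict is ordered.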
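Answer 1: (score: 0)
Changed the input from <\key> to </key>, and removed a dict block because no key was defined for it.
Use the lxml.html module to parse the XML data.
Use the xpath() method to get the target main dict tag.
Call the XMLtoDict() function.
Use the getchildren() method and a for loop to iterate over the children of the input tag.
An if statement checks whether the tag name is key.
Use the getnext() method to get the tag that follows the current tag.
If the next tag is an integer tag, the value type is int.
If the next tag is a string tag, the value type is string.
If the next tag is a dict tag, the value type is dict and the function is called again, i.e. a recursive call.
Code: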
data = """<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
<key>Version</key>
<integer>1</integer>
<key>Tracks</key>
<dict>
<key>0001</key>
<dict>
<key>Name</key><string>spam</string>
<key>Detail</key><string>spam spam</string>
</dict>
<key>0002</key>
<dict>
<key>Name</key><string>ham</string>
<key>Detail</key><string>ham ham</string>
</dict>
</dict>
</dict>
</plist>
"""
import pprint

import lxml.html as ET


def XMLtoDict(root):
    """Recursively convert a plist <dict> tag into a Python dict."""
    result = {}
    for i in root.getchildren():
        if i.tag == "key":
            key = i.text
            # The value is the sibling tag that follows each <key>.
            next_tag = i.getnext()
            next_tag_name = next_tag.tag
            if next_tag_name == "integer":
                value = int(next_tag.text)
            elif next_tag_name == 'string':
                value = next_tag.text
            elif next_tag_name == 'dict':
                value = XMLtoDict(next_tag)  # recursive call for nested dicts
            else:
                value = None
            result[key] = value
    return dict(result)


root = ET.fromstring(data)
result = XMLtoDict(root.xpath("//plist/dict")[0])
pprint.pprint(result)
Output:
vivek@vivek:~/Desktop/stackoverflow$ python 12.py
{'Tracks': {'0001': {'Detail': 'spam spam', 'Name': 'spam'},
'0002': {'Detail': 'ham ham', 'Name': 'ham'}},
'Version': 1}
I do not get this exception:
(unicode error) 'unicodeescape' codec can't decode bytes...
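For context, Python 3 reports this message as a SyntaxError when a normal string literal contains a backslash followed by U, which it reads as the start of a \UXXXXXXXX escape. The question only shows 'library.xml', so the following is a guess: it assumes the real code used a Windows path such as C:\Users\... (a hypothetical path, chosen purely for illustration).

```python
# Source text of an assignment that uses a plain (non-raw) string literal.
# The path is hypothetical; the question does not show the real one.
bad_source = r"path = 'C:\Users\me\library.xml'"

try:
    compile(bad_source, "<example>", "exec")
    compiled = True
    message = ""
except SyntaxError as exc:
    # Python 3 rejects the literal before any code runs.
    compiled = False
    message = str(exc)

print(compiled, message)

# Two common ways to write such a path safely:
raw_path = r'C:\Users\me\library.xml'      # raw string, backslashes kept as-is
forward_path = 'C:/Users/me/library.xml'   # forward slashes also work on Windows
```

If that was the cause, the error is unrelated to the XML content and is raised before ET.parse is ever called.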
The tags in library.xml are not well-formed. Running
import xml.etree.ElementTree as ET
tree = ET.parse('library.xml')
on that input raises the following exception:
vivek@vivek:~/Desktop/stackoverflow$ python 12.py
Traceback (most recent call last):
File "12.py", line 46, in <module>
tree = ET.parse('library.xml')
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1183, in parse
tree.parse(source, parser)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 656, in parse
parser.feed(data)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1643, in feed
self._raiseerror(v)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1507, in _raiseerror
raise err
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 4, column 15
This exception is caused by the invalid tag. To fix it:
Change <key>Version<\key> to <key>Version</key>.
Use the xml.etree.ElementTree module.
Code:
import pprint
import xml.etree.ElementTree as ET


def XMLtoDict(root):
    """Recursively convert a plist <dict> tag into a Python dict."""
    result = {}
    # list(root) replaces getchildren(), which was removed in Python 3.9.
    children_tags = list(root)
    for j, i in enumerate(children_tags):
        if i.tag == "key":
            key = i.text
            # The value is the tag that follows each <key>.
            next_tag = children_tags[j + 1]
            next_tag_name = next_tag.tag
            if next_tag_name == "integer":
                value = int(next_tag.text)
            elif next_tag_name == 'string':
                value = next_tag.text
            elif next_tag_name == 'dict':
                value = XMLtoDict(next_tag)
            else:
                value = None
            result[key] = value
    return dict(result)


def XMLtoList(root):
    """Same traversal, but collect [key, value] pairs in document order."""
    result = []
    children_tags = list(root)
    for j, i in enumerate(children_tags):
        if i.tag == "key":
            key = i.text
            next_tag = children_tags[j + 1]
            next_tag_name = next_tag.tag
            if next_tag_name == "integer":
                value = int(next_tag.text)
            elif next_tag_name == 'string':
                value = next_tag.text
            elif next_tag_name == 'dict':
                value = XMLtoList(next_tag)
            else:
                value = None
            result.append([key, value])
    return list(result)


tree = ET.parse('library.xml')
root = tree.getroot()
dict_tag = root.find("dict")
if dict_tag is not None:
    result = XMLtoDict(dict_tag)
    print("Result in Dictionary:-")
    pprint.pprint(result)
    result = XMLtoList(dict_tag)
    print("\nResult in List:-")
    pprint.pprint(result)
Output:
vivek@vivek:~/Desktop/stackoverflow$ python 12.py
Result in Dictionary:-
{'Tracks': {'0001': {'Detail': 'spam spam', 'Name': 'spam'},
'0002': {'Detail': 'ham ham', 'Name': 'ham'}},
'Version': 1}

Result in List:-
[['Version', 1],
['Tracks',
[['0001', [['Name', 'spam'], ['Detail', 'spam spam']]],
['0002', [['Name', 'ham'], ['Detail', 'ham ham']]]]]]
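The question asks for one list of 'Name: ...' strings per track. The following is a rough sketch of one way to get that shape with xml.etree.ElementTree on Python 3, relying on the fact that the children of a plist dict alternate between a key tag and its value. The helper name key_value_pairs and the inline data are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Corrected sample, inlined as bytes so the example is self-contained.
data = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0"><dict>
<key>Version</key><integer>1</integer>
<key>Tracks</key>
<dict>
<key>0001</key>
<dict><key>Name</key><string>spam</string><key>Detail</key><string>spam spam</string></dict>
<key>0002</key>
<dict><key>Name</key><string>ham</string><key>Detail</key><string>ham ham</string></dict>
</dict>
</dict></plist>
"""


def key_value_pairs(dict_tag):
    """Return (key text, value element) pairs of a plist <dict> element."""
    children = list(dict_tag)
    # Even positions hold <key> tags, odd positions hold their values.
    return [(k.text, v) for k, v in zip(children[0::2], children[1::2])]


root = ET.fromstring(data)
top_level = dict(key_value_pairs(root.find("dict")))

rows = []
for track_id, track_tag in key_value_pairs(top_level["Tracks"]):
    # Each track is itself a <dict> whose values are <string> tags here.
    rows.append(["{}: {}".format(name, value.text)
                 for name, value in key_value_pairs(track_tag)])

print(rows)
```

Because the pairs are built from lists rather than dicts, the tracks and fields stay in document order.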
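Answer 2: (score: 0)
I just wanted to let you know that I have solved the problem with:
with open('library.xml', 'r', encoding='UTF-8') as file:
    text = file.read()  # read the whole file as one string
(and some regular expressions to pull out the entries I want)
This is probably very inefficient because I read the whole file as text, but I don't really care about efficiency here, since the function is called only once in my program ;)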
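The regular expressions themselves are not shown in this answer. As a rough illustration only, the text-based approach could look like the sketch below; the pattern, the variable names, and the inline text (standing in for the result of file.read()) are all assumptions, and a regex like this only suits simple, regular files such as the sample.

```python
import re

# Stands in for the string read from library.xml.
text = """<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
<key>Version</key><integer>1</integer>
<key>Tracks</key>
<dict>
<key>0001</key>
<dict>
<key>Name</key><string>spam</string>
<key>Detail</key><string>spam spam</string>
</dict>
<key>0002</key>
<dict>
<key>Name</key><string>ham</string>
<key>Detail</key><string>ham ham</string>
</dict>
</dict>
</dict>
</plist>
"""

# Match only the two keys of interest together with their string value.
pair_pattern = re.compile(r"<key>(Name|Detail)</key>\s*<string>(.*?)</string>")
pairs = pair_pattern.findall(text)

rows = []
for key, value in pairs:
    # Assumes each track lists Name first, so Name starts a new row.
    if key == "Name" or not rows:
        rows.append([])
    rows[-1].append("{}: {}".format(key, value))

print(rows)
```

This skips XML parsing entirely, so escaped characters such as &amp; in values would come through unescaped.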