I have an XML file that contains several levels of nested data.
<?xml version="1.0" encoding="UTF-8"?>
<DeviceLog DevID="10503847" DocDate="2017-03-01T00:00:00" BSLogDate="2017-02-28T06:22:36">
  <Log LogTime="2017-02-27T18:33:58">
    <DevLog State="PowerOn"/>
  </Log>
  <Log LogTime="2017-02-28T08:59:03">
    <ComponentPrivateDataLog>
      <Component>1</Component>
      <DataType>1</DataType>
      <PrivateData>0301</PrivateData>
    </ComponentPrivateDataLog>
  </Log>
  <Log LogTime="2017-02-28T08:59:13">
    <ComponentPrivateDataLog>
      <Component>1</Component>
      <DataType>1</DataType>
      <PrivateData>0401</PrivateData>
    </ComponentPrivateDataLog>
  </Log>
  <Log LogTime="2017-02-28T10:16:44">
    <DevLog State="StandByIn"/>
  </Log>
  <Log LogTime="2017-02-28T12:29:55">
    <EndOfFileLog />
  </Log>
</DeviceLog>
Here, each Log tag is an independent entity with its own time attribute and child nodes. I am using minidom to parse the data.
Here is the code:
from xml.dom import minidom
xmldoc=minidom.parse("testxml.xml")
dl=xmldoc.getElementsByTagName("DeviceLog")
for d in dl:
    dId=d.attributes["DevID"]
    dId=dId.value
    dod=d.attributes["DocDate"]
    dod=dod.value
    bsld=d.attributes["BSLogDate"]
    bsld=bsld.value
    log=xmldoc.getElementsByTagName("Log")
    for l in log:
        logtime = l.attributes["LogTime"]
        logtime = logtime.value
        devLog = l.getElementsByTagName("DevLog")
        for dl in devLog:
            devEvnt = dl.attributes["State"]
            devEvnt = devEvnt.value
print dId,dod,bsld,logtime, devEvnt
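The code above prints the time and state of the StandBy entry (the last one) rather than the first PowerOn state. I tried indexing with log=xmldoc.getElementsByTagName("Log")[0], and did the same for logtime, but that did not help.
How can I parse the log so that each log entry's time is printed on its own line?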
Answer 0 (score: 0)
If it helps, you can use a parser that reads the XML into nested dictionaries, which can be a bit easier to work with than DOM nodes.
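import xmltodict

# The XML from the question goes here (elided in the original answer).
myxml = """
...
"""
mydict = xmltodict.parse(myxml)
# Attributes are exposed under keys prefixed with "@".
logs = mydict["DeviceLog"]["Log"]
for log in logs:
    log_time = log["@LogTime"]
    # Each Log holds a different child element, so look each one up optionally.
    dev_log = log.get("DevLog", None)
    component_log = log.get("ComponentPrivateDataLog", None)
    if dev_log:
        print(log_time, dev_log["@State"])
    if component_log:
        print(log_time, component_log["Component"], component_log["PrivateData"])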
An example of such a parser is xmltodict.
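If you would rather stay with minidom, the likely cause of the behaviour in the question is that the values are only printed once, after the loops have finished, so only the last values assigned survive. A minimal sketch of that fix follows, assuming the XML is held in a string so the example is self-contained. The names SAMPLE_XML and dev_log_rows are illustrative and not from the question, and the sample is a shortened copy of the posted log.

```python
from xml.dom import minidom

# Shortened copy of the log from the question (assumption: kept in a string
# here; for a file on disk, minidom.parse(path) would be used instead).
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<DeviceLog DevID="10503847" DocDate="2017-03-01T00:00:00" BSLogDate="2017-02-28T06:22:36">
  <Log LogTime="2017-02-27T18:33:58"><DevLog State="PowerOn"/></Log>
  <Log LogTime="2017-02-28T08:59:03">
    <ComponentPrivateDataLog>
      <Component>1</Component><DataType>1</DataType><PrivateData>0301</PrivateData>
    </ComponentPrivateDataLog>
  </Log>
  <Log LogTime="2017-02-28T10:16:44"><DevLog State="StandByIn"/></Log>
  <Log LogTime="2017-02-28T12:29:55"><EndOfFileLog /></Log>
</DeviceLog>"""


def dev_log_rows(xml_text):
    """Return one (DevID, DocDate, BSLogDate, LogTime, State) tuple per DevLog."""
    doc = minidom.parseString(xml_text)
    rows = []
    for device in doc.getElementsByTagName("DeviceLog"):
        header = (
            device.getAttribute("DevID"),
            device.getAttribute("DocDate"),
            device.getAttribute("BSLogDate"),
        )
        # Search below this device element only, not the whole document.
        for log in device.getElementsByTagName("Log"):
            log_time = log.getAttribute("LogTime")
            for dev_log in log.getElementsByTagName("DevLog"):
                # Collect inside the innermost loop so every entry is kept.
                rows.append(header + (log_time, dev_log.getAttribute("State")))
    return rows


for row in dev_log_rows(SAMPLE_XML):
    print(" ".join(row))
```

Log entries without a DevLog child (the ComponentPrivateDataLog and EndOfFileLog ones) are skipped here. They could be handled with a second inner loop over their own tag names in the same way.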