I'm very new to this. I've tried searching, but nothing I found has worked for me.
I have XML data that looks like this:
<datainfo>
    <data>
        <info State="1" Reason="x" Start="01/01/2016 00:00:00.000" End="01/01/2016 02:00:00.000"></info>
        <info State="1" Reason="y" Start="01/01/2016 02:00:00.000" End="01/01/2016 02:01:00.000">
            <moreinfo Start="01/01/2016 02:00:00.000" End="01/01/2016 02:00:30.000"/>
            <moreinfo Start="01/01/2016 02:00:30.000" End="01/01/2016 02:01:00.000"/>
        </info>
        <info State="2" Start="01/01/2016 02:01:00.000" End="01/01/2016 02:10:00.000"></info>
        ...
    </data>
</datainfo>
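I want to find out how much time was spent in each state {1, 2, ...} for each reason {x, y, ...} on a given date, and write the result to a .csv file that I can open later in Excel.
The problem I'm running into is that I can't use static variables: there are hundreds of different states with hundreds of different reasons, and they change constantly.
Let me know if anything is unclear. I'm new to this and really appreciate any help.
Edit: here is what I have so far; hopefully it clarifies what I'm trying to do.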
from datetime import datetime
from lxml import etree as ET
def parseXML(file):
    handler = open(file, "r")
    tree = ET.parse(handler)
    info_list = tree.xpath('//info')
    root = tree.getroot()
    dictionary = {}
    info_len = len(info_list)
    for i in range(info_len):
        info=root[0][0][i]
        info_attribs = info.attrib
        end = info_attribs[u'End']
        start = info_attribs[u'Start']
        FMT = '%m/%d/%Y %H:%M:%S.%f'
        tdelta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
        t_dif = (tdelta.total_seconds()) / 60
        try:
            dictionary[info_attribs[u'State'] + status_attribs[u'Reason']] = t_dif
        except:
            continue
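I'm trying to iterate over each row, find its state and reason, and add them to a dictionary. If an entry for that state and reason already exists, I want to add the new time to the current value.
Let me know if I should provide more information!
Edit #2:
The output I'm looking for is a .csv file structured like this: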
State - Reason, [Total time spent in State 1 for x reason]
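Answer 0 (score: 3)
You can use a defaultdict with lists as values to handle repeated keys. You can also filter the info nodes with an XPath expression so that you only get nodes that have both of the attributes you want, which means you don't need the try/except: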
x = """<datainfo>
    <data>
        <info State="1" Reason="x" Start="01/01/2016 00:00:00.000" End="01/01/2016 02:00:00.000"></info>
        <info State="1" Reason="y" Start="01/01/2016 02:00:00.000" End="01/01/2016 02:01:00.000">
            <moreinfo Start="01/01/2016 02:00:00.000" End="01/01/2016 02:00:30.000"/>
            <moreinfo Start="01/01/2016 02:00:30.000" End="01/01/2016 02:01:00.000"/>
        </info>
        <info State="2" Start="01/01/2016 02:01:00.000" End="01/01/2016 02:10:00.000"></info>
    </data>
</datainfo>"""
from collections import defaultdict
import lxml.etree as et
from datetime import datetime
FMT = '%m/%d/%Y %H:%M:%S.%f'
tree = et.fromstring(x)
# Each (state, reason) key collects a list of durations in minutes.
d = defaultdict(list)
# The XPath predicate keeps only <info> nodes that have both attributes.
for node in tree.xpath("//data/info[@Reason and @State]"):
    state = node.attrib["State"]
    reason = node.attrib["Reason"]
    end = node.attrib["End"]
    start = node.attrib["Start"]
    tdelta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    d[state, reason].append(tdelta.total_seconds() / 60)
print(d)
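To make the two accumulation styles concrete, here is a minimal sketch using only the standard library. The `samples` list is made up for illustration and is not taken from the question's file. `defaultdict(list)` keeps every duration recorded for a key, while `defaultdict(float)` keeps only a running total.

```python
from collections import defaultdict

# Hypothetical (state, reason, minutes) samples, for illustration only.
samples = [("1", "x", 120.0), ("1", "y", 1.0), ("1", "x", 30.0)]

per_key = defaultdict(list)   # a missing key starts as an empty list
totals = defaultdict(float)   # a missing key starts as 0.0

for state, reason, minutes in samples:
    per_key[state, reason].append(minutes)  # keep each duration
    totals[state, reason] += minutes        # keep only the sum

print(dict(per_key))
print(dict(totals))
```

Neither dictionary needs to know the states or reasons in advance, which is the point when they number in the hundreds and keep changing.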
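How you write to the CSV depends on how you want the data for repeated keys to look. If you want one row per recorded duration:
import csv
# newline="" avoids blank rows between records on Windows (Python 3).
with open("out.csv", "w", newline="") as f:
    wr = csv.writer(f)
    for k, v in d.items():
        for val in v:
            # k is a (state, reason) tuple and val is a float, so build the row explicitly.
            wr.writerow(list(k) + [val])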
If you actually want the totals summed:
# A float default lets each (state, reason) key start at 0.0.
d = defaultdict(float)
for node in tree.xpath("//data/info[@Reason and @State]"):
    state = node.attrib["State"]
    reason = node.attrib["Reason"]
    end = node.attrib["End"]
    start = node.attrib["Start"]
    tdelta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    d[state, reason] += tdelta.total_seconds() / 60
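Then:
import csv
# newline="" avoids blank rows between records on Windows (Python 3).
with open("out.csv", "w", newline="") as f:
    wr = csv.writer(f)
    # Flatten each (state, reason) key so every row is: state, reason, total minutes.
    wr.writerows([state, reason, total] for (state, reason), total in d.items())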
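The question also asks for totals on a specific date and for rows labelled "State - Reason". One possible sketch of that follows, under stated assumptions: it uses the standard library's `xml.etree.ElementTree` instead of lxml so that it runs without extra installs, it attributes each interval entirely to the date of its `Start` attribute, and the third `<info>` row in `SAMPLE` is made up to show two intervals being summed. The names `minutes_by_date` and `write_report` are hypothetical helpers, not part of any library.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import datetime

FMT = "%m/%d/%Y %H:%M:%S.%f"

# Sample data; the third <info> row is hypothetical, added to show summing.
SAMPLE = """<datainfo><data>
<info State="1" Reason="x" Start="01/01/2016 00:00:00.000" End="01/01/2016 02:00:00.000"></info>
<info State="1" Reason="y" Start="01/01/2016 02:00:00.000" End="01/01/2016 02:01:00.000"></info>
<info State="1" Reason="x" Start="01/01/2016 03:00:00.000" End="01/01/2016 03:30:00.000"></info>
<info State="2" Start="01/01/2016 02:01:00.000" End="01/01/2016 02:10:00.000"></info>
</data></datainfo>"""


def minutes_by_date(xml_text):
    """Sum minutes per (date, state, reason); nodes lacking State or Reason are skipped."""
    totals = defaultdict(float)
    for node in ET.fromstring(xml_text).iter("info"):
        state, reason = node.get("State"), node.get("Reason")
        if state is None or reason is None:
            continue
        start = datetime.strptime(node.get("Start"), FMT)
        end = datetime.strptime(node.get("End"), FMT)
        # Assumption: the whole interval counts toward the Start date.
        totals[start.date(), state, reason] += (end - start).total_seconds() / 60
    return totals


def write_report(totals, fileobj, day):
    """Write one 'State - Reason', minutes row per key that falls on the given date."""
    writer = csv.writer(fileobj)
    for (date, state, reason), minutes in sorted(totals.items()):
        if date == day:
            writer.writerow([f"{state} - {reason}", minutes])


buffer = io.StringIO()
write_report(minutes_by_date(SAMPLE), buffer, datetime(2016, 1, 1).date())
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, pass a file opened with `open("out.csv", "w", newline="")` as `fileobj`. An interval that crosses midnight would need to be split between the two dates, which this sketch does not attempt.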
Answer 1 (score: 0)
This assumes you have already parsed your XML into an array of arrays.
import csv
# This is assuming you have your xml parsed into an array of arrays [['state', 'reason'], ['state', 'reason']]
# example of array format
data = [['1', 'x'], ['1', 'y'], ['2', 'z']]
# newline="" avoids blank rows between records on Windows (Python 3).
with open("output.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(data)
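This answer leaves the parsing step open. A minimal sketch of one way to build that array of arrays from the question's XML is shown below. It uses the standard library's `xml.etree.ElementTree` rather than lxml, and `state_reason_rows` is a hypothetical helper name. It yields only the state and reason pairs, not the durations.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's sample data.
SAMPLE = """<datainfo><data>
<info State="1" Reason="x" Start="01/01/2016 00:00:00.000" End="01/01/2016 02:00:00.000"></info>
<info State="1" Reason="y" Start="01/01/2016 02:00:00.000" End="01/01/2016 02:01:00.000"></info>
<info State="2" Start="01/01/2016 02:01:00.000" End="01/01/2016 02:10:00.000"></info>
</data></datainfo>"""


def state_reason_rows(xml_text):
    """Return [[state, reason], ...] for <info> nodes that carry both attributes."""
    rows = []
    for node in ET.fromstring(xml_text).iter("info"):
        state, reason = node.get("State"), node.get("Reason")
        if state is not None and reason is not None:
            rows.append([state, reason])
    return rows


data = state_reason_rows(SAMPLE)
print(data)
```

The resulting `data` list can be passed straight to `writer.writerows(data)`. The `<info>` node without a `Reason` attribute is skipped here; whether it should be kept with an empty reason is a choice the question does not settle.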