I am trying to parse an XML file, extract values from it, and write them to a .csv file. So far I have the following code:
for shift_i in shift_list:
    # Iterates through all values in 'shift_list' for later comparison to ensure all tags are only counted once
    for node in tree.xpath("//Data/Status[@Name and @Reason]"):
        # Iterates through all nodes containing a 'Name' and 'Reason' attribute
        state = node.attrib["Name"]
        reason = node.attrib["Reason"]
        end = node.attrib["End"]
        start = node.attrib[u'Start']
        # Stores each attribute value in a variable named after the attribute, in lowercase
        try:
            shift = node.attrib[u'Shift']
        except KeyError:
            continue
        # Tries to set shift attribute value to 'shift' variable; skips the node if no Shift attribute is present
        if shift == shift_i:
            # If the Shift attribute is equal to the current iteration from the 'shift_list',
            # takes the difference of start and end and appends that value to the list with
            # the given Name, Reason, and Shift
            tdelta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
            d[state, reason, shift].append((tdelta.total_seconds()) / 60)
    for node in tree.xpath("//Data/Status[not(@Reason)]"):
        # Iterates through Status nodes with no Reason attribute
        state = node.attrib["Name"]
        end = node.attrib["End"]
        start = node.attrib[u'Start']
        # Stores each attribute value in a variable named after the attribute, in lowercase
        try:
            shift = node.attrib[u'Shift']
        except KeyError:
            continue
        # Tries to set shift attribute value to 'shift' variable; skips the node if no Shift
        # attribute is present
        if shift == shift_i:
            # If the Shift attribute is equal to the current iteration from the 'shift_list',
            # takes the difference of start and end and appends that value to the list with
            # the given Name, "No Reason" string, and Shift
            tdelta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
            d[state, 'No Reason', shift].append((tdelta.total_seconds()) / 60)
    for item in d:
        # Iterates through all items of d
        d[item] = sum(d[item])
        # Sums all values related to 'item' and replaces value in dictionary
    a.update(d)
    # Copies the current keys and values from temporary dictionary 'd' to permanent
    # dictionary 'a' for further analysis
    d.clear()
    # Clears dictionary d of current iterations keys and values to start fresh for next
    # iteration. If this is not done, d[item] = sum(d[item]) returns
    # "TypeError: 'float' object is not iterable"
This creates a dictionary whose entries look like this:
{('Name1','Reason','Shift'):Value,('Name2','Reason','Shift'):Value....}
print(a) returns this:
defaultdict(<class 'list'>, {('Test Run', 'No Reason', 'Night'): 5.03825, ('Slow Running', 'No Reason', 'Day'): 10.72996666666667, ('Idle', 'Shift Start Up', 'Day'): 5.425433333333333, ('Idle', 'Unscheduled', 'Afternoon'): 470.0, ('Idle', 'Early Departure', 'Day'): 0.32965, ('Idle', 'Break Creep', 'Day'): 24.754250000000003, ('Idle', 'Break', 'Day'): 40.0, ('Micro Stoppage', 'No Reason', 'Day'): 39.71673333333333, ('Idle', 'Unscheduled', 'Night'): 474.96175, ('Running', 'No Reason', 'Day'): 329.4991500000004, ('Idle', 'No Reason', 'Day'): 19.544816666666666})
I want to create a .csv whose columns are the 'Name' + 'Reason' totals and whose rows are described by 'Shift'. Like this:
        Name1-Reason  Name2-Reason  Name3-Reason  Name4-Reason
Shift1  value         value         value         value
Shift2  value         value         value         value
Shift3  value         value         value         value
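I'm not sure how to do this. I tried using nested dicts to describe my data better, but I get a TypeError when using d[state][reason][shift].append((tdelta.total_seconds()) / 60)
If there is a better way to do this, please let me know. I am very new to this and would welcome any suggestions.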
Answer 0 (score: 1)
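I would use the DictWriter class from the csv package to write the csv file. To do that, you need a list of dictionaries. Each list item is a shift, represented by a dictionary whose keys are the name & reason combinations. It should look like this: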
[{'Name1':value1, 'Name2':value2}, {'Name1':value3, 'Name2':value4}]
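A minimal sketch of that approach, assuming the totals are already in a dictionary keyed by (name, reason, shift) as in the question. The sample values, the rows_by_shift helper, and the combined 'Name-Reason' column labels are illustrative assumptions, not part of the original code; restval is used to fill in combinations that a given shift does not have.

```python
import csv
import io

# Illustrative sample totals keyed by (name, reason, shift); the values are
# made up to mirror the shape of the dictionary printed in the question.
totals = {
    ('Idle', 'Break', 'Day'): 40.0,
    ('Idle', 'Unscheduled', 'Night'): 474.96175,
    ('Running', 'No Reason', 'Day'): 329.5,
}

def rows_by_shift(totals):
    """Build one dict per shift, keyed by 'Name-Reason' plus a 'Shift' key."""
    rows = {}
    for (name, reason, shift), value in totals.items():
        rows.setdefault(shift, {'Shift': shift})[name + '-' + reason] = value
    return [rows[shift] for shift in sorted(rows)]

rows = rows_by_shift(totals)
# Collect every column that appears in any row, so all rows share the same fields.
fieldnames = ['Shift'] + sorted({key for row in rows for key in row if key != 'Shift'})

buffer = io.StringIO()  # stands in for open('out.csv', 'w', newline='')
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval='---')
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Writing to a real file instead of the in-memory buffer should only require swapping the StringIO object for a file opened in text mode with newline=''.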
Answer 1 (score: 1)
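I think the following may do what you want, or at least come close. The way you said the CSV file should be formatted overlooks an important consideration: every row must have a Name-Reason column for each possible combination of the two, even if a particular combination does not occur in a given shift's row, because that is how the CSV file format works.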
from collections import defaultdict
import csv

# Dictionary keys are (Name, Reason, Shift)
d = {('Test Run', 'No Reason', 'Night'): 5.03825,
     ('Slow Running', 'No Reason', 'Day'): 10.72996666666667,
     ('Idle', 'Shift Start Up', 'Day'): 5.425433333333333,
     ('Idle', 'Unscheduled', 'Afternoon'): 470.0,
     ('Idle', 'Early Departure', 'Day'): 0.32965,
     ('Idle', 'Break Creep', 'Day'): 24.754250000000003,
     ('Idle', 'Break', 'Day'): 40.0,
     ('Micro Stoppage', 'No Reason', 'Day'): 39.71673333333333,
     ('Idle', 'Unscheduled', 'Night'): 474.96175,
     ('Running', 'No Reason', 'Day'): 329.4991500000004,
     ('Idle', 'No Reason', 'Day'): 19.544816666666666}

# Transfer data to a defaultdict of dicts.
dd = defaultdict(dict)
for (name, reason, shift), value in d.items():
    name_reason = name + '-' + reason  # Merge together to form lower level keys
    dd[shift][name_reason] = value

# Create a csv file from the data in the defaultdict.
ABSENT = '---'  # Placeholder for empty fields
# Use a set so a Name-Reason pair that occurs in several shifts yields one column
name_reasons = sorted({name_reason for shift in dd
                       for name_reason in dd[shift]})
# Python 3: open in text mode with newline='' (Python 2 would use 'wb')
with open('dict.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file, delimiter=',')
    writer.writerow(['Shift'] + name_reasons)  # column headers row
    for shift in sorted(dd):
        row = [shift] + [dd[shift].get(name_reason, ABSENT)
                         for name_reason in name_reasons]
        writer.writerow(row)

Contents of the dict.csv file created by the code above:
Shift,Idle-Break,Idle-Break Creep,Idle-Early Departure,Idle-No Reason,Idle-Shift Start Up,Idle-Unscheduled,Micro Stoppage-No Reason,Running-No Reason,Slow Running-No Reason,Test Run-No Reason
Afternoon,---,---,---,---,---,470.0,---,---,---,---
Day,40.0,24.754250000000003,0.32965,19.544816666666666,5.425433333333333,---,39.71673333333333,329.4991500000004,10.72996666666667,---
Night,---,---,---,---,---,474.96175,---,---,---,5.03825
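On the TypeError mentioned in the question: a likely cause is that defaultdict(list) only supplies the first level, so d[state] is already a list and cannot be indexed by reason. A rough sketch of one way around that is to nest the defaultdicts explicitly and accumulate totals directly, which also removes the need for the sum-then-clear step. The records list below is hypothetical and stands in for the durations computed from the XML nodes.

```python
from collections import defaultdict

# Nested mapping: shift -> "Name-Reason" -> total minutes. The float factory
# makes missing entries start at 0.0, so values can be accumulated with +=.
totals = defaultdict(lambda: defaultdict(float))

# Hypothetical (name, reason, shift, minutes) records standing in for the
# values computed from the XML nodes.
records = [
    ('Idle', 'Break', 'Day', 15.0),
    ('Idle', 'Break', 'Day', 25.0),
    ('Idle', 'Unscheduled', 'Night', 474.96175),
]

for name, reason, shift, minutes in records:
    totals[shift][name + '-' + reason] += minutes

# Convert to plain dicts for printing
print({shift: dict(columns) for shift, columns in totals.items()})
```

The resulting structure has the same shape as dd in the code above, so the CSV-writing part should work on it unchanged.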