Question

I am trying to parse an XML file, extract values from it and write them to a .csv file. So far I have the following code:

from collections import defaultdict
from datetime import datetime

# Assumed to be defined earlier (not shown here): tree (a parsed lxml tree), FMT (the
#   timestamp format string), shift_list, d = defaultdict(list) and a = defaultdict(list)
for shift_i in shift_list:
    # Iterates through all values in 'shift_list' for later comparison to ensure all tags are only counted once
    for node in tree.xpath("//Data/Status[@Name and @Reason]"):
        # Iterates through all nodes containing a 'Name' and 'Reason' attribute
        state = node.attrib["Name"]
        reason = node.attrib["Reason"]
        end = node.attrib["End"]
        start = node.attrib["Start"]
        # Sets each of the attribute values to the name of the attribute all lowercase
        try:
            shift = node.attrib["Shift"]
        except KeyError:
            continue
        # Tries to set shift attribute value to 'shift' variable; raises KeyError (node skipped)
        #   if no Shift attribute is present
        if shift == shift_i:
            # If the Shift attribute is equal to the current iteration from the 'shift_list',
            #   takes the difference of start and end and appends that value (in minutes) to the
            #   list with the given Name, Reason, and Shift
            tdelta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
            d[state, reason, shift].append(tdelta.total_seconds() / 60)

    for node in tree.xpath("//Data/Status[not(@Reason)]"):
        # Iterates through Status nodes with no Reason attribute
        state = node.attrib["Name"]
        end = node.attrib["End"]
        start = node.attrib["Start"]
        # Sets each of the attribute values to the name of the attribute all lowercase
        try:
            shift = node.attrib["Shift"]
        except KeyError:
            continue
        # Tries to set shift attribute value to 'shift' variable; raises KeyError (node skipped)
        #   if no Shift attribute is present
        if shift == shift_i:
            # If the Shift attribute is equal to the current iteration from the 'shift_list',
            #   takes the difference of start and end and appends that value to the list with
            #   the given Name, "No Reason" string, and Shift
            tdelta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
            d[state, 'No Reason', shift].append(tdelta.total_seconds() / 60)

    for item in d:
        # Iterates through all items of d
        d[item] = sum(d[item])
        # Sums all values related to 'item' and replaces value in dictionary
    a.update(d)
    # Copies current keys and values in temporary dictionary 'd' to permanent
    #   dictionary 'a' for further analysis
    d.clear()
    # Clears dictionary d of current iterations keys and values to start fresh for next
    #   iteration. If this is not done, d[item] = sum(d[item]) raises
    #   "TypeError: 'float' object is not iterable"
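
A possible simplification, sketched under assumptions: every Status node already carries its own Shift attribute, so the totals can be accumulated in a single pass into a defaultdict(float). Because each duration is added directly to a float, there is no list to sum afterwards and nothing to clear between shifts, so the TypeError mentioned in the last comment cannot occur. The XML snippet and the FMT value below are hypothetical stand-ins for the real file, the standard library's xml.etree.ElementTree is used instead of lxml's xpath, and unlike the code above it counts every Shift value it finds rather than only those in shift_list.

```python
from collections import defaultdict
from datetime import datetime
import xml.etree.ElementTree as ET

# Hypothetical timestamp format and sample data; replace with the real FMT and file.
FMT = "%Y-%m-%d %H:%M:%S"
SAMPLE_XML = """
<Data>
  <Status Name="Running" Start="2020-01-01 08:00:00" End="2020-01-01 08:30:00" Shift="Day"/>
  <Status Name="Idle" Reason="Break" Start="2020-01-01 10:00:00" End="2020-01-01 10:40:00" Shift="Day"/>
  <Status Name="Idle" Reason="Break" Start="2020-01-01 12:00:00" End="2020-01-01 12:15:00" Shift="Day"/>
  <Status Name="Idle" Reason="Unscheduled" Start="2020-01-01 23:00:00" End="2020-01-01 23:10:00" Shift="Night"/>
  <Status Name="Idle" Start="2020-01-01 13:00:00" End="2020-01-01 13:05:00"/>
</Data>
"""


def total_minutes(root):
    """Sum Status durations in minutes, keyed by (Name, Reason, Shift)."""
    totals = defaultdict(float)  # missing keys start at 0.0, so += just works
    for node in root.iter("Status"):
        shift = node.get("Shift")
        if shift is None:
            continue  # like the original: skip nodes without a Shift attribute
        reason = node.get("Reason", "No Reason")  # replaces the second XPath loop
        start = datetime.strptime(node.get("Start"), FMT)
        end = datetime.strptime(node.get("End"), FMT)
        totals[node.get("Name"), reason, shift] += (end - start).total_seconds() / 60
    return totals


totals = total_minutes(ET.fromstring(SAMPLE_XML.strip()))
print(dict(totals))
# {('Running', 'No Reason', 'Day'): 30.0, ('Idle', 'Break', 'Day'): 55.0, ('Idle', 'Unscheduled', 'Night'): 10.0}
```

The two Idle/Break/Day intervals (40 and 15 minutes) end up as a single 55.0 total, and the node without a Shift attribute is ignored.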

This creates a dictionary whose entries look like this:

{('Name1','Reason','Shift'):Value,('Name2','Reason','Shift'):Value....}

print(a) returns this:

defaultdict(<class 'list'>, {('Test Run', 'No Reason', 'Night'): 5.03825, ('Slow Running', 'No Reason', 'Day'): 10.72996666666667, ('Idle', 'Shift Start Up', 'Day'): 5.425433333333333, ('Idle', 'Unscheduled', 'Afternoon'): 470.0, ('Idle', 'Early Departure', 'Day'): 0.32965, ('Idle', 'Break Creep', 'Day'): 24.754250000000003, ('Idle', 'Break', 'Day'): 40.0, ('Micro Stoppage', 'No Reason', 'Day'): 39.71673333333333, ('Idle', 'Unscheduled', 'Night'): 474.96175, ('Running', 'No Reason', 'Day'): 329.4991500000004, ('Idle', 'No Reason', 'Day'): 19.544816666666666})

I want to create a .csv whose columns are 'Name' + 'Reason', holding the totals, and whose rows are labeled by 'Shift'. Like this:

         Name1-Reason    Name2-Reason    Name3-Reason    Name4-Reason
Shift1      value          value            value           value
Shift2      value          value            value           value
Shift3      value          value            value           value

I'm not sure how to do this. I tried using nested dicts to describe my data better, but I got a TypeError when using:

d[state][reason][shift].append((tdelta.total_seconds()) / 60)
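
The TypeError most likely comes from d being a defaultdict(list): d[state] then returns a list, and indexing a list with the string reason fails. The question doesn't show how d is created, so the sketch below assumes defaultdict(list) (a prints as defaultdict(<class 'list'>, ...), so d is presumably the same) and contrasts it with a three-level nested defaultdict whose innermost values are lists.

```python
from collections import defaultdict

# Assumption: d is created like this, as the printed defaultdict(<class 'list'>, ...) suggests.
flat = defaultdict(list)
error_message = None
try:
    flat["Idle"]["Break"]["Day"].append(40.0)  # flat["Idle"] is a list, so ["Break"] fails
except TypeError as exc:
    error_message = str(exc)  # "list indices must be integers or slices, not str"

# One nested level per key: state -> reason -> shift -> list of minutes.
nested = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
nested["Idle"]["Break"]["Day"].append(40.0)
nested["Idle"]["Break"]["Day"].append(15.0)
print(sum(nested["Idle"]["Break"]["Day"]))  # 55.0
```

Each lambda supplies the default for the next level down, so only the innermost level creates the lists that .append() needs.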

If there is a better way to do this, please let me know. I'm very new to this and would like to hear any suggestions.

Answer 1

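I would use the csv module's DictWriter class to write the csv file. For that you need a list of dictionaries. Each list item is one shift and is represented by a dictionary whose keys are the name & reason combinations. It should look like this: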

[{'Name1':value1, 'Name2':value2}, {'Name1':value3, 'Name2':value4}]
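
A minimal sketch of this approach, using a subset of the totals printed in the question as sample data. It assumes the keys are built as "Name-Reason" and adds a "Shift" key to each row dictionary so every row says which shift it describes; restval fills in combinations that don't occur in a given shift (left blank here). io.StringIO stands in for a real file.

```python
import csv
import io
from collections import defaultdict

# A subset of the totals printed in the question: (Name, Reason, Shift) -> minutes.
a = {
    ('Test Run', 'No Reason', 'Night'): 5.03825,
    ('Idle', 'Unscheduled', 'Afternoon'): 470.0,
    ('Idle', 'Break', 'Day'): 40.0,
    ('Idle', 'Unscheduled', 'Night'): 474.96175,
}

# One dict per shift: {'Shift': ..., 'Name-Reason': value, ...}
rows = defaultdict(dict)
for (name, reason, shift), minutes in a.items():
    rows[shift]['Shift'] = shift
    rows[shift][name + '-' + reason] = minutes

# The header needs every Name-Reason combination, each exactly once (hence the set).
fieldnames = ['Shift'] + sorted({name + '-' + reason for name, reason, _ in a})

out = io.StringIO()  # for a real file: open('totals.csv', 'w', newline='')
writer = csv.DictWriter(out, fieldnames=fieldnames, restval='')
writer.writeheader()
for shift in sorted(rows):
    writer.writerow(rows[shift])
print(out.getvalue())
```

The resulting header is Shift,Idle-Break,Idle-Unscheduled,Test Run-No Reason, followed by one row each for Afternoon, Day and Night.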

Answer 2

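I think the following may do what you want, or at least come close. The way you described the CSV format overlooks one important point: every row must have a Name-Reason column for every possible combination of the two, even when a particular combination doesn't occur in a given shift's row, because that is how the CSV file format works.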

from collections import defaultdict
import csv

# Dictionary keys are (Name, Reason, Shift)
d = {('Test Run', 'No Reason', 'Night'): 5.03825,
     ('Slow Running', 'No Reason', 'Day'): 10.72996666666667,
     ('Idle', 'Shift Start Up', 'Day'): 5.425433333333333,
     ('Idle', 'Unscheduled', 'Afternoon'): 470.0,
     ('Idle', 'Early Departure', 'Day'): 0.32965,
     ('Idle', 'Break Creep', 'Day'): 24.754250000000003,
     ('Idle', 'Break', 'Day'): 40.0,
     ('Micro Stoppage', 'No Reason', 'Day'): 39.71673333333333,
     ('Idle', 'Unscheduled', 'Night'): 474.96175,
     ('Running', 'No Reason', 'Day'): 329.4991500000004,
     ('Idle', 'No Reason', 'Day'): 19.544816666666666}

# Transfer data to a defaultdict of dicts: shift -> {'Name-Reason': value}.
dd = defaultdict(dict)
for (name, reason, shift), value in d.items():
    name_reason = name + '-' + reason  # Merge together to form lower level keys
    dd[shift][name_reason] = value

# Create a csv file from the data in the defaultdict.
ABSENT = '---'  # Placeholder for empty fields
# Collect each Name-Reason combination only once (a set); otherwise a combination that
# occurs in several shifts, such as Idle-Unscheduled, produces duplicate columns.
name_reasons = sorted({name_reason for shift in dd
                                   for name_reason in dd[shift]})
# Python 3: open in text mode with newline='' and let the csv module handle line endings.
with open('dict.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file, delimiter=',')
    writer.writerow(['Shift'] + name_reasons)  # column headers row
    for shift in sorted(dd):
        row = [shift] + [dd[shift].get(name_reason, ABSENT)
                                        for name_reason in name_reasons]
        writer.writerow(row)

Contents of the dict.csv file created by the code above:

Shift,Idle-Break,Idle-Break Creep,Idle-Early Departure,Idle-No Reason,Idle-Shift Start Up,Idle-Unscheduled,Micro Stoppage-No Reason,Running-No Reason,Slow Running-No Reason,Test Run-No Reason
Afternoon,---,---,---,---,---,470.0,---,---,---,---
Day,40.0,24.754250000000003,0.32965,19.544816666666666,5.425433333333333,---,39.71673333333333,329.4991500000004,10.72996666666667,---
Night,---,---,---,---,---,474.96175,---,---,---,5.03825

Create a .csv from values in a dictionary
