Question

I have a CSV file like this:

Datetime, Usage1, Project1
Datetime, Usage2, Project1
Datetime, Usage3, Project2
Datetime, Usage4, Project3

The goal is to summarize the usage per project and produce a report like this:

Project1: Usage1 Usage2

Project2: Usage3

Project3: Usage4

I started with the following Python code, but it does not work:

#/usr/bin/python

# obtain all Project values into new list project_tags:

project_tags = []
ifile = open("file.csv","r")
reader = csv.reader(ifile)
headerline = ifile.next()
for row in reader:
    project_tags.append(str(row[2]))
ifile.close()

# obtain sorted and unique list and put it into a new list project_tags2
project_tags2 = []
for p in list(set(project_tags)):
    project_tags2.append(p)


# open CSV file again and compare it with new unique list
ifile2 = open("file.csv","r")
reader2 = csv.reader(ifile2)
headerline = ifile2.next()

# Loop through both new list and a CSV file, and if they matches sum it:

sum_per_project = sum_per_project + int(row[29])
for project in project_tags2:
    for row in reader2:
        if row[2] == project:
            sum_per_project = sum_per_project + int(row[1])

Any input is appreciated!

Thanks in advance.

Answer 1

Try the following snippet:

summary = {}

with open("file.csv", "r") as fp:
    for line in fp:
        row = line.rstrip().split(',')

        key = row[2]
        if key in summary:
            summary[key] += (row[1].strip(),)
        else:
            summary[key] = (row[1].strip(),)

for k in summary:
    print('{0}: {1}'.format(k, ' '.join(summary[k])))

With the sample data from the CSV file, it prints:

 Project1: Usage1 Usage2
 Project2: Usage3
 Project3: Usage4
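
The snippet above splits each line on commas by hand and does not skip a header row. If the real file has a header line (the question's code reads one) or contains quoted fields, a variant built on the csv module might look like the sketch below. The function name summarize and the has_header flag are my own names for illustration, not something from the question.

```python
import csv
import io


def summarize(lines, has_header=False):
    """Group the usage column (index 1) by the project column (index 2)."""
    reader = csv.reader(lines, skipinitialspace=True)
    if has_header:
        next(reader, None)  # discard the header row if there is one
    summary = {}
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or short lines
        summary.setdefault(row[2].strip(), []).append(row[1].strip())
    return summary


# Sample data copied from the question, held in memory instead of a file.
sample = io.StringIO(
    "Datetime, Usage1, Project1\n"
    "Datetime, Usage2, Project1\n"
    "Datetime, Usage3, Project2\n"
    "Datetime, Usage4, Project3\n"
)
for project, usages in sorted(summarize(sample).items()):
    print("{0}: {1}".format(project, " ".join(usages)))
```

To read from disk, pass an open file object instead of the StringIO, for example summarize(open("file.csv", newline=""), has_header=True) inside a with block.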

Answer 2

Here is a defaultdict approach.

Edit: thanks to @Saleem for reminding me about the with clause, and that we only need to output the contents.

from collections import defaultdict
import csv

path = "file.csv"  # file name used in the question
summary = defaultdict(list)
with open(path, "r") as f:
    rows = csv.reader(f)
    header = next(rows)  # skip the header line
    for (dte, usage, proj) in rows:
        summary[proj.strip()] += [usage.strip()]

# I just realized that all you needed to do was output them:
for proj, usages in sorted(summary.items()):
    print("%s: %s" % (proj, ' '.join(sorted(usages))))

This prints:

Project1: Usage1 Usage2
Project2: Usage3
Project3: Usage4
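
The question's own code calls int(row[1]), which suggests the usage column may really hold numbers to be added up per project. If that is the case, a single-pass sum could be sketched as follows. The name total_per_project and the sample rows are invented for illustration; the question does not show real usage values.

```python
import csv
import io
from collections import defaultdict


def total_per_project(lines):
    """Sum the numeric usage column (index 1) for each project (index 2)."""
    totals = defaultdict(int)
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) < 3:
            continue  # skip blank or short lines
        totals[row[2].strip()] += int(row[1])
    return dict(totals)


# Made-up numeric data, assumed to follow the same three-column layout.
sample = io.StringIO(
    "2013-01-01, 10, Project1\n"
    "2013-01-02, 5, Project1\n"
    "2013-01-03, 7, Project2\n"
)
for project, total in sorted(total_per_project(sample).items()):
    print("%s: %d" % (project, total))
```

The nested loop in the question iterates over reader2 once per project, but a csv reader is exhausted after the first full pass, so only the first project would ever see any rows. Accumulating into a dict in one pass avoids both that problem and the second read of the file.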

Python: sorting and summarizing a CSV
