I have a CSV file like this:
Datetime,Usage1,Project1
Datetime,Usage2,Project1
Datetime,Usage3,Project2
Datetime,Usage4,Project3
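The goal is to summarize the usage for each project and produce a report like this:
Project1: Usage1 Usage2
Project2: Usage3
Project3: Usage4
I started with the following Python code, but it does not work: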
#/usr/bin/python
# obtain all Project values into new list project_tags:
project_tags = []
ifile = open("file.csv","r")
reader = csv.reader(ifile)
headerline = ifile.next()
for row in reader:
    project_tags.append(str(row[2]))
ifile.close()
# obtain sorted and unique list and put it into a new list project_tags2
project_tags2 = []
for p in list(set(project_tags)):
    project_tags2.append(p)
# open CSV file again and compare it with new unique list
ifile2 = open("file.csv","r")
reader2 = csv.reader(ifile2)
headerline = ifile2.next()
# Loop through both new list and a CSV file, and if they matches sum it:
sum_per_project = sum_per_project + int(row[29])
for project in project_tags2:
    for row in reader2:
        if row[2] == project:
            sum_per_project = sum_per_project + int(row[1])
Any input is appreciated!
Thanks in advance.
Answer 0 (score: 1)
Try the following snippet:
summary = {}
with open("file.csv", "r") as fp:
    for line in fp:
        row = line.rstrip().split(',')
        key = row[2]
        if key in summary:
            summary[key] += (row[1].strip(),)
        else:
            summary[key] = (row[1].strip(),)
for k in summary:
    print('{0}: {1}'.format(k, ' '.join(summary[k])))
With the sample data from the CSV file, it prints:
Project1: Usage1 Usage2
Project2: Usage3
Project3: Usage4
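The asker's code skips a header line, while the sample rows shown have none. A minimal sketch of the same grouping idea that handles both cases is below. The function name `summarize`, the `has_header` flag, and the header text `Datetime,Usage,Project` are hypothetical and not taken from the original post; an in-memory `io.StringIO` stands in for the real file.

```python
import csv
import io


def summarize(lines, has_header=False):
    """Group the usage column (index 1) by the project column (index 2)."""
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the header row; no error if the input is empty
    summary = {}
    for row in reader:
        if len(row) < 3:
            continue  # ignore blank or short lines
        summary.setdefault(row[2].strip(), []).append(row[1].strip())
    return summary


# Hypothetical header line followed by the sample rows from the question.
sample = io.StringIO(
    "Datetime,Usage,Project\n"
    "Datetime,Usage1,Project1\n"
    "Datetime,Usage2,Project1\n"
    "Datetime,Usage3,Project2\n"
    "Datetime,Usage4,Project3\n"
)
for project, usages in sorted(summarize(sample, has_header=True).items()):
    print("{0}: {1}".format(project, " ".join(usages)))
```

To read a real file, pass the open file object instead of the `StringIO`, for example `summarize(open("file.csv", newline=""), has_header=True)` inside a `with` block.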
Answer 1 (score: 0)
Here is an approach using defaultdict.
Edit:
Thanks to @Saleem for reminding me of the with clause, and that we only need to output the contents.
from collections import defaultdict
import csv
path = "file.csv"  # assumed: the file name used in the question
summary = defaultdict(list)
with open(path, "r") as f:
    rows = csv.reader(f)
    header = next(rows)  # skip the header row (works in Python 2.6+ and 3)
    for (dte, usage, proj) in rows:
        summary[proj.strip()] += [usage.strip()]
# I just realized that all you needed to do was output them:
for proj, usages in sorted(summary.items()):
    print(
        "%s: %s" % (proj, ' '.join(sorted(usages)))
    )
This prints:
Project1: Usage1 Usage2
Project2: Usage3
Project3: Usage4
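The code in the question calls `int(row[1])`, which suggests the real usage values may be numbers that should be added up per project. If that is the case, a sketch along the following lines might help. It assumes every row has exactly three fields and that any header has already been skipped; the function name `total_per_project` and the sample dates and numbers are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def total_per_project(lines):
    """Sum the integer usage column per project in a single pass."""
    totals = defaultdict(int)
    for _dte, usage, proj in csv.reader(lines):
        totals[proj.strip()] += int(usage)
    return dict(totals)


# Made-up rows: datetime, numeric usage, project.
sample = io.StringIO(
    "2016-01-01,10,Project1\n"
    "2016-01-02,20,Project1\n"
    "2016-01-03,5,Project2\n"
    "2016-01-04,7,Project3\n"
)
for project, total in sorted(total_per_project(sample).items()):
    print("%s: %d" % (project, total))
```

A single pass with a dictionary also avoids the nested loop in the question, where the inner `for row in reader2` exhausts the reader during the first project and leaves nothing to read for the remaining ones.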