Input file:
$ cat dummy.csv
OS,A,B,C,D,E
Ubuntu,0,1,0,1,1
Windows,0,0,1,1,1
Mac,1,0,1,0,0
Ubuntu,1,1,1,1,0
Windows,0,0,1,1,0
Mac,1,0,1,1,1
Ubuntu,0,1,0,1,1
Ubuntu,0,0,1,1,1
Ubuntu,1,0,1,0,0
Ubuntu,1,1,1,1,0
Mac,0,0,1,1,0
Mac,1,0,1,1,1
Windows,1,1,1,1,0
Ubuntu,0,0,1,1,0
Windows,1,0,1,1,1
Mac,0,1,0,1,1
Windows,0,0,1,1,1
Mac,1,0,1,0,0
Windows,1,1,1,1,0
Mac,0,0,1,1,0
Expected output:
OS,A,B,C,D,E
Mac,4,1,6,5,3
Ubuntu,3,4,5,6,3
Windows,3,2,6,6,3
I generated the output above with an Excel pivot table.
My code:
import csv
import pprint
from collections import defaultdict

d = defaultdict(dict)

with open('dummy.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        d[row['OS']]['A'] += row['A']
        d[row['OS']]['B'] += row['B']
        d[row['OS']]['C'] += row['C']
        d[row['OS']]['D'] += row['D']
        d[row['OS']]['E'] += row['E']

pprint.pprint(d)
Error:
$ python3 dummy.py
Traceback (most recent call last):
  File "dummy.py", line 10, in <module>
    d[row['OS']]['A'] += row['A']
KeyError: 'A'
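My idea is to accumulate the CSV values into a dictionary and then print it. However, I get the error above when I try to add the values.
It seems this should be achievable with the built-in csv module. I thought this would be easier :( Any pointers would be a great help.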
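Answer 0 (score: 1)
There are two problems. The nested dictionary initially has no keys set, so d[row['OS']]['A'] raises an error. The other problem is that you need to convert the column values to int before adding them.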
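Both problems can be reproduced in isolation. The following is a minimal sketch rather than the asker's code; the key 'Ubuntu' and the literal values '1' and '0' are chosen only for illustration.

```python
from collections import defaultdict

d = defaultdict(dict)

# Problem 1: d['Ubuntu'] is created as an empty dict, so it has no 'A' key
# yet and the in-place addition fails on the lookup.
try:
    d['Ubuntu']['A'] += 1
    key_error_raised = False
except KeyError:
    key_error_raised = True

# Problem 2: csv.DictReader yields strings, so "+" concatenates unless the
# values are converted with int() first.
as_strings = '1' + '0'
as_ints = int('1') + int('0')

print(key_error_raised, repr(as_strings), as_ints)
```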
You can use Counter as the value type of the defaultdict, because missing keys in a Counter default to 0:
import csv
from collections import Counter, defaultdict

d = defaultdict(Counter)

with open('dummy.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        nested = d[row.pop('OS')]
        for k, v in row.items():
            nested[k] += int(v)

print(*d.items(), sep='\n')
Output:
('Ubuntu', Counter({'D': 6, 'C': 5, 'B': 4, 'E': 3, 'A': 3}))
('Windows', Counter({'C': 6, 'D': 6, 'E': 3, 'A': 3, 'B': 2}))
('Mac', Counter({'C': 6, 'D': 5, 'A': 4, 'E': 3, 'B': 1}))
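Answer 1 (score: 1)
This doesn't exactly answer your question, since csv can indeed solve the problem, but it's worth mentioning that pandas is perfect for this kind of thing: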
In [1]: import pandas as pd
In [2]: df = pd.read_csv('dummy.csv')
In [3]: df.groupby('OS').sum()
Out[3]:
         A  B  C  D  E
OS
Mac      4  1  6  5  3
Ubuntu   3  4  5  6  3
Windows  3  2  6  6  3
Answer 2 (score: 1)
Something like this? You can write the DataFrame to a CSV file to get the format you want.
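import pandas as pd
# Load the data from the file; pd.read_clipboard(sep=',') also works
# if the CSV text has been copied to the clipboard.
df0 = pd.read_csv('dummy.csv')
df = df0.copy()
df = df.groupby(by='OS').sum()
print(df)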
Output:
         A  B  C  D  E
OS
Mac      4  1  6  5  3
Ubuntu   3  4  5  6  3
Windows  3  2  6  6  3
df.to_csv('file01')
file01:
OS,A,B,C,D,E
Mac,4,1,6,5,3
Ubuntu,3,4,5,6,3
Windows,3,2,6,6,3
Answer 3 (score: 1)
You get this exception because the first time a given row['OS'] is seen it does not exist in d, so 'A' does not exist in d[row['OS']] either. Try the following to fix it:
import csv
from collections import defaultdict

d = defaultdict(dict)

with open('dummy.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        d[row['OS']]['A'] = d[row['OS']]['A'] + int(row['A']) if (row['OS'] in d and 'A' in d[row['OS']]) else int(row['A'])
        d[row['OS']]['B'] = d[row['OS']]['B'] + int(row['B']) if (row['OS'] in d and 'B' in d[row['OS']]) else int(row['B'])
        d[row['OS']]['C'] = d[row['OS']]['C'] + int(row['C']) if (row['OS'] in d and 'C' in d[row['OS']]) else int(row['C'])
        d[row['OS']]['D'] = d[row['OS']]['D'] + int(row['D']) if (row['OS'] in d and 'D' in d[row['OS']]) else int(row['D'])
        d[row['OS']]['E'] = d[row['OS']]['E'] + int(row['E']) if (row['OS'] in d and 'E' in d[row['OS']]) else int(row['E'])
Output:
>>> import pprint
>>>
>>> pprint.pprint(dict(d))
{'Mac': {'A': 4, 'B': 1, 'C': 6, 'D': 5, 'E': 3},
 'Ubuntu': {'A': 3, 'B': 4, 'C': 5, 'D': 6, 'E': 3},
 'Windows': {'A': 3, 'B': 2, 'C': 6, 'D': 6, 'E': 3}}
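The repeated conditional can be shortened with dict.get, which returns a fallback when the key is missing. This is a sketch of that alternative, not the answerer's code; the inline `sample` text is an assumption standing in for the first rows of dummy.csv so the snippet runs without the file.

```python
import csv
import io
from collections import defaultdict

# Inline sample standing in for dummy.csv (first four data rows of the question).
sample = io.StringIO(
    "OS,A,B,C,D,E\n"
    "Ubuntu,0,1,0,1,1\n"
    "Windows,0,0,1,1,1\n"
    "Mac,1,0,1,0,0\n"
    "Ubuntu,1,1,1,1,0\n"
)

d = defaultdict(dict)
reader = csv.DictReader(sample)
for row in reader:
    os_name = row.pop('OS')
    for column, value in row.items():
        # get() returns 0 when this column has not been seen for this OS yet.
        d[os_name][column] = d[os_name].get(column, 0) + int(value)

print(dict(d))
```

Looping over row.items() also avoids hard-coding the column names A to E.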
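Answer 4 (score: 0)
d is a dictionary, so d[row['OS']] is a valid expression, but d[row['OS']]['A'] expects that item to already hold a value under 'A'. Because d is a defaultdict(dict), the item starts out as an empty dict, which has no 'A' key yet, so += raises KeyError.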
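What the first lookup actually returns can be checked directly. This is a minimal sketch; the nested defaultdict(int) in the second half is one possible fix among several, and the key 'Ubuntu' is only an example.

```python
from collections import defaultdict

d = defaultdict(dict)
first_lookup = d['Ubuntu']        # the factory runs and stores an empty dict
print(first_lookup)               # {}
print('A' in first_lookup)        # False, so d['Ubuntu']['A'] += ... fails

# One possible fix: make the inner mapping default missing columns to 0.
totals = defaultdict(lambda: defaultdict(int))
totals['Ubuntu']['A'] += int('1')
totals['Ubuntu']['A'] += int('0')
print(totals['Ubuntu']['A'])      # 1
```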
Answer 5 (score: 0)
This extends niemmi's solution to format the output the same way as the OP's example:
import csv
from collections import Counter, defaultdict

d = defaultdict(Counter)

with open('dummy.csv') as csv_file:
    reader = csv.DictReader(csv_file)
    field_names = reader.fieldnames
    for row in reader:
        counter = d[row.pop('OS')]
        # items() and print() are the Python 3 forms (the question runs python3)
        for key, value in row.items():
            counter[key] += int(value)

print(','.join(field_names))
for os_name, counter in sorted(d.items()):
    print("%s,%s" % (os_name, ','.join(str(v) for k, v in sorted(counter.items()))))
Output:
OS,A,B,C,D,E
Mac,4,1,6,5,3
Ubuntu,3,4,5,6,3
Windows,3,2,6,6,3
Update: fixed the output.
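A variant of the same idea lets csv.DictWriter build the output rows instead of joining strings by hand. This is a sketch, not the answerer's code: the inline `sample` text is an assumption standing in for dummy.csv, and `out` is an in-memory buffer used only so the result can be inspected (sys.stdout or an open file would work the same way).

```python
import csv
import io
from collections import Counter, defaultdict

# Inline sample standing in for dummy.csv (first four data rows of the question).
sample = io.StringIO(
    "OS,A,B,C,D,E\n"
    "Ubuntu,0,1,0,1,1\n"
    "Windows,0,0,1,1,1\n"
    "Mac,1,0,1,0,0\n"
    "Ubuntu,1,1,1,1,0\n"
)

d = defaultdict(Counter)
reader = csv.DictReader(sample)
field_names = reader.fieldnames  # ['OS', 'A', 'B', 'C', 'D', 'E']
for row in reader:
    counter = d[row.pop('OS')]
    for key, value in row.items():
        counter[key] += int(value)

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=field_names, lineterminator='\n')
writer.writeheader()
for os_name in sorted(d):
    # Merge the group key with its per-column totals into one output row.
    writer.writerow({'OS': os_name, **d[os_name]})

print(out.getvalue(), end='')
```

Because the writer takes its column order from field_names, the totals do not need to be sorted before they are written.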
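Answer 6 (score: 0)
I'm assuming your input file is named input_file.csv.
You can also use groupby from the itertools module together with two dicts to process your data and get the desired output, as in the following example: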
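from itertools import groupby

data = list(k.strip("\n").split(",") for k in open("input_file.csv", 'r'))
a, b = {}, {}

# groupby only merges consecutive rows, so the same OS can come up more than
# once; the try/except appends later runs to the rows already collected.
for k, v in groupby(data[1:], lambda x : x[0]):
    try:
        a[k] += [i[1:] for i in list(v)]
    except KeyError:
        a[k] = [i[1:] for i in list(v)]

# Sum each of the five value columns per OS and build the output line.
for key in a.keys():
    for j in range(5):
        c = 0
        for i in a[key]:
            c += int(i[j])
        try:
            b[key] += ',' + str(c)
        except KeyError:
            b[key] = str(c)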
Output:
print(','.join(data[0]))
for k in b.keys():
    print("{0},{1}".format(k, b[k]))
>>> OS,A,B,C,D,E
>>> Ubuntu,3,4,5,6,3
>>> Windows,3,2,6,6,3
>>> Mac,4,1,6,5,3
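Note that itertools.groupby only groups consecutive items with equal keys, which is why the code above has to merge repeated keys by hand. If the rows are sorted by the key first, each OS forms exactly one group. A minimal sketch with made-up rows (the list `rows` is illustrative, not the question's data):

```python
from itertools import groupby

rows = [
    ["Ubuntu", "0", "1"],
    ["Mac", "1", "0"],
    ["Ubuntu", "1", "1"],
]

def by_os(row):
    return row[0]

# Without sorting, "Ubuntu" shows up as two separate groups.
unsorted_keys = [key for key, _ in groupby(rows, key=by_os)]

# Sorting by the same key first gives one group per OS.
totals = {}
for key, group in groupby(sorted(rows, key=by_os), key=by_os):
    # zip(*...) transposes the value rows into columns, which are then summed.
    columns = zip(*(row[1:] for row in group))
    totals[key] = [sum(int(v) for v in column) for column in columns]

print(unsorted_keys)
print(totals)
```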