I have data in a CSV file that I need to parse. It looks like this:
Date,Tag,Amount
13/06/2018,ABC,6750000
13/06/2018,ABC,159800
24/05/2018,ABC,-1848920
16/05/2018,AB,-1829700
16/05/2018,AB,3600000
28/06/2018,A,15938000
16/05/2018,AB,3748998
28/06/2018,A,1035000
28/06/2018,A,1035000
14/06/2018,ABC,2122717
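As you can see, each date has a tag and a number next to it. What I want to do is treat the date and tag as keys, group the rows by date and tag, and sum the amounts.
Expected result: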
Date,Tag,Amount
13/06/2018,ABC,5220680
16/05/2018,AB,5519298
28/06/2018,A,18008000
14/06/2018,ABC,2122717
The code I am currently using is below; it does not work.
from collections import defaultdict
import csv

d = defaultdict(int)

with open("file.csv") as f:
    for line in f:
        tokens = [t.strip() for t in line.split(",")]
        try:
            date = int(tokens[0])
            tag = int(tokens[1])
            amount = int(tokens[2])
        except ValueError:
            continue

        d[date] += amount

print d
Can someone tell me how to do this without using pandas?
Answer 0 (score: 1)
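You should definitely use pandas. Unless you are required to write the code yourself, all you need to do is install the pandas module and import it (import pandas as pd); the problem can then be solved with two simple, intuitive lines of code: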
>>> df = pd.read_csv('file.csv')
>>> df.groupby(['Date', 'Tag']).Amount.sum()
Date Tag
13/06/2018 ABC 6909800
14/06/2018 ABC 2122717
16/05/2018 AB 5519298
24/05/2018 ABC -1848920
28/06/2018 A 18008000
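If you really need to write the code yourself, you can use a nested defaultdict, which gives you two levels of grouping. Also, why are you trying to convert date and tag to int? That makes no sense: int() raises ValueError on both, so every row gets skipped. Just remove those conversions.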
from collections import defaultdict

d = defaultdict(lambda: defaultdict(int))

with open("file.csv") as f:
    for line in f:
        tokens = [t.strip() for t in line.split(",")]
        try:
            date = tokens[0]
            tag = tokens[1]
            amount = int(tokens[2])
        except ValueError:
            # The header row fails int() and is skipped here
            continue
        d[date][tag] += amount
The output is:
13/06/2018 ABC 6909800
24/05/2018 ABC -1848920
16/05/2018 AB 5519298
28/06/2018 A 18008000
14/06/2018 ABC 2122717
To print the result above, just iterate over the items:
for k, v in d.items():
    for k2, v2 in v.items():
        print(k, k2, v2)
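To make your code better, read the first line (the header) on its own and then iterate from the second line to the end. That way the try/except can be removed and you end up with simpler, cleaner code. But you can take it from here, right? ;)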
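The header-skipping suggestion above might look something like the following. This is only a sketch: the helper name `sum_by_date_and_tag` and the in-memory sample standing in for the real file are assumptions made for illustration, and it presumes every non-blank data line has exactly three comma-separated fields.

```python
from collections import defaultdict
import io


def sum_by_date_and_tag(lines):
    """Sum Amount per (Date, Tag); `lines` is any iterable of CSV text lines.

    Hypothetical helper: assumes the first line is the header and every
    other non-blank line has exactly three comma-separated fields.
    """
    totals = defaultdict(lambda: defaultdict(int))
    it = iter(lines)
    next(it, None)  # consume the header row, so no try/except is needed
    for line in it:
        if not line.strip():
            continue  # ignore blank lines
        date, tag, amount = (t.strip() for t in line.split(","))
        totals[date][tag] += int(amount)
    return totals


# Small in-memory sample standing in for open("file.csv")
sample = io.StringIO(
    "Date,Tag,Amount\n"
    "13/06/2018,ABC,6750000\n"
    "13/06/2018,ABC,159800\n"
    "16/05/2018,AB,-1829700\n"
    "16/05/2018,AB,3600000\n"
)
totals = sum_by_date_and_tag(sample)
for date, tags in totals.items():
    for tag, amount in tags.items():
        print(date, tag, amount)
```

With a real file, the same function could be called as `sum_by_date_and_tag(f)` inside a `with open("file.csv") as f:` block, since a file object is also an iterable of lines.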
To write the result to a file, simply:
s = '\n'.join(['{0} {1} {2}'.format(k, k2, v2) for k, v in d.items() for k2, v2 in v.items()])
with open('output.txt', 'w') as f:
    f.write(s)
Answer 1 (score: 0)
Here is one way to do it using simple iteration.
For example:
from collections import defaultdict
import csv

result = defaultdict(int)
with open(filename) as infile:
    reader = csv.reader(infile)
    header = next(reader)
    for line in reader:
        result[tuple(line[:2])] += int(line[2])

print(header)
for k, v in result.items():
    print(k[0], k[1], v)
Output:
14/06/2018 ABC 2122717
13/06/2018 ABC 6909800
28/06/2018 A 18008000
16/05/2018 AB 5519298
24/05/2018 ABC -1848920
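To CSV:
# Python 3's csv module needs text mode with newline=""; use a different path from the input file
with open(filename, "w", newline="") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(header)
    for k, v in result.items():
        writer.writerow([k[0], k[1], v])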
Answer 2 (score: 0)
You can use itertools.groupby:
from itertools import groupby
import csv

header, *data = csv.reader(open('filename.csv'))
# Sort by (Date, Tag) first so that groupby sees equal keys next to each other
new_data = [[a, list(b)] for a, b in groupby(sorted(data, key=lambda x: x[:2]), key=lambda x: x[:2])]
results = [[*a, sum(int(c) for *_, c in b)] for a, b in new_data]
with open('calc_results.csv', 'w', newline='') as f:
    write = csv.writer(f)
    write.writerows([header, *results])
Output:
Date,Tag,Amount
13/06/2018,ABC,6909800
14/06/2018,ABC,2122717
16/05/2018,AB,5519298
24/05/2018,ABC,-1848920
28/06/2018,A,18008000
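A note on why the snippet above sorts before grouping: itertools.groupby only merges adjacent items that share a key, so unsorted rows with the same date and tag would end up in separate groups. The small illustration below uses three rows from the question's data; the variable and function names are chosen here for illustration only.

```python
from itertools import groupby

rows = [
    ["16/05/2018", "AB", "-1829700"],
    ["28/06/2018", "A", "15938000"],
    ["16/05/2018", "AB", "3600000"],
]


def key(row):
    return row[:2]  # group on (Date, Tag)


# Without sorting, the two 16/05/2018 rows are not adjacent,
# so groupby puts them in separate groups.
unsorted_groups = [
    (k, sum(int(r[2]) for r in g)) for k, g in groupby(rows, key=key)
]

# Sorting by the same key first makes equal keys adjacent.
sorted_groups = [
    (k, sum(int(r[2]) for r in g))
    for k, g in groupby(sorted(rows, key=key), key=key)
]

print(len(unsorted_groups))  # 3 groups instead of 2
print(sorted_groups)
```

Note that the sort here is a plain string sort on the date text, which is enough to make equal keys adjacent but does not order the output chronologically.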