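I plan to use Python instead of Excel to crunch a large amount of data, but I am stuck: I know the Excel commands and am having trouble replicating them in Python.
Basically, I want to import a CSV file, locate column C, and then, for all unique values in column A, sum all values in C for which column B satisfies the condition 1990 < x < 2000.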
A,B,C
9,1952,125
2,1994,69
3,1973,72
5,1992,85
1,1994,38
1,1994,95
4,1992,29
8,1984,94
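I started with:
import csv
with open('TestCase.txt', newline='') as csvfile:  # text mode; 'rb' only works with csv on Python 2
    reader = csv.reader(csvfile, delimiter=',')
    row1 = next(reader)  # first row, i.e. the header A,B,C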
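Rather than writing multiple if statements, I was thinking of creating new arrays made up of 0s and 1s and then using them to sum the matching values in C.
Given a second condition, the result would look like this: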
1980<x<1989 94
1990<x<2000 316
A bonus would be the number of unique values in A that contribute to each sum:
UniqueValues Condition TotalSum
1 1980<x<1989 94
4 1990<x<2000 316
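The 0/1 array idea described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the lists `b` and `c` are columns B and C of the sample data typed in by hand, and `masked_sum` is a hypothetical helper name, not something from the original post.

```python
# Columns B and C of the sample data from the question, typed in by hand.
b = [1952, 1994, 1973, 1992, 1994, 1994, 1992, 1984]
c = [125, 69, 72, 85, 38, 95, 29, 94]


def masked_sum(years, values, low, high):
    """Sum the values whose year satisfies low < year < high, using a 0/1 mask."""
    # 1 where the condition holds, 0 elsewhere
    mask = [1 if low < year < high else 0 for year in years]
    # multiplying by the mask zeroes out the rows that do not match
    return sum(m * v for m, v in zip(mask, values))


print(masked_sum(b, c, 1980, 1989))  # 94
print(masked_sum(b, c, 1990, 2000))  # 316
```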
Answer 0 (score: 1)
If you are happy using a third-party library, you can do this with pandas:
import pandas as pd
# read csv file
df = pd.read_csv('file.csv')
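# filter column B, group by A, sum C
# note: between() includes both endpoints by default; on pandas >= 1.3,
# inclusive='neither' gives the strict 1990 < B < 2000 from the question
res = df.loc[df['B'].between(1990, 2000)]\
        .groupby('A')['C'].sum()\
        .reset_index()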
Result:
   A    C
0  1  133
1  2   69
2  4   29
3  5   85
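For the summary table in the question (unique count and total per condition), a minimal sketch along the same lines might look like this. The names `SAMPLE`, `RANGES` and `summarize` are made up for illustration, the sample data is inlined so the snippet runs on its own, and both bounds are treated as exclusive, as written in the question.

```python
from io import StringIO

import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
SAMPLE = """A,B,C
9,1952,125
2,1994,69
3,1973,72
5,1992,85
1,1994,38
1,1994,95
4,1992,29
8,1984,94
"""

# (low, high) pairs, both bounds exclusive, taken from the expected output.
RANGES = [(1980, 1989), (1990, 2000)]


def summarize(df, ranges):
    """Return one row per range: unique values of A, the condition, sum of C."""
    rows = []
    for low, high in ranges:
        mask = (df["B"] > low) & (df["B"] < high)  # boolean mask on column B
        rows.append({
            "UniqueValues": df.loc[mask, "A"].nunique(),
            "Condition": f"{low}<x<{high}",
            "TotalSum": int(df.loc[mask, "C"].sum()),
        })
    return pd.DataFrame(rows)


summary = summarize(pd.read_csv(StringIO(SAMPLE)), RANGES)
print(summary.to_string(index=False))
```

On the sample data this should give 1 unique value and a sum of 94 for the first range, and 4 unique values and a sum of 316 for the second.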
Answer 1 (score: 1)
from io import StringIO
import pandas as pd
txt = StringIO("""
A,B,C
9,1952,125
2,1994,69
3,1973,72
5,1992,85
1,1994,38
1,1994,95
4,1992,29
8,1984,94
""")
df = pd.read_csv(txt)
# condition = (df["B"] > 1980) & (df["B"] < 1989)
condition = (df["B"] > 1990) & (df["B"] < 2000)
df_cond = df[condition]
df_uniq = df_cond.drop_duplicates('A', keep=False)
df_uniq_keep_first = df_cond.drop_duplicates('A', keep="first")
df_uniq_keep_last = df_cond.drop_duplicates('A', keep="last")
sum_dupl = df_cond["C"].sum()
sum_uniq = df_uniq["C"].sum()
sum_uniq_keep_first = df_uniq_keep_first["C"].sum()
sum_uniq_keep_last = df_uniq_keep_last["C"].sum()
print("sum with duplicates : " + str(sum_dupl)) #316
print("sum pure unique : " + str(sum_uniq)) #183
print("sum unique keep first: " + str(sum_uniq_keep_first)) #221
print("sum unique keep last : " + str(sum_uniq_keep_last)) #278
Answer 2 (score: 0)
You can use:
l = list()
d = dict()
with open('TestCase.txt', 'r') as file:
    next(file)  # skip the header row (A,B,C), which int() below cannot parse
    for line in file:
        l.append(line.rstrip("\n").split(','))  # l = [['9', '1952', '125'], ['2', '1994', '69'], ...]
for item in l:
    if 1990 < int(item[1]) < 2000:  # the desired condition for column B
        d[item[0]] = d[item[0]] + int(item[2]) if item[0] in d else int(item[2])
print(d)
The dictionary d has the unique values of A as its keys and the corresponding sums of C as its values.
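The same standard-library approach can be stretched to cover both conditions and the unique-value count from the question. The following is a sketch rather than a definitive implementation: `SAMPLE`, `RANGES` and `summarize` are hypothetical names, and the data is read from an inlined string instead of TestCase.txt so that the snippet runs as is.

```python
import csv
from io import StringIO

# Sample data from the question; replace StringIO(SAMPLE) with an open file
# object to read a real CSV instead.
SAMPLE = """A,B,C
9,1952,125
2,1994,69
3,1973,72
5,1992,85
1,1994,38
1,1994,95
4,1992,29
8,1984,94
"""

# (low, high) pairs, both bounds exclusive, as in the question.
RANGES = [(1980, 1989), (1990, 2000)]


def summarize(lines, ranges):
    """Return (unique count of A, condition label, sum of C) for each range."""
    totals = {r: 0 for r in ranges}
    uniques = {r: set() for r in ranges}
    for row in csv.DictReader(lines):  # the header row supplies the keys A, B, C
        year = int(row["B"])
        for low, high in ranges:
            if low < year < high:
                totals[(low, high)] += int(row["C"])
                uniques[(low, high)].add(row["A"])  # a set keeps each A once
    return [(len(uniques[r]), f"{r[0]}<x<{r[1]}", totals[r]) for r in ranges]


for count, condition, total in summarize(StringIO(SAMPLE), RANGES):
    print(count, condition, total)
```

Using a set per range keeps the unique count cheap to maintain while the file is read in a single pass.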