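Create a new array based on unique keys and conditions in Python

Time: 2018-04-08 10:22:15

Tags: python excel indexing match

I plan to use Python instead of Excel to process a large amount of data, but because I only know the Excel commands, I'm stuck and find it hard to replicate them in Python.

Basically, I want to import a CSV file, locate column C, and then, for all unique values in column A, sum all values in C whose row satisfies the condition 1990 < x < 2000 on column B.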

A,B,C
9,1952,125
2,1994,69
3,1973,72
5,1992,85
1,1994,38
1,1994,95
4,1992,29
8,1984,94

I started with:
import csv
with open('TestCase.txt', 'r', newline='') as csvfile:  # text mode: csv.reader needs str rows on Python 3
    reader = csv.reader(csvfile, delimiter=',')
    row1 = next(reader)

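Rather than writing multiple if statements, I'd like to create new arrays made up of 0s and 1s and use them to sum the matching values in C.

With a second condition added, the result would look like this: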

1980<x<1989 94
1990<x<2000 316

As a bonus, I'd also like the number of unique values in A that contribute to each sum:

UniqueValues    Condition      TotalSum
1               1980<x<1989    94
4               1990<x<2000    316

3 Answers:

Answer 0: (score: 1)

If you are comfortable using a third-party library, you can vectorize this with pandas:
import pandas as pd

# read csv file
df = pd.read_csv('file.csv')

# filter column B, group by A, sum C
# note: between() includes both endpoints by default (1990 <= B <= 2000)
res = df.loc[df['B'].between(1990, 2000)]\
        .groupby('A')['C'].sum()\
        .reset_index()

Result:

   A    C
0  1  133
1  2   69
2  4   29
3  5   85
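
The per-key sums above do not yet give the summary table the question asks for, and `between` includes both endpoints by default. Below is a minimal sketch of one way to build that table, assuming strict bounds are intended. The sample data is inlined so the snippet runs on its own; the helper name `summarize` and the list of ranges are illustrative and not part of the original answer.

```python
from io import StringIO

import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
csv_text = """A,B,C
9,1952,125
2,1994,69
3,1973,72
5,1992,85
1,1994,38
1,1994,95
4,1992,29
8,1984,94
"""
df = pd.read_csv(StringIO(csv_text))


def summarize(frame, ranges):
    """Hypothetical helper: one summary row per (low, high) range, strict bounds."""
    rows = []
    for low, high in ranges:
        # Strict comparison on both sides, i.e. low < B < high.
        subset = frame[(frame["B"] > low) & (frame["B"] < high)]
        rows.append({
            "UniqueValues": subset["A"].nunique(),  # distinct keys in column A
            "Condition": f"{low}<x<{high}",
            "TotalSum": int(subset["C"].sum()),     # sum of C over matching rows
        })
    return pd.DataFrame(rows, columns=["UniqueValues", "Condition", "TotalSum"])


summary = summarize(df, [(1980, 1989), (1990, 2000)])
print(summary.to_string(index=False))
```

With the sample data this gives the two rows from the question: 1 unique value with a sum of 94, and 4 unique values with a sum of 316.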

Answer 1: (score: 1)

from io import StringIO
import pandas as pd

txt = StringIO("""
A,B,C
9,1952,125
2,1994,69
3,1973,72
5,1992,85
1,1994,38
1,1994,95
4,1992,29
8,1984,94
""")

df = pd.read_csv(txt)

# condition = (df["B"] > 1980) & (df["B"] < 1989)
condition = (df["B"] > 1990) & (df["B"] < 2000)
df_cond = df[condition]

df_uniq = df_cond.drop_duplicates('A', keep=False)
df_uniq_keep_first = df_cond.drop_duplicates('A', keep="first")
df_uniq_keep_last = df_cond.drop_duplicates('A', keep="last")

sum_dupl = df_cond["C"].sum()
sum_uniq = df_uniq["C"].sum()
sum_uniq_keep_first = df_uniq_keep_first["C"].sum()
sum_uniq_keep_last = df_uniq_keep_last["C"].sum()

print("sum with duplicates  : " + str(sum_dupl))            #316
print("sum pure unique      : " + str(sum_uniq))            #183
print("sum unique keep first: " + str(sum_uniq_keep_first)) #221 
print("sum unique keep last : " + str(sum_uniq_keep_last))  #278
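
The boolean `condition` used above is essentially the array of 0s and 1s that the question describes. Here is a small sketch of that idea, with the sample data inlined; the names `mask`, `total` and `unique_count` are chosen for illustration only.

```python
from io import StringIO

import pandas as pd

csv_text = """A,B,C
9,1952,125
2,1994,69
3,1973,72
5,1992,85
1,1994,38
1,1994,95
4,1992,29
8,1984,94
"""
df = pd.read_csv(StringIO(csv_text))

# 1 where 1990 < B < 2000, otherwise 0.
mask = ((df["B"] > 1990) & (df["B"] < 2000)).astype(int)

# Multiplying by the mask zeroes out the rows that fail the condition.
total = int((df["C"] * mask).sum())

# Number of distinct A values among the rows where the mask is 1.
unique_count = df.loc[mask == 1, "A"].nunique()

print(mask.tolist())        # [0, 1, 0, 1, 1, 1, 1, 0]
print(total, unique_count)  # 316 4
```

In practice the boolean mask can be used for indexing directly, as in `df[condition]`; the integer form is shown only because it mirrors the spreadsheet-style approach from the question.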

Answer 2: (score: 0)

You can use:

l = list()
d = dict()
with open('TestCase.txt', 'r') as file:
    next(file)  # skip the header row "A,B,C"; int('B') would raise ValueError
    for line in file:
        l.append(line.rstrip("\n").split(','))  # l = [['9', '1952', '125'], ['2', '1994', '69'], ...]

    for item in l:
        if 1990 < int(item[1]) < 2000:  # the desired condition for column B
            d[item[0]] = d.get(item[0], 0) + int(item[2])  # running sum of C per value of A

The dictionary d has the unique values of A as its keys and the corresponding sums of C as its values.
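
For a version that stays within the standard library and also reports the count of unique A values, one possible sketch uses `csv.DictReader` with `collections.defaultdict`. The data is inlined through `io.StringIO` so the snippet runs without a file; the function name `sum_by_key` is made up for illustration.

```python
import csv
from collections import defaultdict
from io import StringIO

csv_text = """A,B,C
9,1952,125
2,1994,69
3,1973,72
5,1992,85
1,1994,38
1,1994,95
4,1992,29
8,1984,94
"""


def sum_by_key(lines, low, high):
    """Sum column C per unique value of A for rows with low < B < high."""
    totals = defaultdict(int)
    # DictReader consumes the header row and yields one dict per data row.
    for row in csv.DictReader(lines):
        if low < int(row["B"]) < high:
            totals[row["A"]] += int(row["C"])
    return dict(totals)


totals = sum_by_key(StringIO(csv_text), 1990, 2000)
print(totals)                             # {'2': 69, '5': 85, '1': 133, '4': 29}
print(len(totals), sum(totals.values()))  # 4 316
```

To read from a file instead, pass an open file object, for example `open('TestCase.txt', newline='')`. The keys stay strings here, as they do in the dictionary d above.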