I have a data frame like this:
''' df:
index  sales_fraction  Selected  T_value  A_value  D_value
1      0.33            t         0.3343   0.33434  0.33434
2      0.45            a         0.3434   0.23232  0.33434
3      0.56            d         0.3434   0.33434  0.6767
4      0.545           t         0.3434   0.33434  0.3346
5      0.343           d         0.2323   0.96342  0.2323
'''
I have a function like this:
def aggregation(df):
    df['sales_fraction'] = df['volume']/df['volume'].sum()
    res = 0
    for ix, row in df.iterrows():
        if row['Selected'] == 't':
            res += row['sales_fraction'] * row['T_value']
        elif row['Selected'] == 'a':
            res += row['sales_fraction'] * row['A_value']
        elif row['Selected'] == 'd':
            res += row['sales_fraction'] * row['D_value']
    return res
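It runs very slowly because I need to call the aggregation function millions of times from another function. Are there any suggestions for optimizing my code? Any help is much appreciated. Thanks!
Answer 0 (score: 1)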
import numpy as np

# One boolean mask per possible value of 'Selected'
cond1 = df['Selected'] == 't'
cond2 = df['Selected'] == 'a'
cond3 = df['Selected'] == 'd'
# The weighted value to use when the matching mask is True
val1 = df['sales_fraction'] * df['T_value']
val2 = df['sales_fraction'] * df['A_value']
val3 = df['sales_fraction'] * df['D_value']
conditions = [cond1, cond2, cond3]
values = [val1, val2, val3]
res = np.sum(np.select(conditions, values))
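np.select accepts several conditions and returns the corresponding value for each of them. So you can build a list of conditions and a list of values and pass both to np.select. np.sum then returns the sum of all the selected values.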
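As a self-contained illustration of the np.select approach, here is a minimal sketch that rebuilds the sample frame from the question and wraps the idea in a function. The name aggregation_select is hypothetical, and the sketch assumes sales_fraction is already present (the question's own function derives it from a volume column that is not shown in the sample). Rows whose Selected value matches none of the conditions contribute 0, which mirrors the original loop.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "sales_fraction": [0.33, 0.45, 0.56, 0.545, 0.343],
    "Selected": ["t", "a", "d", "t", "d"],
    "T_value": [0.3343, 0.3434, 0.3434, 0.3434, 0.2323],
    "A_value": [0.33434, 0.23232, 0.33434, 0.33434, 0.96342],
    "D_value": [0.33434, 0.33434, 0.6767, 0.3346, 0.2323],
})


def aggregation_select(df):
    """Vectorized version of the question's loop (hypothetical helper name)."""
    # One boolean mask per allowed value of 'Selected'.
    conditions = [df["Selected"] == key for key in ("t", "a", "d")]
    # The column to read from when the matching mask is True.
    choices = [df["T_value"], df["A_value"], df["D_value"]]
    # Rows that match no condition fall back to 0.0, like the loop.
    picked = np.select(conditions, choices, default=0.0)
    return float((df["sales_fraction"].to_numpy() * picked).sum())


result = aggregation_select(df)
print(result)
```

On the sample data this should agree, up to floating-point rounding, with the 0.8606469 reported in the other answers.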
Answer 1 (score: 1)
I am using lookup:
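# Note: DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0,
# so this snippet needs an older pandas version.
s = df.loc[:, 'T_value':]
# Rename T_value/A_value/D_value to T/A/D so they match Selected.str.upper()
s.columns = s.columns.str.split('_').str[0]
np.sum(df.sales_fraction * s.lookup(s.index, df.Selected.str.upper()))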
Out[1421]: 0.8606469
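Because DataFrame.lookup is no longer available in pandas 2.0 and later, the same row-by-row column pick can be sketched with plain NumPy integer indexing. This is only one possible substitute, not the answer author's method. The helper name aggregation_indexed is hypothetical, and the sketch assumes every Selected value is one of t, a, or d (it raises otherwise).

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "sales_fraction": [0.33, 0.45, 0.56, 0.545, 0.343],
    "Selected": ["t", "a", "d", "t", "d"],
    "T_value": [0.3343, 0.3434, 0.3434, 0.3434, 0.2323],
    "A_value": [0.33434, 0.23232, 0.33434, 0.33434, 0.96342],
    "D_value": [0.33434, 0.33434, 0.6767, 0.3346, 0.2323],
})


def aggregation_indexed(df):
    """Pick one value per row by column position (hypothetical helper name)."""
    value_cols = ["T_value", "A_value", "D_value"]
    values = df[value_cols].to_numpy()
    # Turn 't' / 'a' / 'd' into 'T_value' / 'A_value' / 'D_value',
    # then into the position of that column within value_cols.
    wanted = df["Selected"].str.upper() + "_value"
    col_idx = pd.Index(value_cols).get_indexer(wanted)
    # get_indexer returns -1 for labels it cannot find.
    if (col_idx < 0).any():
        raise ValueError("unexpected value in 'Selected'")
    # Pair each row number with its column position.
    picked = values[np.arange(len(df)), col_idx]
    return float((df["sales_fraction"].to_numpy() * picked).sum())


result = aggregation_indexed(df)
print(result)
```

The explicit check for -1 matters here: without it, an unknown Selected value would silently index the last column instead of failing.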
Answer 2 (score: 1)
Try pd.get_dummies():
weights = pd.get_dummies(df.Selected)[['t','a', 'd']]
selected = (df[['T_value', 'A_value', 'D_value']].values * weights.values).sum(1)
(selected * df['sales_fraction']).sum()
# 0.8606469
Answer 3 (score: 1)
This function uses lookup and sum:
def aggregation(df):
    # Note: DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0.
    return sum(df.lookup(df.index, df['Selected'].str.upper() + '_value') * df['sales_fraction'])
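Answer 4 (score: 1)
If I have understood your calculation correctly, I suggest trying this single expression and comparing it with your function's result (everything is inline):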
(
    (df.loc[df["Selected"] == 't', "T_value"] * df.loc[df["Selected"] == 't', "sales_fraction"]).sum()
    + (df.loc[df["Selected"] == 'a', "A_value"] * df.loc[df["Selected"] == 'a', "sales_fraction"]).sum()
    + (df.loc[df["Selected"] == 'd', "D_value"] * df.loc[df["Selected"] == 'd', "sales_fraction"]).sum()
)