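Is there a better (faster) way to do this?
For each person on a given day, I want the total sales at the place where that person was on that day: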
day name sold place
0 mon Ben 2 1
1 mon Amy 6 0
2 mon Sue 7 1
3 mon John 9 0
4 tues Ben 9 1
5 tues Amy 4 0
6 tues Sue 10 1
7 tues John 5 0
8 wed Ben 8 0
9 wed Amy 3 0
10 wed Sue 10 1
11 wed John 3 0
The desired result looks like this:
day name sold place sold_at_same_place
0 mon Ben 2 1 9
1 mon Amy 6 0 15
2 mon Sue 7 1 9
3 mon John 9 0 15
4 tues Ben 9 1 19
5 tues Amy 4 0 9
6 tues Sue 10 1 19
7 tues John 5 0 9
8 wed Ben 8 0 14
9 wed Amy 3 0 14
10 wed Sue 10 1 10
11 wed John 3 0 14
In case it isn't clear: on Monday, the total sold at place 1 is 2 + 7 = 9. Because Ben is at place 1, his sold_at_same_place is 9. Amy's sold_at_same_place on Monday is 15 (6 + 9), because she is at place 0.
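The arithmetic above amounts to a per-(day, place) sum. A minimal sketch, rebuilding the sample data by hand (the names `df` and `totals` are just illustrative), that prints the lookup table each row needs:

```python
import pandas as pd

# The sample data from the question.
df = pd.DataFrame({
    "day": ["mon"] * 4 + ["tues"] * 4 + ["wed"] * 4,
    "name": ["Ben", "Amy", "Sue", "John"] * 3,
    "sold": [2, 6, 7, 9, 9, 4, 10, 5, 8, 3, 10, 3],
    "place": [1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0],
})

# Total sold per (day, place) pair, e.g. (mon, 1) = 2 + 7 and (mon, 0) = 6 + 9.
totals = df.groupby(["day", "place"])["sold"].sum()
print(totals)
```

Each row's sold_at_same_place is then just `totals` looked up at that row's (day, place).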
Here's what I came up with:
Get the daily totals for each value of place:
import pandas as pd

def sold_by_day_filter(df, col_name, field_value):
    """
    Sum `sold` by day, keeping only the rows
    where `col_name` equals `field_value`.
    """
    subset = pd.DataFrame(df[df[col_name] == field_value])
    aggregated_subset = pd.DataFrame(
        {str(field_value): subset.groupby(['day'])['sold'].sum()}
    ).reset_index()
    return aggregated_subset
Merge each of these back into the original dataset:
for val in df['place'].unique():
    df = pd.merge(df, sold_by_day_filter(df, 'place', val), on='day')
The dataset now looks like this:
day name sold place 1 0
0 mon Ben 2 1 9 15
1 mon Amy 6 0 9 15
2 mon Sue 7 1 9 15
3 mon John 9 0 9 15
4 tues Ben 9 1 19 9
5 tues Amy 4 0 19 9
6 tues Sue 10 1 19 9
7 tues John 5 0 19 9
8 wed Ben 8 0 10 14
9 wed Amy 3 0 10 14
10 wed Sue 10 1 10 14
11 wed John 3 0 10 14
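For comparison, a wide table like the one above can probably be built in a single step with `pivot_table` and `join` instead of one merge per place value. This is a sketch of that swapped-in technique, not the question's method; `wide` and `merged` are illustrative names, and the columns come out sorted as '0', '1' rather than in `unique()` order:

```python
import pandas as pd

df = pd.DataFrame({
    "day": ["mon"] * 4 + ["tues"] * 4 + ["wed"] * 4,
    "name": ["Ben", "Amy", "Sue", "John"] * 3,
    "sold": [2, 6, 7, 9, 9, 4, 10, 5, 8, 3, 10, 3],
    "place": [1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0],
})

# One row per day, one column per place value, each cell the summed sales.
wide = df.pivot_table(index="day", columns="place", values="sold", aggfunc="sum")
# Use string labels to match the question's '0' / '1' temporary columns.
wide.columns = [str(c) for c in wide.columns]
# join(on="day") matches df's "day" column against wide's index and keeps df's index.
merged = df.join(wide, on="day")
print(merged)
```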
Fill the sold_at_same_place column by picking, for each row, the temporary column that matches the value in its place column:
df['sold_at_same_place'] = df.apply(lambda row: row[str(row['place'])], axis=1)
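Since a row-wise `apply` is usually the slow part, one possible vectorized replacement for this step is NumPy fancy indexing: find each row's target column position once, then pick all values at once. A sketch assuming the intermediate frame already has the temporary '1' and '0' columns (rebuilt by hand here; `total_cols` and `col_pos` are hypothetical names):

```python
import numpy as np
import pandas as pd

# The intermediate frame after the merge loop, rebuilt so this runs on its own.
df = pd.DataFrame({
    "day": ["mon"] * 4 + ["tues"] * 4 + ["wed"] * 4,
    "name": ["Ben", "Amy", "Sue", "John"] * 3,
    "sold": [2, 6, 7, 9, 9, 4, 10, 5, 8, 3, 10, 3],
    "place": [1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0],
    "1": [9] * 4 + [19] * 4 + [10] * 4,
    "0": [15] * 4 + [9] * 4 + [14] * 4,
})

# Labels of the temporary per-place total columns.
total_cols = pd.Index(["1", "0"])
# Position of each row's matching column; -1 would mean "no such column".
col_pos = total_cols.get_indexer(df["place"].astype(str))
assert (col_pos >= 0).all(), "some place value has no total column"
# Row i takes the value in column col_pos[i].
values = df[total_cols].to_numpy()
df["sold_at_same_place"] = values[np.arange(len(df)), col_pos]
```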
Drop the temporary columns ('1' and '0'):
fields_to_drop = [str(field) for field in df['place'].unique()]
df.drop(fields_to_drop, axis=1, inplace=True)
So this works, but I feel like there might be a simpler way to do this with Pandas. Any suggestions are appreciated!
Answer 0 (score: 3)
I think this is a job for transform:
>>> df["sold_at_same_place"] = df.groupby(["day", "place"])["sold"].transform("sum")
>>> df
day name sold place sold_at_same_place
0 mon Ben 2 1 9
1 mon Amy 6 0 15
2 mon Sue 7 1 9
3 mon John 9 0 15
4 tues Ben 9 1 19
5 tues Amy 4 0 9
6 tues Sue 10 1 19
7 tues John 5 0 9
8 wed Ben 8 0 14
9 wed Amy 3 0 14
10 wed Sue 10 1 10
11 wed John 3 0 14
transform takes the result of the groupby and broadcasts it back to the original index.
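To see what "broadcasts back" means concretely, here is a small sketch comparing a plain `.sum()` on the same groupby with `.transform("sum")` (the names `grouped`, `per_group`, and `per_row` are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "day": ["mon"] * 4 + ["tues"] * 4 + ["wed"] * 4,
    "name": ["Ben", "Amy", "Sue", "John"] * 3,
    "sold": [2, 6, 7, 9, 9, 4, 10, 5, 8, 3, 10, 3],
    "place": [1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0],
})

grouped = df.groupby(["day", "place"])["sold"]
# sum() collapses to one value per (day, place) group, indexed by the group keys.
per_group = grouped.sum()
# transform("sum") computes the same sums but repeats each one on every row
# of its group, so the result is aligned to df's original index.
per_row = grouped.transform("sum")
print(len(per_group), len(per_row))
df["sold_at_same_place"] = per_row
```

Because `per_row` shares `df`'s index, it can be assigned straight into a new column with no merge or lookup step.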