I have a DataFrame like this:
import pandas as pd
df = pd.DataFrame({'group': {0: 1, 1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2}, 'year': {0: 2007, 1: 2008, 2: 2009, 3: 2010, 4: 2006, 5: 2007, 6: 2008}, 'amount': {0: 2.0, 1: -4.0, 2: 5, 3: 7.0, 4: 8.0, 5: -10.0, 6: 12.0}})
   group  year  amount
0      1  2007       2
1      1  2008      -4
2      1  2009       5
3      1  2010       7
4      2  2006       8
5      2  2007     -10
6      2  2008      12
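Within each group, I want to add the running minimum and maximum of amount, the number of years with a negative amount, and the number of years with a positive amount, each computed up to and including the current year. My desired DataFrame looks like this: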
   group  year  amount  min_utd  max_utd  no_n_utd  no_p_utd
0      1  2007       2        2        2         0         1
1      1  2008      -4       -4        2         1         1
2      1  2009       5       -4        5         1         2
3      1  2010       7       -4        7         1         3
4      2  2006       8        8        8         0         1
5      2  2007     -10      -10        8         1         1
6      2  2008      12      -10       12         1         2
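I only know that agg can be applied to a whole group and rolling to a sliding window, but I don't know how to compute each value cumulatively, from the first row of the group up to the current row.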
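For reference, the "from the first row up to the current row" window described here is what pandas calls an expanding window, the counterpart of rolling whose window always starts at the first row. A minimal sketch for the running minimum only, assuming the sample data above (min_utd is the question's column name; the maximum works the same way with .max()):

```python
import pandas as pd

df = pd.DataFrame({'group': [1, 1, 1, 1, 2, 2, 2],
                   'year': [2007, 2008, 2009, 2010, 2006, 2007, 2008],
                   'amount': [2.0, -4.0, 5, 7.0, 8.0, -10.0, 12.0]})

# expanding() grows the window from the first row of each group to the current row
exp_min = df.groupby('group')['amount'].expanding().min()

# the result is indexed by (group, original row label); drop the group level
# so the values align with df's own index on assignment
df['min_utd'] = exp_min.reset_index(level=0, drop=True)
print(df['min_utd'].tolist())  # [2.0, -4.0, -4.0, -4.0, 8.0, -10.0, -10.0]
```

The answers below reach the same result with the dedicated cumulative methods (cummin, cummax, cumsum), which avoid the extra index level.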
Answer 0 (score: 2)
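Use DataFrameGroupBy.cummax together with DataFrameGroupBy.cummin for the running extremes, and for the counts apply DataFrameGroupBy.cumsum to the boolean masks produced by comparing with lt (<) and ge (>=). Note that ge(0) counts an amount of exactly 0 as positive; use gt(0) if zero should not count: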
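# running minimum and maximum of amount within each group
df[['min_utd','max_utd']] = df.groupby('group')['amount'].agg(['cummin','cummax'])
# cumulative counts of negative (< 0) and non-negative (>= 0) amounts within each group
df['no_n_utd'] = df['amount'].lt(0).astype(int).groupby(df['group']).cumsum()
df['no_p_utd'] = df['amount'].ge(0).astype(int).groupby(df['group']).cumsum()
print(df)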
   group  year  amount  min_utd  max_utd  no_n_utd  no_p_utd
0      1  2007       2        2        2         0         1
1      1  2008      -4       -4        2         1         1
2      1  2009       5       -4        5         1         2
3      1  2010       7       -4        7         1         3
4      2  2006       8        8        8         0         1
5      2  2007     -10      -10        8         1         1
6      2  2008      12      -10       12         1         2
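The counting columns work because summing a boolean mask treats True as 1 and False as 0, so a cumulative sum of the mask is a running count. A minimal sketch on group 1's amounts from the question, plus a small hypothetical series with a zero to show how ge(0) and gt(0) differ:

```python
import pandas as pd

# group 1's amounts from the question
amount = pd.Series([2.0, -4.0, 5, 7.0])

# lt(0) yields a boolean mask; cumsum counts True as 1 and False as 0,
# so each position holds the number of negative amounts seen so far
neg_mask = amount.lt(0)
neg_so_far = neg_mask.cumsum()
print(neg_mask.tolist())    # [False, True, False, False]
print(neg_so_far.tolist())  # [0, 1, 1, 1]

# hypothetical series containing a zero: ge(0) counts it, gt(0) does not
with_zero = pd.Series([0.0, 3.0])
print(with_zero.ge(0).cumsum().tolist())  # [1, 2]
print(with_zero.gt(0).cumsum().tolist())  # [0, 1]
```

Grouping the mask by df['group'] before calling cumsum, as in the answer, simply restarts this running count at the first row of each group.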
Another solution based on the same idea, using a custom function:
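def f(x):
    # x is the 'amount' Series of one group
    a = x.cummin()        # running minimum
    b = x.cummax()        # running maximum
    c = x.lt(0).cumsum()  # count of negative amounts so far
    d = x.ge(0).cumsum()  # count of non-negative amounts so far
    return pd.DataFrame({'min_utd':a, 'max_utd':b, 'no_n_utd':c, 'no_p_utd':d})

# group_keys=False keeps the result indexed like df, so join aligns row by row
df = df.join(df.groupby('group', group_keys=False)['amount'].apply(f))
print(df)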
   group  year  amount  min_utd  max_utd  no_n_utd  no_p_utd
0      1  2007       2        2        2         0         1
1      1  2008      -4       -4        2         1         1
2      1  2009       5       -4        5         1         2
3      1  2010       7       -4        7         1         3
4      2  2006       8        8        8         0         1
5      2  2007     -10      -10        8         1         1
6      2  2008      12      -10       12         1         2
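Whether the per-group frames returned by apply can be joined back depends on their index. When the function returns a frame with the same index as its input, passing group_keys=False to groupby stops pandas from prepending the group label, so the concatenated result keeps df's original row labels. A minimal sketch with a reduced, hypothetical helper running_stats (only two of the four columns) to show the index shape:

```python
import pandas as pd

df = pd.DataFrame({'group': [1, 1, 1, 1, 2, 2, 2],
                   'amount': [2.0, -4.0, 5, 7.0, 8.0, -10.0, 12.0]})

def running_stats(x):
    # hypothetical helper: one output column per statistic, same index as x
    return pd.DataFrame({'min_utd': x.cummin(), 'max_utd': x.cummax()})

# group_keys=False: do not add the group label as an extra index level,
# so the result lines up with df's original row labels
out = df.groupby('group', group_keys=False)['amount'].apply(running_stats)
print(out.index.tolist())  # [0, 1, 2, 3, 4, 5, 6]

df = df.join(out)
print(df)
```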
Answer 1 (score: 1)
You can do it like this:
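grp = df.groupby('group')
df = df.assign(
    min_utd = grp['amount'].cummin(),
    max_utd = grp['amount'].cummax(),
    # transform returns values aligned with df's index, so the counts stay
    # correct even if the rows are not sorted by group
    no_n_utd = grp['amount'].transform(lambda s: s.lt(0).cumsum()),
    no_p_utd = grp['amount'].transform(lambda s: s.gt(0).cumsum())
)
print(df)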
Output:
   group  year  amount  min_utd  max_utd  no_n_utd  no_p_utd
0      1  2007     2.0      2.0      2.0         0         1
1      1  2008    -4.0     -4.0      2.0         1         1
2      1  2009     5.0     -4.0      5.0         1         2
3      1  2010     7.0     -4.0      7.0         1         3
4      2  2006     8.0      8.0      8.0         0         1
5      2  2007   -10.0    -10.0      8.0         1         1
6      2  2008    12.0    -10.0     12.0         1         2
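All of these solutions walk the rows of each group in their current order, so "up to and including each year" only holds if rows are in year order within each group, which is true of the sample data. If that may not be the case, a minimal sketch is to sort by group and year first; the shuffled input below is hypothetical, and gt(0) is used for strictly positive amounts as in the question:

```python
import pandas as pd

# hypothetical input: the question's rows in shuffled order
df = pd.DataFrame({'group': [2, 1, 1, 2, 1, 2, 1],
                   'year': [2008, 2010, 2007, 2006, 2009, 2007, 2008],
                   'amount': [12.0, 7.0, 2.0, 8.0, 5, -10.0, -4.0]})

# cumulative groupby methods process rows in their current order,
# so put each group in chronological order before computing
df = df.sort_values(['group', 'year'], ignore_index=True)

g = df.groupby('group')['amount']
df['min_utd'] = g.cummin()
df['max_utd'] = g.cummax()
df['no_n_utd'] = df['amount'].lt(0).groupby(df['group']).cumsum()
df['no_p_utd'] = df['amount'].gt(0).groupby(df['group']).cumsum()
print(df)
```

After sorting, the frame matches the question's input row for row, and the added columns match the desired output.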