Pandas - min and max of a column up to each row

Time: 2018-07-26 14:33:35

Tags: python pandas

I have a dataframe like this:

import pandas as pd

df = pd.DataFrame({'group': {0: 1, 1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2},
                   'year': {0: 2007, 1: 2008, 2: 2009, 3: 2010, 4: 2006, 5: 2007, 6: 2008},
                   'amount': {0: 2.0, 1: -4.0, 2: 5.0, 3: 7.0, 4: 8.0, 5: -10.0, 6: 12.0}})

   group    year    amount
0   1       2007    2
1   1       2008    -4
2   1       2009    5
3   1       2010    7
4   2       2006    8
5   2       2007    -10
6   2       2008    12

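I want to add the minimum, the maximum, the number of years with a negative amount, and the number of years with a positive amount, each computed up to and including the current year within its group. My ideal dataframe looks like this: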

   group    year    amount    min_utd    max_utd   no_n_utd    no_p_utd
0   1       2007    2           2           2         0           1
1   1       2008    -4         -4           2         1           1
2   1       2009    5          -4           5         1           2
3   1       2010    7          -4           7         1           3
4   2       2006    8           8           8         0           1
5   2       2007    -10        -10          8         1           1 
6   2       2008    12         -10          12        1           2

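I only know that agg works on a whole group and rolling works on a sliding window, but I don't know how to compute a statistic from the first row of the group up to each row.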
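For comparison with rolling: pandas also has expanding, a window that always starts at the first row (of each group, when used after groupby) and grows to include the current row, which is the "up to and including this year" window described above. A minimal sketch, assuming the rows are already sorted by year within each group and rebuilding the sample frame with plain lists:

```python
import pandas as pd

# Sample data from the question, assumed sorted by year within each group
df = pd.DataFrame({'group': [1, 1, 1, 1, 2, 2, 2],
                   'year': [2007, 2008, 2009, 2010, 2006, 2007, 2008],
                   'amount': [2.0, -4.0, 5.0, 7.0, 8.0, -10.0, 12.0]})

# Expanding window: from the first row of each group up to and including the current row
exp = df.groupby('group')['amount'].expanding()

# The result is indexed by (group, original index); drop the group level to align with df
df['min_utd'] = exp.min().reset_index(level=0, drop=True)
df['max_utd'] = exp.max().reset_index(level=0, drop=True)

# Counting = expanding sum of a 0/1 indicator; expanding sums are floats, so cast back to int
neg = df['amount'].lt(0).astype(int).groupby(df['group']).expanding().sum()
pos = df['amount'].gt(0).astype(int).groupby(df['group']).expanding().sum()
df['no_n_utd'] = neg.reset_index(level=0, drop=True).astype(int)
df['no_p_utd'] = pos.reset_index(level=0, drop=True).astype(int)

print(df)
```

The cumulative methods (cummin, cummax, cumsum) used in the answers compute the same values more directly; expanding becomes useful when the statistic has no cum* equivalent, such as an expanding mean or median.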

2 answers:

Answer 0 (score: 2)

Use DataFrameGroupBy.cummin and DataFrameGroupBy.cummax, then DataFrameGroupBy.cumsum of the boolean masks obtained by comparing with lt (<) and ge (>=):

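# running min/max of amount within each group, from the group's first row to the current row
df[['min_utd','max_utd']] = df.groupby('group')['amount'].agg(['cummin','cummax'])
# running count of negative (< 0) and non-negative (>= 0) amounts within each group
df['no_n_utd'] = df['amount'].lt(0).astype(int).groupby(df['group']).cumsum()
df['no_p_utd'] = df['amount'].ge(0).astype(int).groupby(df['group']).cumsum()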

print (df)
   group  year  amount  min_utd  max_utd  no_n_utd  no_p_utd
0      1  2007       2        2        2         0         1
1      1  2008      -4       -4        2         1         1
2      1  2009       5       -4        5         1         2
3      1  2010       7       -4        7         1         3
4      2  2006       8        8        8         0         1
5      2  2007     -10      -10        8         1         1
6      2  2008      12      -10       12         1         2
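All of the cum* methods accumulate in row order, so "up to and including each year" only holds when the rows are sorted by year within each group. The sample data already is; a minimal sketch for input that might not be, assuming a reversed copy of the sample frame as the unsorted input:

```python
import pandas as pd

# Sample data from the question, deliberately reversed to simulate unsorted input
df = pd.DataFrame({'group': [1, 1, 1, 1, 2, 2, 2],
                   'year': [2007, 2008, 2009, 2010, 2006, 2007, 2008],
                   'amount': [2.0, -4.0, 5.0, 7.0, 8.0, -10.0, 12.0]}).iloc[::-1]

# Put each group's rows in chronological order before accumulating
df = df.sort_values(['group', 'year'])

g = df.groupby('group')['amount']
df['min_utd'] = g.cummin()
df['max_utd'] = g.cummax()
df['no_n_utd'] = df['amount'].lt(0).astype(int).groupby(df['group']).cumsum()
df['no_p_utd'] = df['amount'].ge(0).astype(int).groupby(df['group']).cumsum()

print(df)
```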

Another solution based on the same principle, but using a custom function:

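def f(x):
    # x holds the amount values of one group, in row order
    a = x.cummin()
    b = x.cummax()
    c = x.lt(0).cumsum()
    d = x.ge(0).cumsum()
    return pd.DataFrame({'min_utd':a, 'max_utd':b, 'no_n_utd':c, 'no_p_utd':d})

# run on the original df (without the columns added above); group_keys=False keeps
# the original index so join aligns row by row (newer pandas otherwise prepends the group key)
df = df.join(df.groupby('group', group_keys=False)['amount'].apply(f))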
print (df)
   group  year  amount  min_utd  max_utd  no_n_utd  no_p_utd
0      1  2007       2        2        2         0         1
1      1  2008      -4       -4        2         1         1
2      1  2009       5       -4        5         1         2
3      1  2010       7       -4        7         1         3
4      2  2006       8        8        8         0         1
5      2  2007     -10      -10        8         1         1
6      2  2008      12      -10       12         1         2

Answer 1 (score: 1)

You can use:

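grp = df.groupby('group')['amount']
df = df.assign(
    min_utd = grp.cummin(),
    max_utd = grp.cummax(),
    # transform returns a result aligned on df's index, so it does not rely on
    # the rows being ordered by group or on the groups having different sizes
    no_n_utd = grp.transform(lambda s: s.lt(0).cumsum()),
    no_p_utd = grp.transform(lambda s: s.gt(0).cumsum())
)
print(df)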

Output:

   group  year  amount  min_utd  max_utd  no_n_utd  no_p_utd
0      1  2007     2.0      2.0      2.0         0         1
1      1  2008    -4.0     -4.0      2.0         1         1
2      1  2009     5.0     -4.0      5.0         1         2
3      1  2010     7.0     -4.0      7.0         1         3
4      2  2006     8.0      8.0      8.0         0         1
5      2  2007   -10.0    -10.0      8.0         1         1
6      2  2008    12.0    -10.0     12.0         1         2
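The two answers differ for an amount of exactly zero: answer 0 uses ge(0), so a zero is counted as positive, while answer 1 uses gt(0), so a zero is counted in neither column. The sample data contains no zeros, so both outputs match; a small hypothetical series (not from the question) shows the difference:

```python
import pandas as pd

# Hypothetical amounts including a zero
s = pd.Series([3.0, 0.0, -1.0])

ge_count = s.ge(0).astype(int).cumsum().tolist()  # zero counted as positive
gt_count = s.gt(0).astype(int).cumsum().tolist()  # zero counted as neither
neg_count = s.lt(0).astype(int).cumsum().tolist()

print(ge_count, gt_count, neg_count)  # [1, 2, 2] [1, 1, 1] [0, 0, 1]
```

Which behavior is right depends on whether a zero amount should count as a "positive" year.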