Question

A pandas Series:

    2004-01-01    0
    2004-01-02    0
    2004-01-03    0
    2004-01-04    0
    2004-01-05    1
    2004-01-06    0
    2004-01-07    0
    2004-01-08    3
    2004-01-09    0
    2004-01-10    2
    2004-01-11    0

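I want to efficiently add a column that counts the number of rows between the current row and the next row whose value is greater than 0.

In this case it would be: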

    2004-01-01    0     3
    2004-01-02    0     2
    2004-01-03    0     1
    2004-01-04    0     0
    2004-01-05    1     2
    2004-01-06    0     1
    2004-01-07    0     0
    2004-01-08    3     1
    2004-01-09    0     0
    2004-01-10    2     ...
    2004-01-11    0     ...

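The first number in the new column is 3 because there are 3 rows between that row and the next row whose value in the first column is different from 0, and so on.

Is there an efficient way to do this?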

Answer 1

Use:

    df['B'] = df.groupby(df.A.gt(0).cumsum()).cumcount(ascending=False)
    print(df)
                A  B
    2004-01-01  0  3
    2004-01-02  0  2
    2004-01-03  0  1
    2004-01-04  0  0
    2004-01-05  1  2
    2004-01-06  0  1
    2004-01-07  0  0
    2004-01-08  3  1
    2004-01-09  0  0
    2004-01-10  2  1
    2004-01-11  0  0
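For anyone who wants to reproduce this from scratch, here is a minimal self-contained sketch. It assumes the data lives in a DataFrame with a daily `DatetimeIndex` and a single column named `A`; the variable names `idx` and `group_id` are only illustrative and do not come from the question.

```python
import pandas as pd

# Rebuild the sample data from the question (assumed layout: one column "A"
# on a daily DatetimeIndex).
idx = pd.date_range("2004-01-01", periods=11, freq="D")
df = pd.DataFrame({"A": [0, 0, 0, 0, 1, 0, 0, 3, 0, 2, 0]}, index=idx)

# Every positive value starts a new group label.
group_id = df["A"].gt(0).cumsum()

# Count down within each group, so the last row of a group gets 0.
df["B"] = df.groupby(group_id).cumcount(ascending=False)

print(df)
```

Splitting the one-liner into `group_id` and the `cumcount` call does not change the result; it only makes the two steps easier to inspect.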

Explanation:

First compare with gt (>) to get a boolean mask:

    print(df.A.gt(0))
    2004-01-01    False
    2004-01-02    False
    2004-01-03    False
    2004-01-04    False
    2004-01-05     True
    2004-01-06    False
    2004-01-07    False
    2004-01-08     True
    2004-01-09    False
    2004-01-10     True
    2004-01-11    False
    Name: A, dtype: bool

Then use Series.cumsum to take the cumulative sum of the mask, which gives every run of rows its own group label:

    print(df.A.gt(0).cumsum())
    2004-01-01    0
    2004-01-02    0
    2004-01-03    0
    2004-01-04    0
    2004-01-05    1
    2004-01-06    1
    2004-01-07    1
    2004-01-08    2
    2004-01-09    2
    2004-01-10    3
    2004-01-11    3
    Name: A, dtype: int32
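To see why these labels are useful, it can help to look at the group sizes. The sketch below works on a plain Series named `a` (an illustrative name, not from the question) and counts how many rows share each label.

```python
import pandas as pd

a = pd.Series(
    [0, 0, 0, 0, 1, 0, 0, 3, 0, 2, 0],
    index=pd.date_range("2004-01-01", periods=11, freq="D"),
    name="A",
)

# Same labels as above: the counter increases at every positive value.
group_id = a.gt(0).cumsum()

# Label 0 is the leading run of zeros; every later label is one positive
# value followed by the zeros that come after it.
sizes = a.groupby(group_id).size()
print(sizes.tolist())  # [4, 3, 2, 2]
```

Each group ends just before the next positive value, so a row's position counted from the end of its group is the number of rows separating it from that next positive value.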

Finally use GroupBy.cumcount with ascending=False to number the rows of each group in descending order:

    print(df.groupby(df.A.gt(0).cumsum()).cumcount(ascending=False))
    2004-01-01    3
    2004-01-02    2
    2004-01-03    1
    2004-01-04    0
    2004-01-05    2
    2004-01-06    1
    2004-01-07    0
    2004-01-08    1
    2004-01-09    0
    2004-01-10    1
    2004-01-11    0
    dtype: int64
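As a sanity check, the vectorized result can be compared against a naive loop. This is only a sketch: `rows_until_next_positive` is a hypothetical helper written for this comparison, and it assumes that rows with no later positive value should count the rows remaining until the end of the data (the question leaves those rows as `...`).

```python
import pandas as pd

a = pd.Series(
    [0, 0, 0, 0, 1, 0, 0, 3, 0, 2, 0],
    index=pd.date_range("2004-01-01", periods=11, freq="D"),
    name="A",
)

def rows_until_next_positive(values):
    """Naive reference: rows strictly between each position and the next
    positive value, or the end of the data if no positive value follows."""
    n = len(values)
    out = []
    for i in range(n):
        j = i + 1
        # Skip forward over non-positive values.
        while j < n and values[j] <= 0:
            j += 1
        out.append(j - i - 1)
    return out

expected = rows_until_next_positive(a.tolist())
vectorized = a.groupby(a.gt(0).cumsum()).cumcount(ascending=False).tolist()

print(expected == vectorized)
```

The loop is quadratic in the worst case, so it is only meant for checking small inputs; the groupby version is the one to use on real data.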
