Pandas Series:
2004-01-01 0
2004-01-02 0
2004-01-03 0
2004-01-04 0
2004-01-05 1
2004-01-06 0
2004-01-07 0
2004-01-08 3
2004-01-09 0
2004-01-10 2
2004-01-11 0
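I want to efficiently add a column that counts the number of rows between the current row and the next row with a value greater than 0.
In this case: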
2004-01-01 0 3
2004-01-02 0 2
2004-01-03 0 1
2004-01-04 0 0
2004-01-05 1 2
2004-01-06 0 1
2004-01-07 0 0
2004-01-08 3 1
2004-01-09 0 0
2004-01-10 2 ...
2004-01-11 0 ...
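The first number in the new column is 3 because there are 3 rows between this row and the next row whose value in the first column is not 0, and so on.
Is there an efficient way to do this?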
Answer (score: 0)
Use:
df['B'] = df.groupby(df.A.gt(0).cumsum()).cumcount(ascending=False)
print (df)
A B
2004-01-01 0 3
2004-01-02 0 2
2004-01-03 0 1
2004-01-04 0 0
2004-01-05 1 2
2004-01-06 0 1
2004-01-07 0 0
2004-01-08 3 1
2004-01-09 0 0
2004-01-10 2 1
2004-01-11 0 0
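A self-contained version that rebuilds the sample data from the question, assuming a DataFrame with a daily DatetimeIndex and a column named A (the name the answer uses). The extra column C is an optional addition, not part of the original answer: it sets rows after the last value greater than 0 to NaN, which matches the "..." in the question's expected output.

```python
import pandas as pd

# Sample data from the question; the column name "A" is the one the answer uses.
idx = pd.date_range("2004-01-01", "2004-01-11", freq="D")
df = pd.DataFrame({"A": [0, 0, 0, 0, 1, 0, 0, 3, 0, 2, 0]}, index=idx)

# Group label that increases by 1 at every row where A > 0.
groups = df["A"].gt(0).cumsum()

# Count the rows remaining in each group, counting down to 0.
df["B"] = df.groupby(groups).cumcount(ascending=False)

# Optional: rows in the last group have no later row with A > 0
# (the "..." in the question), so replace their counts with NaN.
df["C"] = df["B"].where(groups < groups.iloc[-1])

print(df)
```

Because NaN forces a float dtype, C comes out as float64 while B stays integer.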
Explanation:
First, compare the values with Series.gt (>) to get a boolean mask:
print (df.A.gt(0))
2004-01-01 False
2004-01-02 False
2004-01-03 False
2004-01-04 False
2004-01-05 True
2004-01-06 False
2004-01-07 False
2004-01-08 True
2004-01-09 False
2004-01-10 True
2004-01-11 False
Name: A, dtype: bool
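The question says both "greater than 0" and "different from 0". For non-negative data like the sample these give the same mask, but if negative values can occur and "different from 0" is what you mean, Series.ne(0) is the mask to use instead of gt(0). The series below is hypothetical and only shows where the two masks differ:

```python
import pandas as pd

# Hypothetical series with a negative value, to show where gt(0) and ne(0) differ.
s = pd.Series([0, -2, 0, 5, 0])

print(s.gt(0).tolist())  # [False, False, False, True, False]  (A > 0)
print(s.ne(0).tolist())  # [False, True, False, True, False]   (A != 0)
```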
Then use Series.cumsum to take the cumulative sum, so every row with a value greater than 0 starts a new group label:
print (df.A.gt(0).cumsum())
2004-01-01 0
2004-01-02 0
2004-01-03 0
2004-01-04 0
2004-01-05 1
2004-01-06 1
2004-01-07 1
2004-01-08 2
2004-01-09 2
2004-01-10 3
2004-01-11 3
Name: A, dtype: int32
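To see what these labels mean, the sketch below (rebuilding the sample data as a Series named A) prints the rows in each group. Every group except group 0 starts at a row greater than 0 and runs up to, but not including, the next such row, so the number of rows after a given row in its group is exactly the distance to the next value greater than 0:

```python
import pandas as pd

idx = pd.date_range("2004-01-01", periods=11, freq="D")
s = pd.Series([0, 0, 0, 0, 1, 0, 0, 3, 0, 2, 0], index=idx, name="A")

groups = s.gt(0).cumsum()

# Print each group label with the dates and values it covers.
for label, rows in s.groupby(groups):
    print(label, rows.index.strftime("%m-%d").tolist(), rows.tolist())
```

For the sample data this prints groups 0 to 3 covering 4, 3, 2 and 2 rows.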
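Finally, use GroupBy.cumcount with ascending=False to number the rows within each group in descending order, so each row gets the number of rows after it in its group. Rows after the last value greater than 0 (the "..." in the question) are counted up to the end of the Series: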
print (df.groupby(df.A.gt(0).cumsum()).cumcount(ascending=False))
2004-01-01 3
2004-01-02 2
2004-01-03 1
2004-01-04 0
2004-01-05 2
2004-01-06 1
2004-01-07 0
2004-01-08 1
2004-01-09 0
2004-01-10 1
2004-01-11 0
dtype: int64
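As a cross-check, or if you prefer to avoid the groupby, the same numbers can be computed in NumPy by looking up the position of the next positive value with np.searchsorted. This is an alternative technique, not part of the answer above; the function name rows_until_next_positive is hypothetical, and whether it is faster than the groupby version depends on your data, so measure before switching:

```python
import numpy as np
import pandas as pd


def rows_until_next_positive(values):
    """For each position i, count the rows strictly between i and the next
    position j > i with values[j] > 0. If there is no such j, count the rows
    up to the end of the array (the same behaviour as the groupby answer)."""
    values = np.asarray(values)
    n = len(values)
    positions = np.arange(n)
    positive = np.flatnonzero(values > 0)  # sorted positions with value > 0
    # Index into `positive` of the first position strictly greater than i.
    k = np.searchsorted(positive, positions, side="right")
    # Append n as a sentinel "next positive" just past the end of the array.
    next_pos = np.append(positive, n)[k]
    return next_pos - positions - 1


idx = pd.date_range("2004-01-01", periods=11, freq="D")
df = pd.DataFrame({"A": [0, 0, 0, 0, 1, 0, 0, 3, 0, 2, 0]}, index=idx)
df["B"] = df.groupby(df.A.gt(0).cumsum()).cumcount(ascending=False)
df["B_np"] = rows_until_next_positive(df["A"].to_numpy())
print((df["B"] == df["B_np"]).all())  # True
```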