Sum of column values based on a condition in pandas

Time: 2020-10-31 20:30:54

Tags: python pandas dataframe

daychange   SS
0.017065    0
-0.009259   100
0.031542    0
-0.004530   0
0.000709    0
0.004970    100
-0.021900   0
0.003611    0
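To experiment with the approaches below, the sample data shown above can be rebuilt as a DataFrame. This is a minimal sketch that assumes the default RangeIndex (0..7), which the answer's outputs also show:

```python
import pandas as pd

# Sample data copied from the table above; default RangeIndex 0..7
df = pd.DataFrame({
    "daychange": [0.017065, -0.009259, 0.031542, -0.004530,
                  0.000709, 0.004970, -0.021900, 0.003611],
    "SS": [0, 100, 0, 0, 0, 100, 0, 0],
})
print(df)
```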

I have two columns. For each row where SS == 100, I want to compute the sum of the next 5 "daychange" values.

I am currently using the following, but it doesn't quite work the way I want:

df['total'] = df.loc[df['SS'] == 100,['daychange']].sum(axis=1) 
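The attempt above never looks at later rows. It selects the single "daychange" column for the SS == 100 rows, and sum(axis=1) then sums across the columns of each selected row. With only one column, each result is simply that row's own value. A small sketch to confirm this, rebuilding the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "daychange": [0.017065, -0.009259, 0.031542, -0.004530,
                  0.000709, 0.004970, -0.021900, 0.003611],
    "SS": [0, 100, 0, 0, 0, 100, 0, 0],
})

# Same expression as the attempt: a one-column DataFrame for rows 1 and 5
attempt = df.loc[df["SS"] == 100, ["daychange"]].sum(axis=1)

# axis=1 sums across columns within each row, so with a single column
# each "sum" is just that row's own daychange value
print(attempt)
```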

1 answer:

Answer 0 (score: 3)

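As of pandas 1.1, you can create a forward-looking rolling window and then select the rows to include in the DataFrame. My notebook kernel died with some other arguments, so use this with care.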

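import pandas as pd
# Forward-looking window: each row covers itself and the next 4 rows (pandas >= 1.1)
indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=5)
# Keep the rolling sums only for SS == 100; the other rows become NaN on assignment
df['total'] = df.daychange.rolling(indexer, min_periods=1).sum()[df.SS == 100]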
df

Out:

   daychange   SS     total
0   0.017065    0       NaN
1  -0.009259  100  0.023432
2   0.031542    0       NaN
3  -0.004530    0       NaN
4   0.000709    0       NaN
5   0.004970  100 -0.013319
6  -0.021900    0       NaN
7   0.003611    0       NaN
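A self-contained version of the forward-window approach, for reproducing the output above. It rebuilds the sample data and otherwise uses the same calls as the answer; the only assumption is pandas >= 1.1, where FixedForwardWindowIndexer is available:

```python
import pandas as pd

df = pd.DataFrame({
    "daychange": [0.017065, -0.009259, 0.031542, -0.004530,
                  0.000709, 0.004970, -0.021900, 0.003611],
    "SS": [0, 100, 0, 0, 0, 100, 0, 0],
})

# Forward-looking window: the value at row i covers rows i .. i+4
indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=5)

# min_periods=1 lets windows near the end use fewer than 5 rows
rolling_sum = df["daychange"].rolling(indexer, min_periods=1).sum()

# Boolean indexing keeps only the SS == 100 rows; assignment aligns on the
# index, so every other row of the new column is NaN
df["total"] = rolling_sum[df["SS"] == 100]
print(df)
```

Note that the rolling sum is computed for every row and then mostly discarded. That is cheap for small frames but is the "unnecessary rolling values" cost mentioned further down.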

Excluding the starting row (the one with SS == 100) from the sum

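In this case the sum starts at the row after the one with SS == 100. Once the rolling sum has been computed for all rows, you can shift it up by one row: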

df['total'] = df.daychange.rolling(indexer, min_periods=1).sum().shift(-1)[df.SS == 100]
df

Out:

   daychange   SS     total
0   0.017065    0       NaN
1  -0.009259  100  0.010791
2   0.031542    0       NaN
3  -0.004530    0       NaN
4   0.000709    0       NaN
5   0.004970  100 -0.018289
6  -0.021900    0       NaN
7   0.003611    0       NaN
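Why shift(-1) works: the rolling value at row i+1 covers rows i+1 .. i+5, and shift(-1) moves it up to row i. The sketch below checks this against an explicit iloc slice. It assumes the default RangeIndex so that labels equal positions:

```python
import pandas as pd

df = pd.DataFrame({
    "daychange": [0.017065, -0.009259, 0.031542, -0.004530,
                  0.000709, 0.004970, -0.021900, 0.003611],
    "SS": [0, 100, 0, 0, 0, 100, 0, 0],
})

indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=5)

# Rolling sum over rows i .. i+4, then move each value up one row,
# so row i now holds the sum over rows i+1 .. i+5
shifted = df["daychange"].rolling(indexer, min_periods=1).sum().shift(-1)
df["total"] = shifted[df["SS"] == 100]

# Cross-check against an explicit positional slice of rows i+1 .. i+5
# (valid here because the default RangeIndex makes labels equal positions)
for i in df.index[df["SS"] == 100]:
    expected = df["daychange"].iloc[i + 1 : i + 6].sum()
    assert abs(df.loc[i, "total"] - expected) < 1e-12
print(df)
```

One edge case: if SS == 100 on the very last row, shift(-1) leaves NaN there, whereas an empty slice sum would give 0.0.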

A slow, hacky solution using the index of the selected rows

It feels like a hack, but it works, and it avoids computing rolling values that are never used.

# iloc takes positions, so this assumes df has the default RangeIndex (labels == positions)
df['next5sum'] = df[df.SS == 100].index.to_series().apply(lambda x: df.daychange.iloc[x: x + 5].sum())
df

Out:

   daychange   SS  next5sum
0   0.017065    0       NaN
1  -0.009259  100  0.023432
2   0.031542    0       NaN
3  -0.004530    0       NaN
4   0.000709    0       NaN
5   0.004970  100 -0.013319
6  -0.021900    0       NaN
7   0.003611    0       NaN
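The apply version passes index labels to iloc, which expects integer positions. These only coincide for a default RangeIndex. With, say, a date index, x + 5 is not a valid position. The sketch below is an assumed variant, not part of the original answer. It works on positions directly, assuming the rows are already in the intended order. The date index is hypothetical and only illustrates the label/position mismatch:

```python
import numpy as np
import pandas as pd

# Same data, but with a (hypothetical) date index instead of 0..7
df = pd.DataFrame(
    {
        "daychange": [0.017065, -0.009259, 0.031542, -0.004530,
                      0.000709, 0.004970, -0.021900, 0.003611],
        "SS": [0, 100, 0, 0, 0, 100, 0, 0],
    },
    index=pd.date_range("2020-01-01", periods=8, freq="D"),
)

# Integer positions of the rows where SS == 100
positions = np.flatnonzero(df["SS"].to_numpy() == 100)

# Sum the next 5 values (starting row included) by position, not by label
sums = [df["daychange"].iloc[p : p + 5].sum() for p in positions]

# Key the results by the matching labels; assignment aligns on the index
df["next5sum"] = pd.Series(sums, index=df.index[positions])
print(df)
```

For the variant that excludes the starting row, the slice would become iloc[p + 1 : p + 6].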

To sum the next five rows without the SS == 100 row itself, you can adjust the slice or shift the series:

df['next5sum'] = df[df.SS == 100].index.to_series().apply(lambda x: df.daychange.iloc[x + 1: x + 6].sum())
# df['next5sum'] = df[df.SS == 100].index.to_series().apply(lambda x: df.daychange.shift(-1).iloc[x: x + 5].sum())

df

Out:

   daychange   SS  next5sum
0   0.017065    0       NaN
1  -0.009259  100  0.010791
2   0.031542    0       NaN
3  -0.004530    0       NaN
4   0.000709    0       NaN
5   0.004970  100 -0.018289
6  -0.021900    0       NaN
7   0.003611    0       NaN
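Both answers trade off between computing rolling values for every row and looping in Python over the selected rows. An alternative technique, not from the original answer, is a prefix (cumulative) sum. Any window sum is the difference of two prefix sums, so each selected row costs O(1) after one O(n) pass. The helper name forward_sums and its offset parameter are hypothetical, chosen for this sketch. offset=0 includes the SS == 100 row and offset=1 starts at the next row:

```python
import numpy as np
import pandas as pd


def forward_sums(values, positions, window=5, offset=0):
    """Hypothetical helper: sum values[p + offset : p + offset + window] for each p.

    Windows running past the end are truncated; an empty window sums to 0.0.
    """
    values = np.asarray(values, dtype=float)
    n = len(values)
    # prefix[k] == values[:k].sum(), with prefix[0] == 0.0
    prefix = np.concatenate(([0.0], np.cumsum(values)))
    start = np.minimum(np.asarray(positions) + offset, n)
    end = np.minimum(start + window, n)
    return prefix[end] - prefix[start]


df = pd.DataFrame({
    "daychange": [0.017065, -0.009259, 0.031542, -0.004530,
                  0.000709, 0.004970, -0.021900, 0.003611],
    "SS": [0, 100, 0, 0, 0, 100, 0, 0],
})

# Positions (not labels) of the rows where SS == 100
positions = np.flatnonzero(df["SS"].to_numpy() == 100)
labels = df.index[positions]

# offset=0 includes the starting row; offset=1 starts from the next row
df["total"] = pd.Series(forward_sums(df["daychange"], positions), index=labels)
df["next5sum"] = pd.Series(forward_sums(df["daychange"], positions, offset=1), index=labels)
print(df)
```

The trade-off is numerical: subtracting large running totals can lose precision on very long series, where the rolling or slicing approaches above stay more accurate.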