I have this code:
df=df\
.assign(SUM_pre = lambda x: x['value1'].rolling(???).sum())
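I want to sum the values of column value1 over a rolling window of a specific size. The catch is that the window size changes over time: the size for each row is stored in another column, as shown below: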
ID span_pre value1 value2
0 A 0 0.1 0
1 A 1 0.0 0
2 A 2 0.1 0
3 A 3 0.1 0
4 A 3 0.1 0
5 A 3 0.1 0
6 S 0 0.2 0
7 S 1 0.2 0
8 S 2 0.2 0
9 S 3 None 1
10 S 3 None 1
11 S 3 None 1
I tried:
df=df\
.assign(SUM_pre = lambda x: x['value1'].rolling(x.span_pre).sum())
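But the output is:
ValueError: window must be an integer
This makes sense: x.span_pre is a Series, not a single integer. Can you help me understand how to use each row's span_pre value as its window size without resorting to too much looping? My final output should be: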
ID span_pre value1 value2 SUM_pre
0 A 0 0.1 0 0.1
1 A 1 0.0 0 0.1
2 A 2 0.1 0 0.2
3 A 3 0.1 0 0.2
4 A 3 0.1 0 0.3
5 A 3 0.1 0 0.3
6 S 0 0.2 0 0.2
7 S 1 0.2 0 0.4
8 S 2 0.2 0 0.6
9 S 3 None 1 0.4
10 S 3 None 1 0.2
11 S 3 None 1 None
Answer 0 (score: 0)
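(Please check your expected output first: several of its sums, e.g. at index 3, do not match the span_pre windows.)
I changed your dataframe slightly to avoid errors and confusion: value1 is converted to floats (None becomes NaN), and the index is shifted by 100 so labels are not mistaken for row positions: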
import numpy as np
df['value1'] = df['value1'].replace({'None': np.nan}).astype(float)
df.index += 100
>>> df
ID span_pre value1 value2
100 A 0 0.1 0 # [0:1] -> 0.1
101 A 1 0.0 0 # [0:2] -> 0.1 + 0.0
102 A 2 0.1 0 # [0:3] -> 0.1 + 0.0 + 0.1
103 A 3 0.1 0 # [0:4] -> 0.1 + 0.0 + 0.1 + 0.1
104 A 3 0.1 0 # [1:5] -> 0.0 + 0.1 + 0.1 + 0.1
105 A 3 0.1 0 # [2:6] -> 0.1 + 0.1 + 0.1 + 0.1
106 S 0 0.2 0 # [6:7] -> 0.2
107 S 1 0.2 0 # [6:8] -> 0.2 + 0.2
108 S 2 0.2 0 # [6:9] -> 0.2 + 0.2 + 0.2
109 S 3 NaN 1 # [6:10] -> 0.2 + 0.2 + 0.2 + NaN
110 S 3 NaN 1 # [7:11] -> 0.2 + 0.2 + NaN + NaN
111 S 3 NaN 1 # [8:12] -> 0.2 + NaN + NaN + NaN
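The bracketed ranges in the comments are positional, half-open slices [i - span_pre, i + 1). As a minimal sketch, assuming the span_pre values of the sample data and adding a clip at 0 as a safeguard (not needed for this data), all of the bounds can be computed at once with NumPy:

```python
import numpy as np

# span_pre column of the sample data, by row position 0..11
span = np.array([0, 1, 2, 3, 3, 3, 0, 1, 2, 3, 3, 3])

pos = np.arange(len(span))
start = np.maximum(pos - span, 0)  # inclusive start, never before row 0
end = pos + 1                      # exclusive end, so the current row is included

for s, e in zip(start, end):
    print(f"[{s}:{e}]")
```

The printed ranges reproduce the comments in the table above, from [0:1] down to [8:12].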
You can loop over the rows and, for each position i, slice value1 positionally as [i - span_pre : i + 1]:
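# max(..., 0) keeps the slice start from going negative if span_pre exceeds the rows before it
df['SUM_pre'] = [df['value1'].iloc[max(i - w, 0):i + 1].sum()
                 for i, w in enumerate(df['span_pre'])]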
>>> df
ID span_pre value1 value2 SUM_pre
100 A 0 0.1 0 0.1
101 A 1 0.0 0 0.1
102 A 2 0.1 0 0.2
103 A 3 0.1 0 0.3
104 A 3 0.1 0 0.3
105 A 3 0.1 0 0.4
106 S 0 0.2 0 0.2
107 S 1 0.2 0 0.4
108 S 2 0.2 0 0.6
109 S 3 NaN 1 0.6
110 S 3 NaN 1 0.4
111 S 3 NaN 1 0.2
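If the Python-level loop becomes a bottleneck, the same windows can be summed with no loop at all using cumulative sums: the sum over positions [s, e) equals csum[e] - csum[s], where csum is the running total with a leading 0. This is a swapped-in technique, not part of the answer above. The sketch assumes NaN should count as 0 (as Series.sum() does by default) and clips window starts at row 0. Differences of cumulative sums can differ from the loop in the last floating-point digits, so compare results with np.isclose rather than ==.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with None already converted to NaN
df = pd.DataFrame({
    "ID": list("AAAAAASSSSSS"),
    "span_pre": [0, 1, 2, 3, 3, 3, 0, 1, 2, 3, 3, 3],
    "value1": [0.1, 0.0, 0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, np.nan, np.nan, np.nan],
    "value2": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
}, index=range(100, 112))

vals = df["value1"].fillna(0).to_numpy()          # NaN contributes 0, like Series.sum()
csum = np.concatenate(([0.0], np.cumsum(vals)))   # csum[k] = sum of vals[:k]

pos = np.arange(len(df))
start = np.maximum(pos - df["span_pre"].to_numpy(), 0)  # inclusive window start
end = pos + 1                                           # exclusive window end

df["SUM_pre"] = csum[end] - csum[start]
print(df)
```

Because each window sum is a difference of two prefix sums, windows that would straddle the A/S boundary are not an issue here: span_pre resets to 0 at the first S row, so no window reaches back into group A.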
Alternatively, use a custom rolling window by subclassing BaseIndexer:
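import numpy as np
from pandas.api.indexers import BaseIndexer

class CustomIndexer(BaseIndexer):
    # pandas >= 1.5 passes `step` and checks this exact signature; drop `step` on older pandas
    def get_window_bounds(self, num_values=0, min_periods=None, center=None,
                          closed=None, step=None):
        right = 1 + np.arange(num_values, dtype='int64')                # exclusive end: row i is included
        left = right - np.asarray(self.index_array, dtype='int64') - 1  # inclusive start: i - span_pre
        return np.clip(left, 0, None), right                            # never start before row 0

indexer = CustomIndexer(df['span_pre'])
df['SUM_pre'] = df['value1'].rolling(indexer).sum()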
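A self-contained sketch that rebuilds the sample frame and checks that the custom indexer agrees with the list-comprehension version. It assumes pandas >= 1.5, where rolling passes a step argument to get_window_bounds and rejects overrides whose signature differs from BaseIndexer's; on older pandas, drop the step parameter.

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import BaseIndexer


class CustomIndexer(BaseIndexer):
    """Window for row i covers positions [i - span_i, i]; spans come from index_array."""

    def get_window_bounds(self, num_values=0, min_periods=None, center=None,
                          closed=None, step=None):
        end = np.arange(1, num_values + 1, dtype="int64")               # exclusive end
        start = end - np.asarray(self.index_array, dtype="int64") - 1   # inclusive start
        return np.clip(start, 0, None), end                             # clip at row 0


df = pd.DataFrame({
    "ID": list("AAAAAASSSSSS"),
    "span_pre": [0, 1, 2, 3, 3, 3, 0, 1, 2, 3, 3, 3],
    "value1": [0.1, 0.0, 0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, np.nan, np.nan, np.nan],
}, index=range(100, 112))

# Vectorized: pandas asks the indexer for all window bounds at once
rolled = df["value1"].rolling(CustomIndexer(index_array=df["span_pre"].to_numpy())).sum()

# Loop version from above, for comparison
looped = pd.Series(
    [df["value1"].iloc[max(i - w, 0):i + 1].sum() for i, w in enumerate(df["span_pre"])],
    index=df.index,
)

print(pd.DataFrame({"rolling": rolled, "loop": looped}))
```

With a custom indexer, min_periods defaults to the indexer's window_size (0 here), so a window containing only NaN would sum to 0.0, matching what the loop's Series.sum() returns.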