Extracting a specific value from a pandas Series

Time: 2021-06-29 08:48:24

Tags: python pandas lambda series

I have this code:

 df=df\
    .assign(SUM_pre = lambda x: x['value1'].rolling(???).sum())

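I want to sum the values of column value1 over a rolling window. The catch is that the window size changes from row to row, and the size is stored in another column, as shown here: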

       ID   span_pre   value1  value2
0      A       0        0.1       0
1      A       1        0.0       0
2      A       2        0.1       0
3      A       3        0.1       0
4      A       3        0.1       0
5      A       3        0.1       0
6      S       0        0.2       0
7      S       1        0.2       0
8      S       2        0.2       0
9      S       3       None       1
10     S       3       None       1
11     S       3       None       1

I tried:

 df=df\
    .assign(SUM_pre = lambda x: x['value1'].rolling(x.span_pre).sum())

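But the output is: ValueError: window must be an integer. That makes sense, since x.span_pre is a Series. Can you help me understand how to pull the value from the span_pre column for each row without writing too many loops? My final output should be: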

       ID   span_pre   value1  value2  SUM_pre
0      A       0        0.1       0      0.1
1      A       1        0.0       0      0.1
2      A       2        0.1       0      0.2
3      A       3        0.1       0      0.2
4      A       3        0.1       0      0.3
5      A       3        0.1       0      0.3
6      S       0        0.2       0      0.2
7      S       1        0.2       0      0.4
8      S       2        0.2       0      0.6
9      S       3       None       1      0.4
10     S       3       None       1      0.2
11     S       3       None       1     None

1 Answer:

Answer 0: (score: 0)

(Please check your expected output first.)

I slightly modified your dataframe to avoid errors and confusion:

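import numpy as np

# np.nan replaces the np.NaN alias, which NumPy 2.0 removed
df['value1'] = df['value1'].replace({'None': np.nan}).astype(float)
df.index += 100  # shift index labels by 100 (labels no longer equal positions)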
>>> df
    ID  span_pre  value1  value2
100  A         0     0.1       0  # [0:1]  -> 0.1
101  A         1     0.0       0  # [0:2]  -> 0.1 + 0.0
102  A         2     0.1       0  # [0:3]  -> 0.1 + 0.0 + 0.1
103  A         3     0.1       0  # [0:4]  -> 0.1 + 0.0 + 0.1 + 0.1
104  A         3     0.1       0  # [1:5]  -> 0.0 + 0.1 + 0.1 + 0.1
105  A         3     0.1       0  # [2:6]  -> 0.1 + 0.1 + 0.1 + 0.1
106  S         0     0.2       0  # [6:7]  -> 0.2
107  S         1     0.2       0  # [6:8]  -> 0.2 + 0.2
108  S         2     0.2       0  # [6:9]  -> 0.2 + 0.2 + 0.2
109  S         3     NaN       1  # [6:10] -> 0.2 + 0.2 + 0.2 + NaN
110  S         3     NaN       1  # [7:11] -> 0.2 + 0.2 + NaN + NaN
111  S         3     NaN       1  # [8:12] -> 0.2 + NaN + NaN + NaN

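You can iterate over the rows and slice value1 by position with [curr_idx-span_pre:curr_idx+1]. Note that sum() skips NaN, so the last three rows still get a numeric result: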

# max(i - w, 0) keeps the slice start from going negative
df['SUM_pre'] = [df['value1'].iloc[max(i - w, 0):i + 1].sum()
                 for i, w in enumerate(df['span_pre'])]
>>> df
    ID  span_pre  value1  value2  SUM_pre
100  A         0     0.1       0      0.1
101  A         1     0.0       0      0.1
102  A         2     0.1       0      0.2
103  A         3     0.1       0      0.3
104  A         3     0.1       0      0.3
105  A         3     0.1       0      0.4
106  S         0     0.2       0      0.2
107  S         1     0.2       0      0.4
108  S         2     0.2       0      0.6
109  S         3     NaN       1      0.6
110  S         3     NaN       1      0.4
111  S         3     NaN       1      0.2
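
Because the question asks to avoid loops, the same slice sums can also be sketched with a cumulative sum instead of a per-row slice. This is not part of the original answer: the helper name variable_window_sum is hypothetical, NaN is treated as 0 to mirror how sum() skips missing values, and clamping each window start at position 0 is an added assumption.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (None becomes NaN in a float column)
df = pd.DataFrame({
    "ID": list("AAAAAASSSSSS"),
    "span_pre": [0, 1, 2, 3, 3, 3, 0, 1, 2, 3, 3, 3],
    "value1": [0.1, 0.0, 0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, None, None, None],
    "value2": [0] * 9 + [1] * 3,
})


def variable_window_sum(values, spans):  # hypothetical helper, not a pandas API
    """Sum values[i - spans[i] : i + 1] for every position i, counting NaN as 0."""
    vals = np.asarray(values, dtype=float)
    vals = np.where(np.isnan(vals), 0.0, vals)
    spans = np.asarray(spans, dtype=np.int64)
    # prefix[k] is the sum of the first k values, so prefix[0] == 0
    prefix = np.concatenate(([0.0], np.cumsum(vals)))
    pos = np.arange(len(vals))
    start = np.clip(pos - spans, 0, None)  # assumption: never start before row 0
    return prefix[pos + 1] - prefix[start]


df["SUM_pre"] = variable_window_sum(df["value1"], df["span_pre"])
print(df)
```

On the sample data this should agree with the loop up to floating-point rounding. Differences of cumulative sums can accumulate small rounding errors on long columns, so compare with a tolerance rather than exact equality.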

Or use custom window rolling by subclassing BaseIndexer:

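from pandas.api.indexers import BaseIndexer

class CustomIndexer(BaseIndexer):
    # pandas >= 1.5 also passes `step`; drop that parameter on older versions
    def get_window_bounds(self, num_values, min_periods, center, closed, step=None):
        # the window for row i covers positions [i - span_pre[i], i + 1)
        right = 1 + np.arange(num_values, dtype='int64')
        left = right - self.index_array.values - 1
        return left, right

# the span_pre Series is stored on the indexer as self.index_array
indexer = CustomIndexer(df['span_pre'])
df['SUM_pre'] = df['value1'].rolling(indexer).sum()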
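
For a self-contained check, the sketch below rebuilds the sample frame and compares a BaseIndexer subclass against the row-by-row slicing. It assumes pandas 1.5 or later, where get_window_bounds also receives a step argument. The class name SpanIndexer, the clamping of the window start at 0, and the explicit min_periods=1 are choices made for this illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import BaseIndexer


class SpanIndexer(BaseIndexer):  # hypothetical name for this sketch
    """Row i gets the window [i - span[i], i + 1), with span taken from index_array."""

    def get_window_bounds(self, num_values=0, min_periods=None,
                          center=None, closed=None, step=None):
        spans = np.asarray(self.index_array, dtype=np.int64)
        end = np.arange(1, num_values + 1, dtype=np.int64)
        # assumption: clamp so a span larger than the row position starts at row 0
        start = np.clip(end - 1 - spans, 0, None)
        return start, end


# Sample data copied from the question (None becomes NaN in a float column)
df = pd.DataFrame({
    "ID": list("AAAAAASSSSSS"),
    "span_pre": [0, 1, 2, 3, 3, 3, 0, 1, 2, 3, 3, 3],
    "value1": [0.1, 0.0, 0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, None, None, None],
    "value2": [0] * 9 + [1] * 3,
})

indexer = SpanIndexer(index_array=df["span_pre"])
# min_periods=1: a window needs at least one non-NaN value to produce a sum
rolled = df["value1"].rolling(indexer, min_periods=1).sum()

# Reference result from the positional slicing approach
looped = [df["value1"].iloc[max(i - w, 0):i + 1].sum()
          for i, w in enumerate(df["span_pre"])]

print(np.allclose(rolled.to_numpy(), looped))
```

One difference to keep in mind: with min_periods=1, a window containing only NaN would yield NaN from rolling, whereas the slicing version returns 0.0 for it. The sample data has no such window.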