Question

I have the following pandas DataFrame:

a

1.0
1.5
1.3
1.2
1.9
0.8

I want to apply a custom function to this column. The function takes a window parameter, meaning it should only process the last n items:

import numpy as np

def hislack(x, window):
    # I only want to work with the last n items
    x = x[-window:]
    # and do some stuff (this is a nonsense example, just a simple sum)
    r = np.sum(x)
    return r

To apply this function and store the result in a new column named b, I used this:

df['b'] = hislack(df['a'].values, 3)

But it returns the following:

a     b

1.0   3.9
1.5   3.9
1.3   3.9
1.2   3.9
1.9   3.9
0.8   3.9

That is the result for the last row only: 0.8 + 1.9 + 1.2 = 3.9

The expected output is:

a     b

1.0   NaN
1.5   NaN
1.3   3.8
1.2   4.0
1.9   4.4
0.8   3.9

How can I prevent the same result from being applied to every row?

Answer 1

You need rolling (Series.rolling here, since df['a'] is a Series; DataFrame.rolling works the same way):

df['a'].rolling(3).sum()       # here 3 is the window parameter for your function and sum
                               # is the function/operation you want to apply to each window
#0    NaN
#1    NaN
#2    3.8
#3    4.0
#4    4.4
#5    3.9
#Name: a, dtype: float64

Or:

df['a'].rolling(3).apply(sum)

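More generally, you can use df['a'].rolling(window).apply(fun): pass the window parameter to rolling and your function to apply. The function is then called once per window and receives only that window's values, so it no longer needs to slice the data itself.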
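The general form can be sketched as follows. This is a minimal example that assumes the DataFrame from the question; the function name window_sum is hypothetical and stands in for any custom function that reduces one window to a single number. Passing raw=True hands each window to the function as a NumPy array rather than a Series.

```python
import numpy as np
import pandas as pd

# Data from the question
df = pd.DataFrame({"a": [1.0, 1.5, 1.3, 1.2, 1.9, 0.8]})

def window_sum(x):
    # x contains only the values of the current window
    # (hypothetical example function; a plain sum, as in the question)
    return np.sum(x)

window = 3
# rolling() builds the windows; apply() calls window_sum once per full window.
# Rows without a full window of 3 values get NaN.
df["b"] = df["a"].rolling(window).apply(window_sum, raw=True)
print(df)
```

The first two rows of b are NaN because fewer than 3 values are available there; if partial windows should be computed too, rolling accepts a min_periods argument.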

How to apply a custom function with a window parameter in a pandas DataFrame?
