Here is the sample code.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(10, 2), columns=list('AB'))
df['C'] = df.B.rolling(window=3)
Output:
A B C
0 -0.108897 1.877987 Rolling [window=3,center=False,axis=0]
1 -1.276055 -0.424382 Rolling [window=3,center=False,axis=0]
2 1.578561 -1.094649 Rolling [window=3,center=False,axis=0]
3 -0.443294 1.683261 Rolling [window=3,center=False,axis=0]
4 0.674124 0.281077 Rolling [window=3,center=False,axis=0]
5 0.587773 0.697557 Rolling [window=3,center=False,axis=0]
6 -0.258038 -1.230902 Rolling [window=3,center=False,axis=0]
7 -0.443269 0.647107 Rolling [window=3,center=False,axis=0]
8 0.347187 0.753585 Rolling [window=3,center=False,axis=0]
9 -0.369179 0.975155 Rolling [window=3,center=False,axis=0]
I want each row of my 'C' column to hold an array such as [0.1231, -1.132, 0.8766]. I tried using rolling apply, but to no avail.
Expected output:
A B C
0 -0.108897 1.877987 []
1 -1.276055 -0.424382 []
2 1.578561 -1.094649 [-1.094649, -0.424382, 1.877987]
3 -0.443294 1.683261 [1.683261, -1.094649, -0.424382]
4 0.674124 0.281077 [0.281077, 1.683261, -1.094649]
5 0.587773 0.697557 [0.697557, 0.281077, 1.683261]
6 -0.258038 -1.230902 [-1.230902, 0.697557, 0.281077]
7 -0.443269 0.647107 [0.647107, -1.230902, 0.697557]
8 0.347187 0.753585 [0.753585, 0.647107, -1.230902]
9 -0.369179 0.975155 [0.975155, 0.753585, 0.647107]
Answer 0 (score: 6)
You can use np.lib.stride_tricks:
import numpy as np
as_strided = np.lib.stride_tricks.as_strided
df
A B
0 -0.272824 -1.606357
1 -0.350643 0.000510
2 0.247222 1.627117
3 -1.601180 0.550903
4 0.803039 -1.231291
5 -0.536713 -0.313384
6 -0.840931 -0.675352
7 -0.930186 -0.189356
8 0.151349 0.522533
9 -0.046146 0.507406
win = 3 # window size
# https://stackoverflow.com/a/47483615/4909087
v = as_strided(df.B.values, (len(df) - (win - 1), win), (df.B.values.strides * 2))
v
array([[ -1.60635669e+00, 5.10129842e-04, 1.62711678e+00],
[ 5.10129842e-04, 1.62711678e+00, 5.50902812e-01],
[ 1.62711678e+00, 5.50902812e-01, -1.23129111e+00],
[ 5.50902812e-01, -1.23129111e+00, -3.13383794e-01],
[ -1.23129111e+00, -3.13383794e-01, -6.75352179e-01],
[ -3.13383794e-01, -6.75352179e-01, -1.89356194e-01],
[ -6.75352179e-01, -1.89356194e-01, 5.22532550e-01],
[ -1.89356194e-01, 5.22532550e-01, 5.07405549e-01]])
df['C'] = pd.Series(v.tolist(), index=df.index[win - 1:])
df
A B C
0 -0.272824 -1.606357 NaN
1 -0.350643 0.000510 NaN
2 0.247222 1.627117 [-1.606356691642917, 0.0005101298424200881, 1....
3 -1.601180 0.550903 [0.0005101298424200881, 1.6271167809032248, 0....
4 0.803039 -1.231291 [1.6271167809032248, 0.5509028122535129, -1.23...
5 -0.536713 -0.313384 [0.5509028122535129, -1.2312911105674484, -0.3...
6 -0.840931 -0.675352 [-1.2312911105674484, -0.3133837943758246, -0....
7 -0.930186 -0.189356 [-0.3133837943758246, -0.6753521794378446, -0....
8 0.151349 0.522533 [-0.6753521794378446, -0.18935619377656243, 0....
9 -0.046146 0.507406 [-0.18935619377656243, 0.52253255045267, 0.507...
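For reference, here is a minimal self-contained sketch of the same idea on a small deterministic column. The helper name rolling_windows is hypothetical, and copying the input to a contiguous float array is an added precaution (not part of the answer above), since as_strided performs no bounds checking.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import as_strided


def rolling_windows(values, win):
    """Return a 2D array whose rows are the length-`win` windows of `values`."""
    # as_strided does no bounds checking, so work on a contiguous 1D float array
    arr = np.ascontiguousarray(values, dtype=float)
    n_windows = len(arr) - win + 1
    if n_windows < 1:
        return np.empty((0, win))
    step = arr.strides[0]
    # .copy() detaches the result from the shared-memory view
    return as_strided(arr, shape=(n_windows, win), strides=(step, step)).copy()


df = pd.DataFrame({"B": [1.0, 2.0, 3.0, 4.0, 5.0]})
win = 3
windows = rolling_windows(df["B"].to_numpy(), win)
# Align each window with the row where it ends; earlier rows become NaN
df["C"] = pd.Series(windows.tolist(), index=df.index[win - 1:])
```

The final copy costs some memory but avoids handing out overlapping views of the same buffer, which can be surprising if anything writes to them later.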
Answer 1 (score: 2)
Perhaps zip could also help in your case, i.e.
def get_list(x,m) : return list(zip(*(x[i:] for i in range(m))))
# get_list(df['B'],3) would return
[(-1.606357, 0.0005099999999999999, 1.627117),
(0.0005099999999999999, 1.627117, 0.5509029999999999),
(1.627117, 0.5509029999999999, -1.231291),
(0.5509029999999999, -1.231291, -0.313384),
(-1.231291, -0.313384, -0.6753520000000001),
(-0.313384, -0.6753520000000001, -0.189356),
(-0.6753520000000001, -0.189356, 0.522533),
(-0.189356, 0.522533, 0.507406)]
df['C'] = pd.Series(get_list(df['B'],3), index=df.index[3 - 1:])
# Little help from @coldspeed
print(df)
A B C
0 -0.272824 -1.606357 NaN
1 -0.350643 0.000510 NaN
2 0.247222 1.627117 (-1.606357, 0.0005099999999999999, 1.627117)
3 -1.601180 0.550903 (0.0005099999999999999, 1.627117, 0.5509029999...
4 0.803039 -1.231291 (1.627117, 0.5509029999999999, -1.231291)
5 -0.536713 -0.313384 (0.5509029999999999, -1.231291, -0.313384)
6 -0.840931 -0.675352 (-1.231291, -0.313384, -0.6753520000000001)
7 -0.930186 -0.189356 (-0.313384, -0.6753520000000001, -0.189356)
8 0.151349 0.522533 (-0.6753520000000001, -0.189356, 0.522533)
9 -0.046146 0.507406 (-0.189356, 0.522533, 0.507406)
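The zip approach can be sketched on a small deterministic column as follows. The helper name get_windows and the sample data are assumptions chosen for illustration; the logic is the same as get_list above.

```python
import pandas as pd


def get_windows(values, m):
    """Return all length-`m` sliding windows of `values` as tuples."""
    seq = list(values)
    # zip stops at the shortest slice, which yields len(seq) - m + 1 windows
    return list(zip(*(seq[i:] for i in range(m))))


df = pd.DataFrame({"B": [1.0, 2.0, 3.0, 4.0, 5.0]})
win = 3
# Align each window with the row where it ends; earlier rows become NaN
df["C"] = pd.Series(get_windows(df["B"], win), index=df.index[win - 1:])
```

Because this runs in pure Python, it is likely to be slower than the strided approach on long columns, but it needs nothing beyond the standard library and pandas.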
Answer 2 (score: 2)
Since pandas 1.1, rolling objects are iterable, so you can do the following:
df['C'] = list(df.B.rolling(window=3))
Or, if you want to have lists, you can do this:
df['C'] = [window.to_list() for window in df.B.rolling(window=3)]
This is concise, and you can use all the convenient parameters of the rolling function.
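One detail worth knowing: iterating a rolling object also yields the shorter, incomplete windows at the start. The sketch below, on assumed sample data, shows this and one possible way to keep only full windows. Using None as the placeholder is a choice made here, not something the answer prescribes.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
win = 3

# Iterating a Rolling object (pandas >= 1.1) yields one Series per row,
# including the incomplete windows at the beginning
all_windows = [w.to_list() for w in s.rolling(window=win)]

# Keep only full windows; None marks rows without enough history
full_only = pd.Series(
    [w.to_list() if len(w) == win else None for w in s.rolling(window=win)],
    index=s.index,
)
```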
Answer 3 (score: 0)
Let's stay with pandas and use a rolling apply trick:
df = pd.DataFrame(np.random.randn(10, 2), columns=list('AB'))
list_of_values = []
df.B.rolling(3).apply(lambda x: list_of_values.append(x.values) or 0, raw=False)
df.loc[2:,'C'] = pd.Series(list_of_values).values
df
Output:
A B C
0 1.610085 0.354823 NaN
1 -0.241446 -0.304952 NaN
2 0.524812 -0.240972 [0.35482336179318674, -0.30495156795594963, -0.24097191924555197]
3 0.767354 0.281625 [-0.30495156795594963, -0.24097191924555197, 0.2816249674055174]
4 -0.349844 -0.533781 [-0.24097191924555197, 0.2816249674055174, -0.5337811449574766]
5 -0.174189 0.133795 [0.2816249674055174, -0.5337811449574766, 0.13379518286397707]
6 2.799437 -0.978349 [-0.5337811449574766, 0.13379518286397707, -0.9783488211443795]
7 0.250129 0.289782 [0.13379518286397707, -0.9783488211443795, 0.2897823417165459]
8 -0.385259 -0.286399 [-0.9783488211443795, 0.2897823417165459, -0.28639931887491943]
9 -0.755363 -1.010891 [0.2897823417165459, -0.28639931887491943, -1.0108913605575793]
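A deterministic variant of the trick, as a sketch: the function passed to apply stores each window as a side effect and returns a dummy number. The function name collect and the use of raw=True are choices made for this example, and the approach assumes that apply calls the function exactly once per full window.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
win = 3
collected = []


def collect(window):
    # Side effect: store a copy of the window; apply itself needs a number back
    collected.append(window.tolist())
    return 0.0


# raw=True passes each full window to collect as a NumPy array
s.rolling(win).apply(collect, raw=True)
# Align each stored window with the row where it ends
result = pd.Series(collected, index=s.index[win - 1:])
```

Since this depends on side effects, the return value of apply is discarded and collected must be reset before the code is run again.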
Answer 4 (score: 0)
Newer NumPy versions (1.20 and later) provide sliding_window_view().
It returns the same array as the as_strided() approach, but with a more transparent syntax.
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view
x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])
sliding_window_view(x, 3)
>>>
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7],
[6, 7, 8],
[7, 8, 9]])
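Note, however, that pandas rolling puts window_size - 1 NaNs at the start of its result, as if the input had been padded: min_periods defaults to the window size, so incomplete windows evaluate to NaN. You can check this as follows: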
x.rolling(3).sum()
>>>
0 NaN
1 NaN
2 6.0
3 9.0
4 12.0
5 15.0
6 18.0
7 21.0
8 24.0
dtype: float64
sliding_window_view(x, 3).sum(axis=1)
>>>
array([ 6, 9, 12, 15, 18, 21, 24])
So the array that truly corresponds would be:
c = np.array([[np.nan, np.nan, 1.],
[np.nan, 1., 2.],
[ 1., 2., 3.],
[ 2., 3., 4.],
[ 3., 4., 5.],
[ 4., 5., 6.],
[ 5., 6., 7.],
[ 6., 7., 8.],
[ 7., 8., 9.]])
c.sum(axis=1)
>>>
array([nan, nan, 6., 9., 12., 15., 18., 21., 24.])
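Rather than typing the padded array by hand, it can be built by prepending NaNs before taking the sliding view. This is a sketch under the assumption that converting the data to float is acceptable, since NaN cannot be stored in an integer array. The names padded, sums and win are introduced here for illustration.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])
win = 3

# Prepend win - 1 NaNs so there is one window per input element
padded = np.concatenate([np.full(win - 1, np.nan), x.to_numpy(dtype=float)])
c = sliding_window_view(padded, win)

# NaN wherever a window still contains padding
sums = c.sum(axis=1)
```

With this padding, sums has the same length as x and agrees with x.rolling(3).sum() on every row that has a full window.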
Answer 5 (score: 0)
Here is another way:
df.join(pd.concat(df['B'].rolling(window=3),axis=1).apply(lambda x: x.dropna().tolist()).reset_index(drop=True).loc[2:].rename('C'))