My DataFrame df looks like this:
a b
0 30.05 29.55
1 30.20 26.05
2 30.81 25.65
3 31.12 26.44
.. ... ...
85 30.84 25.65
86 31.12 26.44
87 29.55 25.57
88 32.41 25.45
89 21.55 29.57
90 32.91 26.41
91 34.12 25.69
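I need to create a new column 'c' that holds an array made of the column 'b' value plus the values from the previous 4 rows of column 'b'. The resulting df would look like this: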
a b c
0 30.05 29.55 [29.55,0,0,0,0]
1 30.20 26.05 [26.05,29.55,0,0,0]
2 30.81 25.65 [25.65,26.05,29.55,0,0]
3 31.12 26.44 [26.44,25.65,26.05,29.55,0]
.. ... ...
85 30.84 25.65 [25.65, 44.60, 30.15, 29.55, 24.66 ]
86 31.12 26.44 [26.44, 25.65, 25.65, 25.65, 25.65 ]
87 29.55 25.57 [25.57, 26.44, 25.65, 25.65, 25.65 ]
88 32.41 25.45 [25.45, 25.57, 26.44, 25.65, 25.65 ]
89 21.55 29.57 [29.57, 25.45, 25.57, 26.44, 25.65 ]
90 32.91 26.41 [26.41, 29.57, 25.45, 25.57, 26.44 ]
91 34.12 25.69 [25.69, 26.41, 29.57, 25.45, 25.57 ]
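I know I can access previous rows with df.b.shift(1), df.b.shift(2), and so on, but I want to be able to easily change how many rows I look back when building the array, rather than writing out multiple shift(n) calls.
After looking at this all day, I'm stuck. (Python 3.6)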
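Answer 0 (score: 1)
You can use pd.concat over shifted copies of the column, with range(N) controlling how many values each row holds (N = 4 here; use 5 for the current value plus the previous four):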
In [60]: df['c'] = pd.concat([df.b.shift(i) for i in range(4)], axis=1).fillna(0).values.tolist()
In [61]: df
Out[61]:
a b c
0 30.05 29.55 [29.55, 0.0, 0.0, 0.0]
1 30.20 26.05 [26.05, 29.55, 0.0, 0.0]
2 30.81 25.65 [25.65, 26.05, 29.55, 0.0]
3 31.12 26.44 [26.44, 25.65, 26.05, 29.55]
85 30.84 25.65 [25.65, 26.44, 25.65, 26.05]
86 31.12 26.44 [26.44, 25.65, 26.44, 25.65]
87 29.55 25.57 [25.57, 26.44, 25.65, 26.44]
88 32.41 25.45 [25.45, 25.57, 26.44, 25.65]
89 21.55 29.57 [29.57, 25.45, 25.57, 26.44]
90 32.91 26.41 [26.41, 29.57, 25.45, 25.57]
91 34.12 25.69 [25.69, 26.41, 29.57, 25.45]
Or use np.column_stack with shift(n):
In [70]: np.column_stack([df.b.shift(i).fillna(0) for i in range(4)]).tolist()
Out[70]:
[[29.55, 0.0, 0.0, 0.0],
[26.05, 29.55, 0.0, 0.0],
[25.65, 26.05, 29.55, 0.0],
[26.44, 25.65, 26.05, 29.55],
[25.65, 26.44, 25.65, 26.05],
[26.44, 25.65, 26.44, 25.65],
[25.57, 26.44, 25.65, 26.44],
[25.45, 25.57, 26.44, 25.65],
[29.57, 25.45, 25.57, 26.44],
[26.41, 29.57, 25.45, 25.57],
[25.69, 26.41, 29.57, 25.45]]
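Since the question asks for an easy way to change the lookback, the pd.concat approach above can be wrapped in a small helper. This is only a sketch: the function name `lookback_lists` and its parameters `n_values` and `fill` are made up for illustration, and the sample frame contains just the first four rows from the question.

```python
import pandas as pd


def lookback_lists(series, n_values, fill=0.0):
    """Return one list per row: the current value, then the previous n_values - 1 values.

    Positions that would fall before the first row are replaced with `fill`.
    """
    # shift(0) is the column itself, shift(1) is the previous row, and so on.
    shifted = [series.shift(i) for i in range(n_values)]
    # keys= gives the concatenated columns unique names (0, 1, 2, ...).
    wide = pd.concat(shifted, axis=1, keys=range(n_values)).fillna(fill)
    return pd.Series(wide.values.tolist(), index=series.index)


# First four rows of the frame from the question.
df = pd.DataFrame({"a": [30.05, 30.20, 30.81, 31.12],
                   "b": [29.55, 26.05, 25.65, 26.44]})

# Current value plus the previous four rows, as requested in the question.
df["c"] = lookback_lists(df["b"], n_values=5)
print(df)
```

Changing `n_values` is then the only thing needed to look back further or less far.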
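Answer 1 (score: 0)
You can use a conditional list comprehension (the condition checks whether the lookback would fall before the first value in the index).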
rows_lookback = 5
df = df.assign(c=[[df['b'].iat[n - i] if n - i >= 0 else 0
                   for i in range(rows_lookback)]
                  for n in range(len(df['b']))])
>>> df
a b c
0 30.05 29.55 [29.55, 0, 0, 0, 0]
1 30.20 26.05 [26.05, 29.55, 0, 0, 0]
2 30.81 25.65 [25.65, 26.05, 29.55, 0, 0]
3 31.12 26.44 [26.44, 25.65, 26.05, 29.55, 0]
85 30.84 25.65 [25.65, 26.44, 25.65, 26.05, 29.55]
86 31.12 26.44 [26.44, 25.65, 26.44, 25.65, 26.05]
87 29.55 25.57 [25.57, 26.44, 25.65, 26.44, 25.65]
88 32.41 25.45 [25.45, 25.57, 26.44, 25.65, 26.44]
89 21.55 29.57 [29.57, 25.45, 25.57, 26.44, 25.65]
90 32.91 26.41 [26.41, 29.57, 25.45, 25.57, 26.44]
91 34.12 25.69 [25.69, 26.41, 29.57, 25.45, 25.57]
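For longer columns, the same lookback lists can probably be built without a Python-level loop per row. The sketch below is not from either answer: it swaps in NumPy's `sliding_window_view` (available in NumPy 1.20 and later) over a zero-padded copy of the column, and the helper name `lookback_windows` and its parameters are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def lookback_windows(values, n_values, fill=0.0):
    """Return one list per element: the current value, then the previous n_values - 1 values."""
    arr = np.asarray(values, dtype=float)
    # Pad the front so the first elements also get a full-length window.
    padded = np.concatenate([np.full(n_values - 1, fill), arr])
    # Each window is ordered oldest to newest; there is one window per original element.
    windows = sliding_window_view(padded, n_values)
    # Reverse each window so the current value comes first.
    return windows[:, ::-1].tolist()


# The first four 'b' values from the question.
b_values = [29.55, 26.05, 25.65, 26.44]
rows = lookback_windows(b_values, n_values=5)
for row in rows:
    print(row)
```

The result is a plain list of lists, so it can be assigned to the frame in the same way as the list comprehension above, for example with `df.assign(c=rows)`.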