我的DataFrame是:
Date Open High Low Close Adj Close Volume
5932 2016-08-18 218.339996 218.899994 218.210007 218.860001 207.483215 52989300
5933 2016-08-19 218.309998 218.750000 217.740005 218.539993 207.179825 75443000
5934 2016-08-22 218.259995 218.800003 217.830002 218.529999 207.170364 61368800
5935 2016-08-23 219.250000 219.600006 218.899994 218.970001 207.587479 53399200
5936 2016-08-24 218.800003 218.910004 217.360001 217.850006 206.525711 71728900
5937 2016-08-25 217.399994 218.190002 217.220001 217.699997 206.383514 69224800
5938 2016-08-26 217.919998 219.119995 216.250000 217.289993 205.994827 122506300
5939 2016-08-29 217.440002 218.669998 217.399994 218.360001 207.009201 68606100
5940 2016-08-30 218.259995 218.589996 217.350006 218.000000 206.667908 58114500
5941 2016-08-31 217.610001 217.750000 216.470001 217.380005 206.080124 85269500
5942 2016-09-01 217.369995 217.729996 216.029999 217.389999 206.089645 97844200
5943 2016-09-02 218.389999 218.869995 217.699997 218.369995 207.018692 79293900
5944 2016-09-06 218.699997 219.119995 217.860001 219.029999 207.644394 56702100
5945 2016-09-07 218.839996 219.220001 218.300003 219.009995 207.625412 76554900
5946 2016-09-08 218.619995 218.940002 218.149994 218.509995 207.151398 73011600
5947 2016-09-09 216.970001 217.029999 213.250000 213.279999 202.193268 221589100
5948 2016-09-12 212.389999 216.809998 212.309998 216.339996 205.094223 168110900
5949 2016-09-13 214.839996 215.149994 212.500000 213.229996 202.145859 182828800
5950 2016-09-14 213.289993 214.699997 212.500000 213.149994 202.070023 134185500
5951 2016-09-15 212.960007 215.729996 212.750000 215.279999 204.089294 134427900
5952 2016-09-16 213.479996 213.690002 212.570007 213.369995 203.300430 155236400
目前,我正在这样做:
state['open_price'] = lookback.Open.iloc[-1:].get_values()[0]
for ind, row in lookback.reset_index().iterrows():
if ind < self.LOOKBACK_DAYS:
state['close_' + str(self.LOOKBACK_DAYS - ind)] = row.Close
state['open_' + str(self.LOOKBACK_DAYS - ind)] = row.Open
state['volume_' + str(self.LOOKBACK_DAYS - ind)] = row.Volume
但这太慢了。还有其他矢量化方法可以做到这一点吗?
我正在尝试将其转换为:
cash 1.000000e+05
num_shares 0.000000e+00
cost_basis 0.000000e+00
open_price 1.316900e+02
close_20 1.301100e+02
open_20 1.302600e+02
volume_20 4.670420e+07
close_19 1.302100e+02
open_19 1.299900e+02
volume_19 4.320920e+07
close_18 1.300200e+02
open_18 1.300300e+02
volume_18 3.252300e+07
close_17 1.292200e+02
open_17 1.299300e+02
volume_17 8.207990e+07
close_16 1.300300e+02
open_16 1.294100e+02
volume_16 6.150570e+07
close_15 1.298000e+02
open_15 1.301100e+02
volume_15 7.057170e+07
close_14 1.298300e+02
open_14 1.300200e+02
volume_14 6.292560e+07
close_13 1.297300e+02
open_13 1.300700e+02
volume_13 6.162470e+07
close_12 1.305600e+02
open_12 1.297300e+02
...
close_10 1.308700e+02
open_10 1.308500e+02
volume_10 5.790620e+07
close_9 1.295400e+02
open_9 1.310600e+02
volume_9 8.018090e+07
close_8 1.297400e+02
open_8 1.297400e+02
volume_8 4.149650e+07
close_7 1.286400e+02
open_7 1.298500e+02
volume_7 7.279940e+07
close_6 1.288800e+02
open_6 1.287700e+02
volume_6 4.303370e+07
close_5 1.287100e+02
open_5 1.285900e+02
volume_5 5.105180e+07
close_4 1.286600e+02
open_4 1.288300e+02
volume_4 6.416770e+07
close_3 1.307000e+02
open_3 1.289300e+02
volume_3 9.253180e+07
close_2 1.309500e+02
open_2 1.307500e+02
volume_2 8.726900e+07
close_1 1.311300e+02
open_1 1.310000e+02
volume_1 8.600550e+07
Length: 64, dtype: float64
答案 0 :(得分:1)
一种方法是使用.values
我还将添加一些用于创建等效示例的步骤:
import pandas as pd
from itertools import product
initial = ['cash', 'num_shares', 'somethingsomething']
initial_series = pd.Series([1, 2, 3], index = initial)
print(initial_series)
#Output:
cash 1
num_shares 2
somethingsomething 3
dtype: int64
好吧,在示例中,只是模拟了您的系列开始时的一些值。
df = pd.read_clipboard(sep='\s\s+') #pure magic
print(df.head())
#Output:
Date Open ... Adj Close Volume
5932 2016-08-18 218.339996 ... 207.483215 52989300
5933 2016-08-19 218.309998 ... 207.179825 75443000
5934 2016-08-22 218.259995 ... 207.170364 61368800
5935 2016-08-23 219.250000 ... 207.587479 53399200
5936 2016-08-24 218.800003 ... 206.525711 71728900
[5 rows x 7 columns]
现在,df实际上是您在示例中提供的数据框。剪贴板技巧来自here,是熊猫MCVE的不错阅读。
to_select = ['Close', 'Open', 'Volume']
SOMELOOKBACK = 6000 #mocked
final_index = [f"{name}_{index}" for index, name in product((SOMELOOKBACK - df.index), to_select)]
这将准备索引,看起来像这样
['Close_68',
'Open_68',
'Volume_68',
'Close_67',
'Open_67',
'Volume_67',
...
]
现在,只需从数据框中选择相关的列,使用.values
获取2d数组,然后展平以获取最终的序列。
final_series = pd.Series(df[to_select].values.flatten(), index = final_index)
result = initial_series.append(final_series)
#Output:
cash 1.000000e+00
num_shares 2.000000e+00
somethingsomething 3.000000e+00
Close_68 2.188600e+02
Open_68 2.183400e+02
Volume_68 5.298930e+07
Close_67 2.185400e+02
Open_67 2.183100e+02
Volume_67 7.544300e+07
Close_66 2.185300e+02
Open_66 2.182600e+02
Volume_66 6.136880e+07
...
Close_48 2.133700e+02
Open_48 2.134800e+02
Volume_48 1.552364e+08
Length: 66, dtype: float64