How can I process a DataFrame without a for loop?

Time: 2019-04-09 20:13:32

Tags: python pandas

My DataFrame is:

            Date        Open        High         Low       Close   Adj Close     Volume
5932  2016-08-18  218.339996  218.899994  218.210007  218.860001  207.483215   52989300
5933  2016-08-19  218.309998  218.750000  217.740005  218.539993  207.179825   75443000
5934  2016-08-22  218.259995  218.800003  217.830002  218.529999  207.170364   61368800
5935  2016-08-23  219.250000  219.600006  218.899994  218.970001  207.587479   53399200
5936  2016-08-24  218.800003  218.910004  217.360001  217.850006  206.525711   71728900
5937  2016-08-25  217.399994  218.190002  217.220001  217.699997  206.383514   69224800
5938  2016-08-26  217.919998  219.119995  216.250000  217.289993  205.994827  122506300
5939  2016-08-29  217.440002  218.669998  217.399994  218.360001  207.009201   68606100
5940  2016-08-30  218.259995  218.589996  217.350006  218.000000  206.667908   58114500
5941  2016-08-31  217.610001  217.750000  216.470001  217.380005  206.080124   85269500
5942  2016-09-01  217.369995  217.729996  216.029999  217.389999  206.089645   97844200
5943  2016-09-02  218.389999  218.869995  217.699997  218.369995  207.018692   79293900
5944  2016-09-06  218.699997  219.119995  217.860001  219.029999  207.644394   56702100
5945  2016-09-07  218.839996  219.220001  218.300003  219.009995  207.625412   76554900
5946  2016-09-08  218.619995  218.940002  218.149994  218.509995  207.151398   73011600
5947  2016-09-09  216.970001  217.029999  213.250000  213.279999  202.193268  221589100
5948  2016-09-12  212.389999  216.809998  212.309998  216.339996  205.094223  168110900
5949  2016-09-13  214.839996  215.149994  212.500000  213.229996  202.145859  182828800
5950  2016-09-14  213.289993  214.699997  212.500000  213.149994  202.070023  134185500
5951  2016-09-15  212.960007  215.729996  212.750000  215.279999  204.089294  134427900
5952  2016-09-16  213.479996  213.690002  212.570007  213.369995  203.300430  155236400

Currently I am doing this:

        # Series.get_values() was removed in pandas 1.0; .iloc[-1] returns the same scalar
        state['open_price'] = lookback.Open.iloc[-1]

        for ind, row in lookback.reset_index().iterrows():
            if ind < self.LOOKBACK_DAYS:
                state['close_' + str(self.LOOKBACK_DAYS - ind)] = row.Close
                state['open_' + str(self.LOOKBACK_DAYS - ind)] = row.Open
                state['volume_' + str(self.LOOKBACK_DAYS - ind)] = row.Volume

But this is too slow. Is there a vectorized way to do this?

I am trying to convert it into:

cash          1.000000e+05
num_shares    0.000000e+00
cost_basis    0.000000e+00
open_price    1.316900e+02
close_20      1.301100e+02
open_20       1.302600e+02
volume_20     4.670420e+07
close_19      1.302100e+02
open_19       1.299900e+02
volume_19     4.320920e+07
close_18      1.300200e+02
open_18       1.300300e+02
volume_18     3.252300e+07
close_17      1.292200e+02
open_17       1.299300e+02
volume_17     8.207990e+07
close_16      1.300300e+02
open_16       1.294100e+02
volume_16     6.150570e+07
close_15      1.298000e+02
open_15       1.301100e+02
volume_15     7.057170e+07
close_14      1.298300e+02
open_14       1.300200e+02
volume_14     6.292560e+07
close_13      1.297300e+02
open_13       1.300700e+02
volume_13     6.162470e+07
close_12      1.305600e+02
open_12       1.297300e+02
                  ...     
close_10      1.308700e+02
open_10       1.308500e+02
volume_10     5.790620e+07
close_9       1.295400e+02
open_9        1.310600e+02
volume_9      8.018090e+07
close_8       1.297400e+02
open_8        1.297400e+02
volume_8      4.149650e+07
close_7       1.286400e+02
open_7        1.298500e+02
volume_7      7.279940e+07
close_6       1.288800e+02
open_6        1.287700e+02
volume_6      4.303370e+07
close_5       1.287100e+02
open_5        1.285900e+02
volume_5      5.105180e+07
close_4       1.286600e+02
open_4        1.288300e+02
volume_4      6.416770e+07
close_3       1.307000e+02
open_3        1.289300e+02
volume_3      9.253180e+07
close_2       1.309500e+02
open_2        1.307500e+02
volume_2      8.726900e+07
close_1       1.311300e+02
open_1        1.310000e+02
volume_1      8.600550e+07
Length: 64, dtype: float64

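1 answer:

Answer 0 (score: 1)

One approach is to cheat and work on the underlying arrays by using .values.

I will also add the steps I used to create an equivalent example: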

import pandas as pd
from itertools import product

initial = ['cash', 'num_shares', 'somethingsomething']
initial_series = pd.Series([1, 2, 3], index = initial)
print(initial_series)
#Output:
cash                  1
num_shares            2
somethingsomething    3
dtype: int64

This just mocks a few of the values at the start of your series.

df = pd.read_clipboard(sep=r'\s\s+') #pure magic; raw string avoids an invalid-escape warning
print(df.head())
#Output:
            Date        Open    ...      Adj Close    Volume
5932  2016-08-18  218.339996    ...     207.483215  52989300
5933  2016-08-19  218.309998    ...     207.179825  75443000
5934  2016-08-22  218.259995    ...     207.170364  61368800
5935  2016-08-23  219.250000    ...     207.587479  53399200
5936  2016-08-24  218.800003    ...     206.525711  71728900

[5 rows x 7 columns]
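Now df is effectively the dataframe you provided in your example. The clipboard trick comes from a linked post that is a good read on writing pandas MCVEs.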
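If copying to the clipboard is inconvenient, the same frame can probably be rebuilt from an inline string. The following is a minimal sketch, not part of the original answer: it assumes fields are separated by two or more spaces (so that "Adj Close" stays one column), and the variable name `raw` is made up for illustration. Only the first three rows of the question's data are included.

```python
import io

import pandas as pd

# First three rows of the question's frame, pasted inline.
# Fields are separated by two or more spaces; "Adj Close" contains a single space.
raw = """\
            Date        Open        High         Low       Close   Adj Close     Volume
5932  2016-08-18  218.339996  218.899994  218.210007  218.860001  207.483215   52989300
5933  2016-08-19  218.309998  218.750000  217.740005  218.539993  207.179825   75443000
5934  2016-08-22  218.259995  218.800003  217.830002  218.529999  207.170364   61368800
"""

# A regex separator requires the python engine. The header has one name fewer
# than each data row has fields, so the first field becomes the index.
df = pd.read_csv(io.StringIO(raw), sep=r"\s\s+", engine="python")
print(df[["Close", "Open", "Volume"]])
```

This behaves like read_clipboard with the same separator, but the example stays reproducible without depending on whatever is currently in the clipboard.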

to_select = ['Close', 'Open', 'Volume']
SOMELOOKBACK = 6000 #mocked
final_index = [f"{name}_{index}" for index, name in product((SOMELOOKBACK - df.index), to_select)]

This prepares the index, which looks like this:

['Close_68',
 'Open_68',
 'Volume_68',
 'Close_67',
 'Open_67',
 'Volume_67',
...
]
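The order of these labels matters, because they are paired with the values purely by position. product varies its rightmost argument fastest, so the column name changes fastest and the offset slowest, which is the same row-by-row order in which a 2D array of the selected columns flattens. A small sketch with made-up stand-in values (`offsets` and `values` are illustrative, not taken from the answer):

```python
from itertools import product

import numpy as np

to_select = ["Close", "Open", "Volume"]
offsets = [68, 67]  # stand-ins for SOMELOOKBACK - df.index

# Column name varies fastest, offset slowest.
labels = [f"{name}_{offset}" for offset, name in product(offsets, to_select)]

# Laid out like df[to_select].values: one row per date, one column per field.
values = np.array([[218.86, 218.34, 52989300.0],
                   [218.54, 218.31, 75443000.0]])

# flatten() walks the array row by row, so each value meets its own label.
pairs = list(zip(labels, values.flatten()))
for label, value in pairs:
    print(label, value)
```

If the arguments to product were swapped, the labels would come out column by column and would no longer line up with the flattened values.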

Now just select the relevant columns from the dataframe, use .values to get the 2D array, and flatten it to get the final series.

final_series = pd.Series(df[to_select].values.flatten(), index = final_index)

# Series.append was removed in pandas 2.0; pd.concat gives the same result
result = pd.concat([initial_series, final_series])
print(result)
#Output:
cash                  1.000000e+00
num_shares            2.000000e+00
somethingsomething    3.000000e+00
Close_68              2.188600e+02
Open_68               2.183400e+02
Volume_68             5.298930e+07
Close_67              2.185400e+02
Open_67               2.183100e+02
Volume_67             7.544300e+07
Close_66              2.185300e+02
Open_66               2.182600e+02
Volume_66             6.136880e+07
...
Close_48              2.133700e+02
Open_48               2.134800e+02
Volume_48             1.552364e+08
Length: 66, dtype: float64
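To get closer to the exact layout asked for in the question (lowercase keys, suffixes counted from row position rather than from the index label, and an open_price entry), the same idea can be wrapped in a helper. This is only a sketch under assumptions: `build_state`, `lookback_days` and `initial` are hypothetical names, and the numbering rule (row at position 0 gets suffix lookback_days) is inferred from the loop in the question.

```python
import numpy as np
import pandas as pd


def build_state(lookback, lookback_days, initial=None):
    """Flatten the first `lookback_days` rows of `lookback` into one Series."""
    fields = ["Close", "Open", "Volume"]
    window = lookback.iloc[:lookback_days]
    # Position 0 gets suffix lookback_days, counting down, as in the question's loop.
    suffixes = lookback_days - np.arange(len(window))
    labels = [f"{field.lower()}_{n}" for n in suffixes for field in fields]
    # Row-major flattening keeps values aligned with the labels built above.
    flat = pd.Series(window[fields].to_numpy(dtype=float).ravel(), index=labels)
    head = pd.Series({"open_price": float(lookback["Open"].iloc[-1])})
    parts = [head, flat] if initial is None else [initial, head, flat]
    return pd.concat(parts)


# Small made-up frame with the same column names as the question's data.
lookback = pd.DataFrame(
    {
        "Open": [218.34, 218.31, 218.26],
        "Close": [218.86, 218.54, 218.53],
        "Volume": [52989300, 75443000, 61368800],
    },
    index=[5932, 5933, 5934],
)

state = build_state(lookback, lookback_days=3)
print(state)
```

Because the labels are derived from positions, the result does not depend on the frame's index, which is what reset_index() achieved in the original loop.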