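I currently have the following code, which iterates over every row of a DataFrame and assigns the previous row's value of one cell to another cell in the current row.
Basically, what I'm doing is comparing "yesterday's" value of a metric with today's. As you'd expect, this is very slow (especially since I'm working with DataFrames that have hundreds of thousands of rows).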
for index, row in symbol_df.iterrows():
    if index != 0:
        symbol_df.loc[index, 'yesterday_sma_20'] = symbol_df.loc[index-1]['sma_20']
        symbol_df.loc[index, 'yesterday_roc_20'] = symbol_df.loc[index-1]['roc_20']
        symbol_df.loc[index, 'yesterday_roc_100'] = symbol_df.loc[index-1]['roc_100']
        symbol_df.loc[index, 'yesterday_atr_10'] = symbol_df.loc[index-1]['atr_10']
        symbol_df.loc[index, 'yesterday_vsma_20'] = symbol_df.loc[index-1]['vsma_20']
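Is there a way to turn this into a vectorized operation? Or really just any way to speed it up without iterating over every row individually?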
Answer 0 (score: 4)
I may be overlooking something, but I think using .shift() should do it.
import pandas as pd
df = pd.read_csv('test.csv')
print(df)
#         Date    SMA_20    ROC_20
# 0  7/22/2015  0.754889  0.807870
# 1  7/23/2015  0.376448  0.791365
# 2  7/22/2015  0.527232  0.407420
# 3  7/24/2015  0.616281  0.027188
# 4  7/22/2015  0.126556  0.274681
# 5  7/25/2015  0.570008  0.864057
# 6  7/22/2015  0.632057  0.746988
# 7  7/26/2015  0.373405  0.883944
# 8  7/22/2015  0.775591  0.453368
# 9  7/27/2015  0.678638  0.313374
df['y_SMA_20'] = df['SMA_20'].shift()
df['y_ROC_20'] = df['ROC_20'].shift()
print(df)
#         Date    SMA_20    ROC_20  y_SMA_20  y_ROC_20
# 0  7/22/2015  0.754889  0.807870       NaN       NaN
# 1  7/23/2015  0.376448  0.791365  0.754889  0.807870
# 2  7/22/2015  0.527232  0.407420  0.376448  0.791365
# 3  7/24/2015  0.616281  0.027188  0.527232  0.407420
# 4  7/22/2015  0.126556  0.274681  0.616281  0.027188
# 5  7/25/2015  0.570008  0.864057  0.126556  0.274681
# 6  7/22/2015  0.632057  0.746988  0.570008  0.864057
# 7  7/26/2015  0.373405  0.883944  0.632057  0.746988
# 8  7/22/2015  0.775591  0.453368  0.373405  0.883944
# 9  7/27/2015  0.678638  0.313374  0.775591  0.453368
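Applied to the column names from the question, the same idea might look like the sketch below. The sample values are made up for illustration, and it assumes the rows are already sorted oldest to newest, since .shift() works by row position rather than by date.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's symbol_df.
symbol_df = pd.DataFrame({
    "sma_20": [1.0, 2.0, 3.0],
    "roc_20": [0.1, 0.2, 0.3],
    "roc_100": [10.0, 20.0, 30.0],
    "atr_10": [5.0, 6.0, 7.0],
    "vsma_20": [100.0, 200.0, 300.0],
})

cols = ["sma_20", "roc_20", "roc_100", "atr_10", "vsma_20"]

# Shift all five columns down by one row in a single call, then
# prefix the names to get the yesterday_* columns from the question.
yesterday = symbol_df[cols].shift(1).add_prefix("yesterday_")

# Attach the shifted columns. The first row has no previous row,
# so its yesterday_* values are NaN.
symbol_df = symbol_df.join(yesterday)
print(symbol_df)
```

Shifting the whole column selection at once avoids writing one assignment per column, and the first row ends up NaN, just as the original loop leaves it unassigned.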