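I tried iterrows(), which is very slow. I read elsewhere that zip would be better, but it is still slow.
I am trying to loop over the rows of a dataframe and generate some statistics to populate two new dataframes.
Are there any suggestions for speeding up the loop over the rows of a dataframe?
Code snippet: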
for index, date, stocknum in zip(stockpicks.index.values, stockpicks.date.values, stockpicks.stocknum.values):
    stock = readStockPrice(stocknum)
    if stock.empty:
        return print("error - empty frame")
    # Align the price history to the trading calendar (.ix has been removed from pandas)
    stock = stock.reindex(trading_days)
    stockprice = stock.Close.values
    p0_date = trading_days.get_loc(date)
    p0 = stockprice[p0_date]
    # Price d trading days after the pick date, as a percentage of the pick-date price
    stock_pct_change = {('d' + str(d)): stockprice[p0_date + d] / p0 * 100.0 if (p0_date + d) < len(trading_days) else np.nan for d in days}
    b0 = hsi[p0_date]
    benchmark_pct_change = {('d' + str(d)): hsi[p0_date + d] / b0 * 100.0 if (p0_date + d) < len(trading_days) else np.nan for d in days}
    for d in days:
        stock_analysis.loc[index, 'd' + str(d)] = stock_pct_change['d' + str(d)]
        benchmark_analysis.loc[index, 'd' + str(d)] = benchmark_pct_change['d' + str(d)]
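Answer 0 (score: 0)
Your problem as presented can be completely vectorized. Iterating and indexing the way you are doing is the slowest possible approach.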
In [6]: df = DataFrame(np.random.randint(-5,5,size=20).reshape(5,4),columns=list('abcd'),index=date_range('20130101',periods=5))+50.0
In [7]: df.pct_change()
Out[7]:
                   a         b         c         d
2013-01-01       NaN       NaN       NaN       NaN
2013-01-02  0.108696  0.108696  0.102041  0.086957
2013-01-03 -0.058824 -0.039216 -0.074074 -0.060000
2013-01-04  0.104167  0.081633 -0.020000  0.000000
2013-01-05 -0.075472 -0.113208  0.061224 -0.021277

[5 rows x 4 columns]
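Applied to the question's own calculation, the same idea could look roughly like the sketch below. It rests on assumptions that are not in the original post: all closing prices have already been loaded into one wide DataFrame (called `prices` here, indexed by trading day with one column per stock), and `forward_ratios` is a hypothetical helper name. The only Python-level loop left is over the handful of offsets in `days`, not over the picks.

```python
import numpy as np
import pandas as pd


def forward_ratios(prices, picks, days):
    """Return one row per pick and one column 'd<N>' per offset N.

    prices: wide DataFrame, index = trading days, columns = stock ids
            (an assumed layout, not taken from the question)
    picks:  DataFrame with 'date' and 'stocknum' columns
    days:   iterable of forward offsets, counted in trading days
    """
    # Integer positions of every pick in the price table, found in one call each
    rows = prices.index.get_indexer(picks["date"])
    cols = prices.columns.get_indexer(picks["stocknum"])
    if (rows < 0).any() or (cols < 0).any():
        raise KeyError("a pick refers to a date or stock missing from prices")

    values = prices.to_numpy(dtype=float)
    base = values[rows, cols]  # price on the pick date

    out = {}
    for d in days:
        target = rows + d
        # Offsets that fall outside the calendar give NaN, as in the question
        valid = (target >= 0) & (target < len(values))
        future = np.full(len(picks), np.nan)
        future[valid] = values[target[valid], cols[valid]]
        out["d" + str(d)] = future / base * 100.0
    return pd.DataFrame(out, index=picks.index)


# Small made-up example, for illustration only
trading_days = pd.date_range("2013-01-01", periods=5)
prices = pd.DataFrame(
    {"A": [10.0, 11.0, 12.0, 13.0, 14.0], "B": [20.0, 22.0, 21.0, 19.0, 25.0]},
    index=trading_days,
)
picks = pd.DataFrame(
    {"date": pd.to_datetime(["2013-01-02", "2013-01-04"]), "stocknum": ["A", "B"]}
)
print(forward_ratios(prices, picks, days=[1, 2]))
```

The benchmark table could presumably be produced the same way, by passing the index series as a one-column frame and pointing every pick at that column. Whether this fits depends on being able to hold all price histories in memory at once, which the question does not say.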