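I am trying to run the mibian.BS function on a DataFrame named df1 by iterating over it, and to assign the results to a new column named 'Implied_Vola'. How can I speed up the whole procedure? Processing the original DataFrame with 3 million rows would take my machine about 9000 minutes, which is far too long. Unfortunately, mibian.BS does not accept vector input, so it has to be applied to each row of the DataFrame in turn.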
import mibian
import numpy
import time

start = time.time()
mask = (df1['ask'] > 0) & (df1['bid'] > 0) & (df1['call put'] == 'C') & (df1['Restlaufzeit'] > 0)
for index, row in df1.loc[mask].iterrows():
    # The index is not unique, so the row is matched on its values as well.
    # mask2 is built before the try block so it exists if mibian.BS raises.
    mask2 = ((df1.index == index) & (df1['unadjusted stock price'] == row['unadjusted stock price']) & (df1['strike'] == row['strike']) & (df1['Zins'] == row['Zins']) & (df1['Restlaufzeit'] == row['Restlaufzeit']) & (df1['mean'] == row['mean']))
    try:
        c = mibian.BS([row['unadjusted stock price'], row['strike'], row['Zins'], row['Restlaufzeit']], callPrice=row['mean'])
        df1.loc[mask2, 'Implied_Vola'] = c.impliedVolatility
    except ZeroDivisionError:
        df1.loc[mask2, 'Implied_Vola'] = numpy.nan
end = time.time()
elapsed = (end - start) / 60
print(elapsed, 'minutes')
df1.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2 entries, 2002-05-16 00:00:00 to 2002-05-16 00:00:00
Data columns (total 13 columns):
adjusted stock close price 2 non-null float64
expiration 2 non-null datetime64[ns]
strike 2 non-null int64
call put 2 non-null object
ask 2 non-null float64
bid 2 non-null float64
volume 2 non-null int64
open interest 2 non-null int64
unadjusted stock price 2 non-null float64
Restlaufzeit 2 non-null int32
Zins 2 non-null float64
mean 2 non-null float64
Implied_Vola 2 non-null float64
dtypes: datetime64[ns](1), float64(7), int32(1), int64(3), object(1)
memory usage: 216.0+ bytes
I rewrote the loop without DataFrame.iterrows():
import mibian
import numpy
import time

df2 = df1.copy()
start = time.time()
mask = (df2['ask'] > 0) & (df2['bid'] > 0) & (df2['call put'] == 'C') & (df2['Restlaufzeit'] > 0)
vola = []
for row in df2.loc[mask].values:
    try:
        # Positions follow the column order shown by df1.info():
        # 8 = unadjusted stock price, 2 = strike, 10 = Zins, 9 = Restlaufzeit, 11 = mean
        c = mibian.BS([row[8], row[2], row[10], row[9]], callPrice=row[11])
        vola.append(c.impliedVolatility)
    except ZeroDivisionError:
        vola.append(numpy.nan)
df2.loc[mask, 'vola'] = vola
end = time.time()
elapsed = (end - start) / 60  # do not reuse the name "time", it would shadow the module
print(elapsed, 'minutes')
However, this gave no speedup. Should this be done in a different way?
Answer 0 (score: 1)
Looping over an ndarray is much faster than using df.iterrows().
Instead of
for index, row in df1.loc[mask].iterrows():
    # DO STUFF with row Series
try using
for index, row in enumerate(df1.loc[mask].values):
    # DO STUFF with row ndarray (index is the position within the masked rows)
You have to fall back on integer positions instead of column names, but it is much faster.
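To keep the integer indexing readable, the positions can be looked up by column name once, before the loop. The following is only a minimal sketch: the small frame and its values are made up, and intrinsic_value is a hypothetical stand-in for the expensive per-row call (mibian.BS in the question), so it says nothing about how long that call itself takes.

```python
import pandas as pd

# Tiny stand-in frame; column names follow the question, the values are made up.
df = pd.DataFrame({
    "strike": [100, 105, 110],
    "call put": ["C", "P", "C"],
    "unadjusted stock price": [102.0, 102.0, 102.0],
    "mean": [4.0, 5.5, 1.0],
})


def intrinsic_value(spot, strike):
    """Hypothetical stand-in for the expensive per-row call (e.g. mibian.BS)."""
    return max(spot - strike, 0.0)


mask = df["call put"] == "C"

# Resolve column names to integer positions once, outside the loop.
i_spot = df.columns.get_loc("unadjusted stock price")
i_strike = df.columns.get_loc("strike")

results = []
for row in df.loc[mask].values:  # each row is a NumPy array, indexed by position
    results.append(intrinsic_value(row[i_spot], row[i_strike]))

# The list is in the same order as the masked rows, so one assignment is enough.
df.loc[mask, "intrinsic"] = results
print(df)
```

Rows excluded by the mask end up as NaN in the new column. Note that .values on a frame with mixed dtypes returns an object array, so each element is a plain Python object rather than a typed NumPy scalar.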