I want to speed up creating a new column in a pandas DataFrame, where myFunc() is executed for every row:
df = pd.DataFrame(data, columns=["EMA4","EMA4prior","EMA10","MACD"])
def myFunc(self, row):
    if (row.EMA4 > row.EMA10) and (row.EMA4prior < row.EMA10) and (row.MACD > 0):
        return 0
    if (row.EMA4 < row.EMA10) and (row.EMA4prior > row.EMA10) and (row.MACD < 0):
        return 1
    return -1
self.df["position"] = self.df.apply(self.myFunc, axis=1)  # apply this to each row
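The code works, but it is very slow. I tried the following approaches to improve it, but something in the syntax seems to be wrong:
1. Using numpy.where directly: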
a=self.df["EMA4"].values
b=self.df["EMA4prior"].values
c=self.df["EMA10"].values
d=self.df["MACD"].values
self.df["position"] = np.where(((a > c)&(b < c)&(e > 0)),0,
(np.where((a < c)&(b > c)&(d < 0)), 1, -1))
2. Using np.logical_and.reduce, since np.logical_and seems to be a binary operator (it takes two operands) and I have three conditions to AND together:
self.df["position"] = np.where(np.logical_and.reduce([(a > c),(b < c),(e > 0)]),0,
(np.where(np.logical_and.reduce[(a < c),(b > c),(e < 0)]), 1, -1))
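I couldn't get either of them to work; they fail to run, and I'm not sure what the problem is.
So, is there a way to use numpy, or some other approach, to replace the original myFunc() and improve performance?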
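For reference, both attempts fail for mechanical reasons rather than because the approach is wrong: the name e is never defined (the MACD array is d), the inner np.where closes its parenthesis right after the condition so 1 and -1 are never passed to it, and np.logical_and.reduce[...] indexes the method instead of calling it. A minimal corrected sketch, assuming a few hypothetical sample rows since the question does not show its data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the question does not show its `data`.
df = pd.DataFrame(
    {
        "EMA4":      [2.0, 1.0, 1.5, 3.0],
        "EMA4prior": [0.5, 3.0, 1.5, 1.0],
        "EMA10":     [1.0, 2.0, 1.5, 2.0],
        "MACD":      [0.3, -0.2, 0.0, -1.0],
    }
)

a = df["EMA4"].to_numpy()
b = df["EMA4prior"].to_numpy()
c = df["EMA10"].to_numpy()
d = df["MACD"].to_numpy()  # the question's code used an undefined name `e` here

# Attempt 1, fixed: the inner np.where receives all three of its arguments
# inside its own parentheses: np.where(cond, 1, -1).
pos1 = np.where((a > c) & (b < c) & (d > 0), 0,
                np.where((a < c) & (b > c) & (d < 0), 1, -1))

# Attempt 2, fixed: np.logical_and.reduce is a method, so it is called with
# parentheses around the list of boolean arrays, not indexed with [].
pos2 = np.where(np.logical_and.reduce([a > c, b < c, d > 0]), 0,
                np.where(np.logical_and.reduce([a < c, b > c, d < 0]), 1, -1))

df["position"] = pos1
print(df["position"].tolist())  # [0, 1, -1, -1]
```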
Answer 0 (score: 2)
IIUC, you should be able to use np.select:
import numpy as np

cond = [(df.EMA4 > df.EMA10) & (df.EMA4prior < df.EMA10) & (df.MACD > 0),
        (df.EMA4 < df.EMA10) & (df.EMA4prior > df.EMA10) & (df.MACD < 0)]
result = [0, 1]
df['position'] = np.select(cond, result, -1)  # -1 where neither condition holds
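A self-contained sketch that checks the np.select version against the original row-wise rule. The sample rows and the standalone my_func (a copy of myFunc without self) are hypothetical, for illustration only. np.select evaluates the condition list in order, takes the choice for the first True condition, and uses the default (-1) for rows where none holds, which mirrors the if / if / return -1 structure of myFunc:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the question does not show its `data`.
df = pd.DataFrame(
    {
        "EMA4":      [2.0, 1.0, 1.5, 3.0],
        "EMA4prior": [0.5, 3.0, 1.5, 1.0],
        "EMA10":     [1.0, 2.0, 1.5, 2.0],
        "MACD":      [0.3, -0.2, 0.0, -1.0],
    }
)

def my_func(row):
    """Hypothetical standalone copy of the question's row-wise rule."""
    if row.EMA4 > row.EMA10 and row.EMA4prior < row.EMA10 and row.MACD > 0:
        return 0
    if row.EMA4 < row.EMA10 and row.EMA4prior > row.EMA10 and row.MACD < 0:
        return 1
    return -1

# Slow baseline: one Python function call per row.
slow = df.apply(my_func, axis=1)

# Vectorised version: whole-column comparisons, then np.select picks
# 0 or 1 for the first matching condition and -1 otherwise.
cond = [
    (df.EMA4 > df.EMA10) & (df.EMA4prior < df.EMA10) & (df.MACD > 0),
    (df.EMA4 < df.EMA10) & (df.EMA4prior > df.EMA10) & (df.MACD < 0),
]
fast = np.select(cond, [0, 1], default=-1)

df["position"] = fast
print((slow.to_numpy() == fast).all())  # True
```

Because the comparisons and np.select operate on whole columns, the per-row Python call made by DataFrame.apply(axis=1) is avoided; how much faster this is depends on the data size and is worth measuring (for example with timeit) on the real DataFrame.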