How to speed up pandas df.apply (possibly with np.where or np.logical_and.reduce?)

Time: 2017-11-01 16:13:44

Tags: python pandas numpy dataframe

I want to speed up the creation of a new column in a pandas DataFrame, where the value for each row is computed by myFunc():

df = pd.DataFrame(data, columns=["EMA4","EMA4prior","EMA10","MACD"])

    def myFunc(self, row):
        if (row.EMA4 > row.EMA10) and (row.EMA4prior < row.EMA10) and (row.MACD > 0):
            return 0
        if (row.EMA4 < row.EMA10) and (row.EMA4prior > row.EMA10) and (row.MACD < 0):
            return 1
        return -1

self.df["position"] = self.df.apply(self.myFunc, axis=1) #apply this per each row

The code works, but it is very slow. I tried the following approaches to improve it, but something in the syntax seems to be wrong:

1. Using numpy.where directly:

a=self.df["EMA4"].values
b=self.df["EMA4prior"].values
c=self.df["EMA10"].values
d=self.df["MACD"].values
self.df["position"] = np.where(((a > c)&(b < c)&(e > 0)),0, 
                       (np.where((a < c)&(b > c)&(d < 0)), 1, -1)) 

2. Using np.logical_and.reduce, since np.logical_and appears to be a binary operator (and I have three conditions to AND together):

self.df["position"] = np.where(np.logical_and.reduce([(a > c),(b < c),(e > 0)]),0,
                        (np.where(np.logical_and.reduce[(a < c),(b > c),(e < 0)]), 1, -1)) 

I could not get either version to work; the code does not compile, and I am not sure what the problem is.

So, is there a way to improve on the original myFunc(), using numpy or some other method, to get better performance?
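For reference, the two attempts in the question look like they fail on syntax rather than on the approach itself. Both refer to a variable e that is never defined (d, the MACD array, was presumably meant). In attempt 1 the inner np.where is closed right after its condition, so its 1 and -1 end up outside the call. In attempt 2 the second np.logical_and.reduce is subscripted with [...] instead of being called with (...). A minimal corrected sketch follows, using a plain df instead of self.df and a small made-up sample frame (both are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only so that each branch is exercised.
df = pd.DataFrame({
    "EMA4":      [2.0, 1.0, 2.0, 1.0],
    "EMA4prior": [1.0, 2.0, 1.0, 1.0],
    "EMA10":     [1.5, 1.5, 1.5, 1.5],
    "MACD":      [0.3, -0.3, -0.3, 0.3],
})

a = df["EMA4"].values
b = df["EMA4prior"].values
c = df["EMA10"].values
d = df["MACD"].values

# Attempt 1, repaired: the inner np.where takes all three arguments,
# and the undefined name e is replaced by d.
pos_where = np.where((a > c) & (b < c) & (d > 0), 0,
                     np.where((a < c) & (b > c) & (d < 0), 1, -1))

# Attempt 2, repaired: reduce is called with (...) on a list of boolean arrays.
pos_reduce = np.where(np.logical_and.reduce([a > c, b < c, d > 0]), 0,
                      np.where(np.logical_and.reduce([a < c, b > c, d < 0]), 1, -1))

df["position"] = pos_where
```

Both expressions compute the same array; the nested np.where mirrors the if / if / return -1 order of myFunc().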

1 Answer:

Answer 0: (score: 2)

IIUC, you should be able to use np.select:

cond = [(df.EMA4 > df.EMA10) & (df.EMA4prior < df.EMA10) & (df.MACD > 0), 
        (df.EMA4 < df.EMA10) & (df.EMA4prior > df.EMA10) & (df.MACD < 0)]
result = [0,1]

df['position'] = np.select(cond, result, -1)
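To check that the np.select version agrees with the row-wise apply, a self-contained sketch like the one below can be used. The random data, the seed, and the standalone function name my_func are assumptions made for illustration; they are not part of the original code.

```python
import numpy as np
import pandas as pd

# Hypothetical random data standing in for the real indicator columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1000, 4)),
                  columns=["EMA4", "EMA4prior", "EMA10", "MACD"])

def my_func(row):
    # Same logic as myFunc() in the question, written as a plain function.
    if (row.EMA4 > row.EMA10) and (row.EMA4prior < row.EMA10) and (row.MACD > 0):
        return 0
    if (row.EMA4 < row.EMA10) and (row.EMA4prior > row.EMA10) and (row.MACD < 0):
        return 1
    return -1

# Slow reference result: one Python-level call per row.
expected = df.apply(my_func, axis=1)

# Vectorized version: each condition is evaluated on whole columns at once.
cond = [(df.EMA4 > df.EMA10) & (df.EMA4prior < df.EMA10) & (df.MACD > 0),
        (df.EMA4 < df.EMA10) & (df.EMA4prior > df.EMA10) & (df.MACD < 0)]
result = [0, 1]

# Rows matching neither condition receive the default value -1.
df["position"] = np.select(cond, result, default=-1)

print((df["position"] == expected).all())
```

np.select takes the first condition that is true for each element, which matches the order of the if statements in the original function.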