How to speed up Python

Time: 2016-06-27 09:03:13

Tags: python performance optimization normalization acceleration

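I have a complex function that takes forever to run on a large pandas df, and I can't find a way to speed it up. Do you have any tips? I have used numba, but that is clearly not enough. I have also tried to use index-based references to get the most out of pandas, but I'm sure there are other approaches I haven't tried.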

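What the function does is basically take a df of events that occur at random time intervals and normalize it to one-second intervals. There are three different event types (TRADE, BEST_BID, BEST_ASK), so for every second I should get three rows (one per event type). If no event of a given type occurred during that second, the previous value is reused.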

Thanks for your help!

import numba
import pandas

@numba.jit
def convertTicksToSeconds(dataFrame_df):    
    idxs = dataFrame_df['data_all'][dataFrame_df['data_all']['time_change']].index.tolist()
    previous_idx = 0
    progress_i = 0
    # Create the df that will hold the normalized data
    normalized_Data = pandas.DataFrame( columns=['timestamp', 'B.A.T', 'price', 'volume', 'asset'])
    BAT_type =['TRADE','BEST_BID','BEST_ASK']
    tmp_time = dataFrame_df['data_all']['timestamp'][0]
    data_TRADE = {'timestamp': tmp_time, 'B.A.T': 'TRADE', 'price': 0, 'volume': 0, 'asset': dataFrame_df['data_all']['asset'][0]}
    data_BID = {'timestamp': tmp_time, 'B.A.T': 'BEST_BID', 'price': 0, 'volume': 0, 'asset': dataFrame_df['data_all']['asset'][0]}
    data_ASK = {'timestamp': tmp_time, 'B.A.T': 'BEST_ASK', 'price': 0, 'volume': 0, 'asset': dataFrame_df['data_all']['asset'][0]}
    for BAT in BAT_type:
        for idx in idxs:
            if dataFrame_df['data_all'][previous_idx:idx-1][dataFrame_df['data_all']['B.A.T'] == BAT].empty == False:
                timestamp = dataFrame_df['data_all']['timestamp'][idx]
                price = dataFrame_df['data_all']['price'][previous_idx:idx-1][dataFrame_df['data_all']['B.A.T'] == BAT]
                volume = dataFrame_df['data_all']['volume'][previous_idx:idx-1][dataFrame_df['data_all']['B.A.T'] == BAT]
                total_volume = volume.sum()
                weighted_price = price * volume
                weighted_price = weighted_price.sum() / total_volume
                volume = volume.mean()
                asset = dataFrame_df['data_all']['asset'][idx]

                if BAT == 'TRADE':
                    data_TRADE = {'timestamp': timestamp, 'B.A.T': BAT, 'price': weighted_price, 'volume': volume, 'asset': asset}

                elif BAT == 'BEST_BID':
                    data_BID = {'timestamp': timestamp, 'B.A.T': BAT, 'price': weighted_price, 'volume': volume, 'asset': asset}

                elif BAT == 'BEST_ASK':
                    data_ASK = {'timestamp': timestamp, 'B.A.T': BAT, 'price': weighted_price, 'volume': volume, 'asset': asset}

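                print(data_TRADE)
                print(data_BID)
                print(data_ASK)
                # DataFrame.append returns a new frame, so the result must be reassigned
                normalized_Data = normalized_Data.append(data_TRADE, ignore_index=True)
                normalized_Data = normalized_Data.append(data_BID, ignore_index=True)
                normalized_Data = normalized_Data.append(data_ASK, ignore_index=True)
                previous_idx = idx
                progress_i += 1
                # float() avoids integer division under Python 2
                tmp = (float(progress_i) / len(idxs))*100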
                print ('Progress : ' + str(tmp) + ' %')
    return normalized_Data
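
For reference, the normalization described above (one row per second and event type, with the previous value reused when nothing happened) can usually be written without Python-level loops. The following is only a minimal sketch under several assumptions that are not in the code above: the ticks live in a flat DataFrame rather than in dataFrame_df['data_all'], the timestamp column is datetime64, there is a single asset, and the name normalize_to_seconds is hypothetical. It keeps the same aggregates as the loop (volume-weighted price, mean volume), but it leaves NaN for seconds before the first event of a type instead of the initial zeros.

```python
import pandas as pd

EVENT_TYPES = ["TRADE", "BEST_BID", "BEST_ASK"]


def normalize_to_seconds(ticks):
    """Aggregate ticks to one row per (second, event type), forward-filling gaps.

    Assumption: `ticks` has a datetime64 'timestamp' column plus 'B.A.T',
    'price', 'volume' and 'asset' columns, and holds a single asset.
    """
    df = ticks.copy()
    df["second"] = df["timestamp"].dt.floor("s")
    df["notional"] = df["price"] * df["volume"]  # numerator of the weighted price

    # One pass over the data: aggregate per (second, event type).
    agg = df.groupby(["second", "B.A.T"]).agg(
        notional=("notional", "sum"),
        total_volume=("volume", "sum"),
        volume=("volume", "mean"),
        asset=("asset", "last"),
    )
    agg["price"] = agg["notional"] / agg["total_volume"]

    # Build the full grid of seconds x event types, including empty seconds.
    seconds = pd.date_range(df["second"].min(), df["second"].max(), freq="s")
    grid = pd.MultiIndex.from_product(
        [seconds, EVENT_TYPES], names=["second", "B.A.T"]
    )
    out = agg[["price", "volume", "asset"]].reindex(grid)

    # Reuse the previous value within each event type when a second is empty.
    out = out.groupby(level="B.A.T").ffill()
    return out.reset_index().rename(columns={"second": "timestamp"})


# Tiny made-up example: two trades and one bid in second 0, one ask in second 2.
ticks = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2016-06-27 09:00:00.1", "2016-06-27 09:00:00.6",
        "2016-06-27 09:00:00.7", "2016-06-27 09:00:02.2",
    ]),
    "B.A.T": ["TRADE", "TRADE", "BEST_BID", "BEST_ASK"],
    "price": [10.0, 20.0, 9.0, 11.0],
    "volume": [1.0, 3.0, 5.0, 2.0],
    "asset": ["XYZ"] * 4,
})
result = normalize_to_seconds(ticks)
print(result)
```

The main design choice here is to let groupby do the per-second aggregation in one pass and to use reindex plus a per-type forward fill for the empty seconds, instead of slicing the frame and appending rows inside nested loops. Whether this is fast enough for a given dataset would need to be measured.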

0 answers:

No answers