在数据帧上执行嵌套3循环迭代的最有效方法

时间:2018-08-30 15:19:53

标签: python pandas loops dataframe nested

嗨,我有一个复杂的循环问题,因为我必须循环遍历交易数据的数据帧(每行一次交易)。贸易数据既可以与外部交易对手进行交易,也可以与内部“待处理”交易对手进行交易,我需要:

(a)找到三个相关交易并

(b)更改三笔交易中第一笔的执行时间以匹配第三笔交易的执行时间。

这是三笔交易的原因,是每笔交易都有一个执行时间和一个进入时间(进入时间是指该交易向下流入其他系统的时间,因此等于或晚于执行时间)。这听起来可能过于复杂,但这就是系统工作的方式。下面的代码示例中有更多详细信息。这有效但是很慢(文件中有数十万笔交易)。我的python很基本,所以我假设必须有一种更有效的方法来执行此操作,可能是.apply或其他方法?有人有建议吗?

在下面的评论之后,我已经清理了(抱歉)并创建了一个最小的工作示例(本来应该做到的)

import pandas as pd
import numpy as np
# TradeId - unique trade id
# ExecutionTime - time trade was executed
# EntryTime - time trade entered processing system (equal to or after Execution time)
# Counterparty - counterparty name including external counterparties and internal pending
# TraderName - eg Bob Smith 
# CcyPair - eg GBPUSD
# BaseTrade - notional of the trade in base currency eg 100 GBP
allTradesArrays = {
'TradeId':[101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120],
'ExecutionDateTime':['06/26/2018 12:49:23','06/26/2018 13:20:12','06/26/2018 13:20:12','06/26/2018 13:20:32','06/26/2018 13:22:19','06/26/2018 13:25:13','06/26/2018 13:26:18','06/26/2018 15:50:42','06/26/2018 15:51:12','06/26/2018 15:51:12','06/26/2018 15:54:10','06/26/2018 16:17:09','06/26/2018 18:54:06','06/26/2018 18:54:12','06/26/2018 18:54:12','06/26/2018 18:54:15','06/26/2018 19:42:05','06/26/2018 19:58:25','06/26/2018 20:13:19','06/26/2018 20:13:19'],
'EntryDateTime':['06/26/2018 12:49:23','06/26/2018 13:25:13','06/26/2018 13:25:13','06/26/2018 13:20:33','06/26/2018 13:22:19','06/26/2018 13:25:13','06/26/2018 13:26:18','06/26/2018 15:50:42','06/26/2018 15:52:01','06/26/2018 15:54:10','06/26/2018 15:54:10','06/26/2018 16:17:11','06/26/2018 18:54:07','06/26/2018 18:54:30','06/26/2018 19:58:25','06/26/2018 18:54:16','06/26/2018 19:42:05','06/26/2018 19:58:25','06/26/2018 20:13:19','06/26/2018 20:13:19'],
'Counterparty':['cpty1','PENDING','cpty2','cpty12','cpty3','PENDING','cpty6','cpty2','PENDING','cpty8','PENDING','cpty9','cpty1','PENDING','cpty8','cpty3','cpty5','PENDING','cpty6','cpty2'],
'CcyPair':['GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD'],
'BaseTrade':[5,-6.94037287,-6.94037287,-11.63,-0.77222412,6.94037287,21.88,-5.2,10,10,-10,0.3931098,2.5,-670,-670,2.029,20,670,7.37097,11.056455]
}
allTrades = pd.DataFrame(allTradesArrays)
print()
print('allTrades.info()')
print(allTrades.info())
print()
print('allTrades.head()')
print(allTrades.head())
# Create a dataframe that is a subset of the trades dataframe that 
# contains only trades against the pending counterparty - note the
# pending trades are still in the allTrades df as I need to keep
# allTrades complete for further processing once the time stamp changes 
# are made by this looping
pendingTrades = allTrades[allTrades['Counterparty'] == 'PENDING']
print()
print('pendingTrades.info()')
print(pendingTrades.info())
print()
print('pendingTrades.head()')
print(pendingTrades.head())
# iterate over each trade in pendingTrades as explained below
for pendingTradeIndex1, pendingTrade1 in pendingTrades.iterrows():
    for allTradeIndex, allTrade in allTrades.iterrows():
        if (
            # if we find a trade in allTrades that is: 
            # not the same pendingTrade1 trade that is also in the allTrades 
            pendingTrade1['TradeId'] != allTrade['TradeId']
            # has the same CcyPair
            and pendingTrade1['CcyPair'] == allTrade['CcyPair']
            # has the same notional and sign
            and pendingTrade1['BaseTrade'] == allTrade['BaseTrade'] 
            # has matching execution datetimes
            and pendingTrade1['ExecutionDateTime'] == allTrade['ExecutionDateTime']
            # then we have found the first two of three trades
        ):
            # To find the third trade apply similar logic
            for pendingTradeIndex2, pendingTrade2 in pendingTrades.iterrows():
                if (
                    # If we find the second trade in allTrades that is:
                    # not the same pendingTrade2 trade that is also in the allTrades 
                    pendingTrade2['TradeId'] != allTrade['TradeId']
                    # is the same CcyPair as the devon trade
                    and pendingTrade2['CcyPair'] == allTrade['CcyPair']
                    # has the same notional but opposite sign as the devon trade
                    and pendingTrade2['BaseTrade'] == -1*allTrade['BaseTrade'] 
                    # has matching entry datetimes as the devon trade
                    and pendingTrade2['EntryDateTime'] == allTrade['EntryDateTime']
                    # does not have matching execution datetimes as the devon trade
                    and pendingTrade2['ExecutionDateTime'] != allTrade['ExecutionDateTime']
                    # then we should have found the third of three trades
                    # note this third trade is always a pending counterparty trade
                ):
                    print('pendingTrade1 ' + str(pendingTrade1['TradeId']) + 
                          ' ExTime=' + 
                          str(allTrades.loc[allTrades['TradeId']==pendingTrade1['TradeId'],'ExecutionDateTime'].values[0]) +
                          ' EnTime=' + 
                          str(allTrades.loc[allTrades['TradeId']==pendingTrade1['TradeId'],'EntryDateTime'].values[0]) +
                          ' BaseTr=' + 
                          str(allTrades.loc[allTrades['TradeId']==pendingTrade1['TradeId'],'BaseTrade'].values[0]))
                    print('allTrade      ' + str(allTrade['TradeId']) + 
                          ' ExTime=' + 
                          str(allTrades.loc[allTrades['TradeId']==allTrade['TradeId'],'ExecutionDateTime'].values[0]) +
                          ' EnTime=' + 
                          str(allTrades.loc[allTrades['TradeId']==allTrade['TradeId'],'EntryDateTime'].values[0]) +
                          ' BaseTr=' + 
                          str(allTrades.loc[allTrades['TradeId']==allTrade['TradeId'],'BaseTrade'].values[0]))
                    print('pendingTrade2 ' + str(pendingTrade2['TradeId']) + 
                          ' ExTime=' + 
                          str(allTrades.loc[allTrades['TradeId']==pendingTrade2['TradeId'],'ExecutionDateTime'].values[0]) +
                          ' EnTime=' + 
                          str(allTrades.loc[allTrades['TradeId']==pendingTrade2['TradeId'],'EntryDateTime'].values[0]) +
                          ' BaseTr=' + 
                          str(allTrades.loc[allTrades['TradeId']==pendingTrade2['TradeId'],'BaseTrade'].values[0]))
                    print('Changing ' + str(pendingTrade1['TradeId']) + ' ExTime=' + 
                          str(allTrades.loc[allTrades['TradeId']==pendingTrade1['TradeId'],'ExecutionDateTime'].values[0]) +
                          ' to ' + str(pendingTrade2['TradeId']) + ' ExTime=' + 
                          str(allTrades.loc[allTrades['TradeId']==pendingTrade2['TradeId'],'ExecutionDateTime'].values[0]))
                    allTrades.loc[allTrades['TradeId'] == pendingTrade1['TradeId'],'ExecutionDateTime'] = \
                    allTrades.loc[allTrades['TradeId'] == pendingTrade2['TradeId'],'ExecutionDateTime'].values[0]
                    print()

1 个答案:

答案 0 :(得分:0)

尝试itertuples()。它应该比iterrows()

快得多

Stack Overflow: Does iterrows have performance issues