How to optimize a large loop with Pandas or NumPy

Time: 2019-04-18 08:08:01

Tags: python pandas numpy optimization

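I am working on code that takes a pandas.DataFrame of trade transactions as input, pairs opening and closing trades, and computes metrics. The code expands each transaction into a large list whose length is the traded quantity multiplied by an instrument-specific multiplier. This happens for every trade (possibly thousands) of every traded instrument (possibly thousands). The code then loops over that list to compute the metrics.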

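I need to optimize the loops in this function to improve performance. I have tried vectorizing with pandas and numpy, but I could not get the logic right or improve performance. I am getting caught up on the conditional statements inside the loop and on the mutation of the price_stack deque.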

from collections import deque
from math import copysign

import numpy as np
import pandas as pd


def round_trips(transactions):
    roundtrips = []

    # group all transactions by symbol and iterate through each group
    for sym, trans_sym in transactions.groupby("symbol"):

        trans_sym = trans_sym.sort_index()
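        price_stack = deque()
        dt_stack = deque()
        # per-unit commissions are appended to this below, so it must exist first
        commission_stack = deque()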

        trans_sym["signed_price"] = trans_sym.trade_price * np.sign(trans_sym.quantity)
        trans_sym["abs_amount"] = trans_sym.quantity.abs().astype(int) * trans_sym.multiplier.astype(int)

        # for each transaction, extract date as dt and transaction details as t
        for dt, t in trans_sym.iterrows():

            # create a list of the signed price where len(...) == the trade amount(qty * multiplier)
            indiv_prices = [t.signed_price] * t.abs_amount

            # create a list of commission per unit where len(...) == the trade amount (qty * multiplier)
            indiv_commissions = [t.commission / t.abs_amount] * t.abs_amount

            # this is an opening trade
            if (len(price_stack) == 0) or (
                copysign(1, price_stack[-1]) == copysign(1, t.quantity)
            ):
                price_stack.extend(indiv_prices)
                dt_stack.extend([dt] * len(indiv_prices))
            else:

                # close round-trip
                gross_pnl = 0
                cur_open_dts = []

                # this could loop tens of millions of times
                for price, commission in zip(indiv_prices, indiv_commissions):
                    if len(price_stack) != 0 and (
                        copysign(1, price_stack[-1]) != copysign(1, price) or
                        abs(price) == 0
                    ):
                        prev_price = price_stack.popleft()
                        prev_dt = dt_stack.popleft()
                        gross_pnl += -(price + prev_price)
                        cur_open_dts.append(prev_dt)
                    else:
                        price_stack.append(price)
                        commission_stack.append(commission)
                        dt_stack.append(dt)

                roundtrips.append({
                    "gross_pnl": gross_pnl,
                    "open_dt": cur_open_dts[0],
                    "close_dt": dt,
                    "long": price < 0,
                    "symbol": sym,
                })

    return pd.DataFrame(roundtrips)
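
One possible direction, offered as a sketch rather than a verified drop-in replacement: because every unit of a trade carries the same signed price, the deque can hold one entry per trade (a "lot" with a remaining unit count) instead of one entry per unit, and a closing trade can consume whole lots at a time. The name round_trips_lots and the lot layout are hypothetical. The sketch assumes the frame is indexed by trade datetime, as the original loop implies, and it leaves commissions out because the original result does not use them.

```python
from collections import deque
from math import copysign

import numpy as np
import pandas as pd


def round_trips_lots(transactions):
    """Hypothetical lot-based variant: one deque entry per trade, not per unit."""
    roundtrips = []

    for sym, trans_sym in transactions.groupby("symbol"):
        trans_sym = trans_sym.sort_index()
        signed_price = (trans_sym.trade_price * np.sign(trans_sym.quantity)).to_numpy()
        abs_amount = (
            trans_sym.quantity.abs().astype(int) * trans_sym.multiplier.astype(int)
        ).to_numpy()

        # each entry is [signed_price, remaining_units, open_dt]
        open_lots = deque()

        for dt, price, amount in zip(trans_sym.index, signed_price, abs_amount):
            # opening trade: nothing is open, or same direction as the open lots
            if not open_lots or copysign(1, open_lots[-1][0]) == copysign(1, price):
                open_lots.append([price, amount, dt])
                continue

            gross_pnl = 0.0
            first_open_dt = open_lots[0][2]
            remaining = amount

            # consume the oldest lots first (FIFO), a block of units per step
            while remaining > 0 and open_lots:
                lot = open_lots[0]
                matched = min(remaining, lot[1])
                gross_pnl += -(price + lot[0]) * matched
                lot[1] -= matched
                remaining -= matched
                if lot[1] == 0:
                    open_lots.popleft()

            # units left over reverse the position and open a new lot
            if remaining > 0:
                open_lots.append([price, remaining, dt])

            roundtrips.append({
                "gross_pnl": gross_pnl,
                "open_dt": first_open_dt,
                "close_dt": dt,
                "long": price < 0,
                "symbol": sym,
            })

    return pd.DataFrame(roundtrips)
```

With this layout the inner loop runs once per matched lot rather than once per unit, so its cost no longer scales with the multiplier. The P&L is computed as one product per lot instead of a long running sum, so results may differ from the per-unit loop by floating-point rounding.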

Here is a sample of the transactions input DataFrame:

        created_on                          updated_on                          id      user_id     instrument_id   amount      commission  trade_datetime              trade_key   proceeds    quantity    side    trade_price     margin_requirement  symbol  multiplier
146     2019-04-09 02:05:14.370164+00:00    2019-04-09 02:05:14.370191+00:00    148     100         70              44325.00    2.97        2018-04-17 09:51:35+00:00   1.1         44327.97    -1          sell    1.1820          -11081.2500         KCU18   37500.0
147     2019-04-09 02:05:14.390017+00:00    2019-04-09 02:05:14.390045+00:00    149     100         70              45731.25    2.97        2018-04-24 09:44:51+00:00   1.1         45734.22    -1          sell    1.2195          -11432.8125         KCU18   37500.0
148     2019-04-09 02:05:14.409739+00:00    2019-04-09 02:05:14.409767+00:00    150     100         70              -47793.75   2.97        2018-05-02 07:41:29+00:00   1.1         -47790.78   1           buy     1.2745          11948.4375          KCU18   37500.0
149     2019-04-09 02:05:14.432697+00:00    2019-04-09 02:05:14.432743+00:00    151     100         70              -47793.75   2.97        2018-05-02 07:41:29+00:00   1.1         -47790.78   1           buy     1.2745          11948.4375          KCU18   37500.0
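
To give a sense of scale, the snippet below rebuilds the four sample rows, keeping only the columns the function reads, and counts how many per-unit list elements the expansion described in the question would create. Using trade_datetime as the index is an assumption, based on the function sorting by index and reading dt from it. The variable names sample, units and total_units are illustrative.

```python
import pandas as pd

# The four sample rows, reduced to the columns round_trips reads.
# Assumption: trade_datetime serves as the index.
sample = pd.DataFrame(
    {
        "symbol": ["KCU18"] * 4,
        "quantity": [-1, -1, 1, 1],
        "trade_price": [1.1820, 1.2195, 1.2745, 1.2745],
        "commission": [2.97] * 4,
        "multiplier": [37500.0] * 4,
    },
    index=pd.to_datetime(
        [
            "2018-04-17 09:51:35+00:00",
            "2018-04-24 09:44:51+00:00",
            "2018-05-02 07:41:29+00:00",
            "2018-05-02 07:41:29+00:00",
        ],
        utc=True,
    ),
)
sample.index.name = "trade_datetime"

# Units per trade, computed the same way as abs_amount in round_trips.
units = sample.quantity.abs().astype(int) * sample.multiplier.astype(int)
total_units = int(units.sum())

print(units.tolist())  # [37500, 37500, 37500, 37500]
print(total_units)     # 150000
```

Four single-contract trades already expand to 150,000 list elements, which is why the per-unit inner loop dominates the runtime.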

0 answers:

No answers