Question

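I have a df with 250,000+ rows. Some of my fields depend on the value at t-1. This is trivial to do in Excel, but I'm not sure what the most efficient way is in pandas. Currently I set the values for t[0] and then use a for loop for the rest, but this is very slow. Is there a faster way?

Any help would be greatly appreciated!

Code below: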

import pandas as pd
import numpy as np
import math
import datetime
from scipy.optimize import minimize

df = pd.DataFrame({
    'Time': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'Price': [44, 100, 40, 110, 77, 109, 65, 93, 89, 73]})

# Create Empty Columns (float zeros, one value per row)
df = df.assign(Qty=0.00, Buy=0.00, Sell=0.00, Cost=0.00, Rev=0.00)

# Initial Values
buy_price = 50
sell_price = 100

# Set Values at Time 0
df.at[0, 'Qty'] = 0
df.at[0, 'Buy'] = np.where(df.at[0, 'Price'] < buy_price, min(30 - df.at[0, 'Qty'], 10), 0)
df.at[0, 'Sell'] = np.where(df.at[0, 'Price'] > sell_price, min(df.at[0, 'Qty'], 10), 0)
df.at[0, 'Cost'] = df.at[0, 'Buy'] * df.at[0, 'Price']
df.at[0, 'Rev'] = df.at[0, 'Sell'] * df.at[0, 'Price']

# Set Remaining Values
for t in range(1, len(df)):
    df.at[t, 'Qty'] = df.at[t-1, 'Qty'] + df.at[t-1, 'Buy'] - df.at[t-1, 'Sell']
    df.at[t, 'Buy'] = np.where(df.at[t, 'Price'] < buy_price, min(30 - df.at[t, 'Qty'], 10), 0)
    df.at[t, 'Sell'] = np.where(df.at[t, 'Price'] > sell_price, min(df.at[t, 'Qty'], 10), 0)
    df.at[t, 'Cost'] = df.at[t, 'Buy'] * df.at[t, 'Price']
    df.at[t, 'Rev'] = df.at[t, 'Sell'] * df.at[t, 'Price']

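I have looked at this previous post, which is similar, but I don't think cumsum() will work in this case because all three main fields (Qty, Buy, Sell) depend on one another.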
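
Written out as a recurrence, the loop above computes the following. The symbols are shorthand introduced here for illustration (Q for Qty, B for Buy, S for Sell, C for Cost, R for Rev, P for Price) and do not appear in the code:

```latex
\begin{aligned}
Q_0 &= 0, \qquad Q_t = Q_{t-1} + B_{t-1} - S_{t-1} \quad (t \ge 1) \\
B_t &= \begin{cases} \min(30 - Q_t,\ 10) & \text{if } P_t < 50 \\ 0 & \text{otherwise} \end{cases} \\
S_t &= \begin{cases} \min(Q_t,\ 10) & \text{if } P_t > 100 \\ 0 & \text{otherwise} \end{cases} \\
C_t &= B_t \, P_t, \qquad R_t = S_t \, P_t
\end{aligned}
```

Since B_t and S_t depend on Q_t, and Q_t in turn depends on B_{t-1} and S_{t-1}, the increments are not known in advance, which is why a cumsum() over precomputed columns does not seem to apply directly.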

Answer 1

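pandas DataFrames are not meant to be looped over row by row. I suggest you take some time to fully understand what they are for and what they can do. In the meantime, this should help with what you need (I wrote it on the fly, so let me know if there are any errors):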

# Qty at t is built from the t-1 values of Qty, Buy and Sell (NaN in the first row)
df['Qty'] = df['Qty'].shift() + df['Buy'].shift() - df['Sell'].shift()
# axis=1 passes each row to the lambda
df['Buy'] = df.apply(lambda x: 0 if x['Price'] >= buy_price else min(30 - x['Qty'], 10), axis=1)
df['Sell'] = df.apply(lambda x: 0 if x['Price'] <= sell_price else min(x['Qty'], 10), axis=1)
df['Cost'] = df['Buy'] * df['Price']
df['Rev'] = df['Sell'] * df['Price']
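
As a quick illustration of what shift() does in the lines above, here is a minimal sketch on a toy Series (the values are made up for illustration). Each row receives the value from the row before it, and the first row, having no predecessor, becomes NaN unless fill_value is given.

```python
import pandas as pd

# Toy values, made up for illustration only
s = pd.Series([1.0, 2.0, 3.0, 4.0])

# Each row gets the previous row's value; the first row has no predecessor
prev = s.shift()
print(prev.tolist())   # [nan, 1.0, 2.0, 3.0]

# fill_value supplies a starting value instead of NaN
prev0 = s.shift(fill_value=0.0)
print(prev0.tolist())  # [0.0, 1.0, 2.0, 3.0]
```

Note that shift() works on the column as it exists when the line runs, so values written by a later line do not feed back into it.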

Answer 2

Use cumsum and np.where instead of apply:

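# Buy: when Price < 50, the smaller of the remaining room (30 - Qty) and 10
df["Buy"] = np.where(df["Price"] < 50, np.where((30 - df["Qty"]) > 10, 10, 30 - df["Qty"]), 0)
# Sell: when Price > 100, the larger of Qty and 10 (the question's loop uses min(Qty, 10))
df["Sell"] = np.where(df["Price"] > 100, np.where(df["Qty"] > 10, df["Qty"], 10), 0)
# Qty: running total of the previous rows' Buy minus Sell (NaN in the first row)
df["Qty"] = (df["Buy"].shift() - df["Sell"].shift()).cumsum()
df['Cost'] = df['Buy'] * df['Price']
df['Rev'] = df['Sell'] * df['Price']

print(df)
# Output: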
   Time  Price   Qty   Buy  Sell   Cost     Rev
0     0     44   NaN  10.0   0.0  440.0     0.0
1     1    100  10.0   0.0   0.0    0.0     0.0
2     2     40  10.0  10.0   0.0  400.0     0.0
3     3    110  20.0   0.0  10.0    0.0  1100.0
4     4     77  10.0   0.0   0.0    0.0     0.0
5     5    109  10.0   0.0  10.0    0.0  1090.0
6     6     65   0.0   0.0   0.0    0.0     0.0
7     7     93   0.0   0.0   0.0    0.0     0.0
8     8     89   0.0   0.0   0.0    0.0     0.0
9     9     73   0.0   0.0   0.0    0.0     0.0
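
The nested np.where in the Buy line is an element-wise minimum, so np.minimum can express the same thing more compactly. A small sketch with made-up arrays (qty and price here are illustrative and are not the question's data):

```python
import numpy as np

# Made-up quantities and prices, for illustration only
qty = np.array([0.0, 10.0, 25.0, 30.0])
price = np.array([44.0, 100.0, 40.0, 45.0])

room = 30 - qty  # how much can still be bought before reaching 30

# Nested form, as in the Buy line above
nested = np.where(price < 50, np.where(room > 10, 10, room), 0)

# Same result using an element-wise minimum
compact = np.where(price < 50, np.minimum(room, 10), 0)

print(compact.tolist())  # [10.0, 0.0, 5.0, 0.0]
```

Both forms evaluate every element regardless of the condition, so the choice between them is mostly about readability.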

Answer 3

A cleaner approach is to write a predicate that can hold state and then call apply once. Define the predicate as follows:

class Predicate():
    def __init__(self):
        self.buy_price = 50
        self.sell_price = 100
        # State carried over from the previous row (quantities, despite the names)
        self.prev_qty = 0
        self.prev_buy_price = 0
        self.prev_sell_price = 0
    def __call__(self, x):
        x.Qty = self.prev_qty + self.prev_buy_price - self.prev_sell_price
        # Thresholds come from the instance
        x.Buy = np.where(x.Price < self.buy_price, min(30 - x.Qty, 10), 0)
        x.Sell = np.where(x.Price > self.sell_price, min(x.Qty, 10), 0)
        x.Cost = x.Buy * x.Price
        x.Rev = x.Sell * x.Price
        # Remember this row's values for the next call
        self.prev_buy_price = x.Buy
        self.prev_qty = x.Qty
        self.prev_sell_price = x.Sell
        return x

and apply the predicate as

p = Predicate()
df = df.apply(p, axis=1)  # apply returns a new DataFrame
print(df)

which gives the following result:

   Time  Price   Qty   Buy  Sell   Cost     Rev
0   0.0   44.0  10.0  10.0   0.0  440.0     0.0
1   1.0  100.0  20.0   0.0   0.0    0.0     0.0
2   2.0   40.0  20.0  10.0   0.0  400.0     0.0
3   3.0  110.0  30.0   0.0  10.0    0.0  1100.0
4   4.0   77.0  20.0   0.0   0.0    0.0     0.0
5   5.0  109.0  20.0   0.0  10.0    0.0  1090.0
6   6.0   65.0  10.0   0.0   0.0    0.0     0.0
7   7.0   93.0  10.0   0.0   0.0    0.0     0.0
8   8.0   89.0  10.0   0.0   0.0    0.0     0.0
9   9.0   73.0  10.0   0.0   0.0    0.0     0.0
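
Since the state here is just the previous row's Qty, Buy and Sell, the same recurrence can also be sketched as a plain loop over NumPy arrays, with the columns written back once at the end. This is an alternative to the callable above and not part of the original answer. The function name simulate and its cap and lot parameters are hypothetical, and the speed benefit of avoiding per-row DataFrame access is an expectation that is not benchmarked here.

```python
import numpy as np
import pandas as pd

def simulate(price, buy_price=50, sell_price=100, cap=30, lot=10):
    """Run the Qty/Buy/Sell recurrence from the question over a 1-D price array.

    `simulate`, `cap` and `lot` are hypothetical names used for this sketch.
    """
    n = len(price)
    qty = np.zeros(n)
    buy = np.zeros(n)
    sell = np.zeros(n)
    for t in range(n):
        if t > 0:
            # Carry the position forward from the previous step
            qty[t] = qty[t - 1] + buy[t - 1] - sell[t - 1]
        if price[t] < buy_price:
            buy[t] = min(cap - qty[t], lot)
        if price[t] > sell_price:
            sell[t] = min(qty[t], lot)
    return qty, buy, sell

df = pd.DataFrame({
    'Time': range(10),
    'Price': [44, 100, 40, 110, 77, 109, 65, 93, 89, 73]})

# Pull the prices out once, loop over plain arrays, then assign whole columns
price = df['Price'].to_numpy(dtype=float)
df['Qty'], df['Buy'], df['Sell'] = simulate(price)
df['Cost'] = df['Buy'] * df['Price']
df['Rev'] = df['Sell'] * df['Price']
print(df)
```

This sketch starts from Qty = 0 at Time 0, as the question's for loop does, and keeps all state in local arrays, so it does not depend on how many times or in what order apply calls the function.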

A faster way to do t-1 calculations in pandas
