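I have a df with 250,000+ rows. Some of my fields depend on the value at t-1. This is trivial in Excel, but I'm not sure what the most efficient way to do it in pandas is. Currently I set the values at t[0] and then use a for loop for the rest, but that is very slow. Is there a faster way?
Any help would be greatly appreciated!
Code below: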
import pandas as pd
import numpy as np
import math
import datetime
from scipy.optimize import minimize
df = pd.DataFrame({
'Time': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
'Price': [44, 100, 40, 110, 77, 109, 65, 93, 89, 73]})
# Create Empty Columns
df[['Qty', 'Buy', 'Sell', 'Cost', 'Rev']] = pd.DataFrame([[0.00, 0.00, 0.00, 0.00, 0.00]], index=df.index)
# Initial Values
buy_price = 50
sell_price = 100
# Set Values at Time 0
df.at[0, 'Qty'] = 0
df.at[0, 'Buy'] = np.where(df.at[0, 'Price'] < buy_price, min(30 - df.at[0, 'Qty'], 10), 0)
df.at[0, 'Sell'] = np.where(df.at[0, 'Price'] > sell_price, min(df.at[0, 'Qty'], 10), 0)
df.at[0, 'Cost'] = df.at[0, 'Buy'] * df.at[0, 'Price']
df.at[0, 'Rev'] = df.at[0, 'Sell'] * df.at[0, 'Price']
# Set Remaining Values
for t in range(1, len(df)):
    df.at[t, 'Qty'] = df.at[t-1, 'Qty'] + df.at[t-1, 'Buy'] - df.at[t-1, 'Sell']
    df.at[t, 'Buy'] = np.where(df.at[t, 'Price'] < buy_price, min(30 - df.at[t, 'Qty'], 10), 0)
    df.at[t, 'Sell'] = np.where(df.at[t, 'Price'] > sell_price, min(df.at[t, 'Qty'], 10), 0)
    df.at[t, 'Cost'] = df.at[t, 'Buy'] * df.at[t, 'Price']
    df.at[t, 'Rev'] = df.at[t, 'Sell'] * df.at[t, 'Price']
I looked at this previous post, which is similar, but I don't think cumsum() will work here because all three main fields (Qty, Buy, Sell) depend on one another.
Answer 0 (score: 2)
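pandas DataFrames are not meant to be looped over row by row. I suggest you take some time to get a thorough understanding of what they are for and what they can do. In the meantime, this should help with what you need (I wrote it on the fly, so let me know if there are any errors):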
df['Qty'] = df['Qty'].shift() + df['Buy'].shift() - df['Sell'].shift()
df['Buy'] = df.apply(lambda x: 0 if x['Price'] >= buy_price else min(30 - x['Qty'], 10), axis=1)
df['Sell'] = df.apply(lambda x: 0 if x['Price'] <= sell_price else min(x['Qty'], 10), axis=1)
df['Cost'] = df['Buy'] * df['Price']
df['Rev'] = df['Sell'] * df['Price']
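Because each row's Qty depends on the previous row's Buy and Sell, which in turn depend on that row's Qty, one option is to keep the loop but run it over plain NumPy arrays instead of calling df.at for every cell. The following is only a minimal sketch under that assumption, not a tuned implementation. The function name simulate and the parameters max_qty and lot are made up for illustration; they stand for the 30 and 10 hard-coded in the question.

```python
import numpy as np
import pandas as pd


def simulate(prices, buy_price=50, sell_price=100, max_qty=30, lot=10):
    """Run the question's t-1 recurrence over a plain NumPy array.

    max_qty and lot are hypothetical names for the question's 30 and 10.
    """
    n = len(prices)
    qty = np.zeros(n)
    buy = np.zeros(n)
    sell = np.zeros(n)
    held = 0.0  # quantity carried into the current row (0 at Time 0)
    for t in range(n):
        qty[t] = held
        if prices[t] < buy_price:
            buy[t] = min(max_qty - held, lot)
        if prices[t] > sell_price:
            sell[t] = min(held, lot)
        # Next row's Qty = this row's Qty + Buy - Sell
        held = held + buy[t] - sell[t]
    return qty, buy, sell


df = pd.DataFrame({
    'Time': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'Price': [44, 100, 40, 110, 77, 109, 65, 93, 89, 73]})

qty, buy, sell = simulate(df['Price'].to_numpy())
df['Qty'] = qty
df['Buy'] = buy
df['Sell'] = sell
# Cost and Rev do not depend on earlier rows, so they vectorize directly
df['Cost'] = df['Buy'] * df['Price']
df['Rev'] = df['Sell'] * df['Price']
```

The loop is still O(n) in Python, but it touches the DataFrame only once on the way in and once on the way out, which is typically much cheaper than a df.at lookup per cell.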
Answer 1 (score: 1)
Use cumsum and np.where instead of apply:
df["Buy"]= np.where(df["Price"]<50, np.where((30 - df["Qty"]) > 10, 10, 30 - df["Qty"]), 0)
df["Sell"] = np.where(df["Price"]>100, np.where(df["Qty"] > 10, df["Qty"], 10), 0)
df["Qty"] = (df["Buy"].shift()-df["Sell"].shift()).cumsum()
df['Cost'] = df['Buy'] * df['Price']
df['Rev'] = df['Sell'] * df['Price']
print (df)
#
Time Price Qty Buy Sell Cost Rev
0 0 44 NaN 10.0 0.0 440.0 0.0
1 1 100 10.0 0.0 0.0 0.0 0.0
2 2 40 10.0 10.0 0.0 400.0 0.0
3 3 110 20.0 0.0 10.0 0.0 1100.0
4 4 77 10.0 0.0 0.0 0.0 0.0
5 5 109 10.0 0.0 10.0 0.0 1090.0
6 6 65 0.0 0.0 0.0 0.0 0.0
7 7 93 0.0 0.0 0.0 0.0 0.0
8 8 89 0.0 0.0 0.0 0.0 0.0
9 9 73 0.0 0.0 0.0 0.0 0.0
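The NaN in the first Qty row comes from shift(), which has no previous row to read. A minimal sketch of one way around it, assuming the starting quantity is 0 as in the question: pass fill_value=0 to shift(). The Buy and Sell values below are copied from the output above rather than computed, so this only shows how Qty follows from Buy and Sell that are already known; on its own it does not resolve the dependence of Buy and Sell on Qty.

```python
import pandas as pd

# Buy and Sell columns as they appear in the printed output
buy = pd.Series([10.0, 0.0, 10.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
sell = pd.Series([0.0, 0.0, 0.0, 10.0, 0.0, 10.0, 0.0, 0.0, 0.0, 0.0])

# fill_value=0 makes the first shifted value 0 instead of NaN,
# so the running total starts from a quantity of 0
qty = (buy.shift(fill_value=0) - sell.shift(fill_value=0)).cumsum()
print(qty.tolist())
```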
Answer 2 (score: 1)
A neater approach is to write a predicate that can hold state and then call apply just once. Define the predicate as follows:
class Predicate():
    def __init__(self):
        self.buy_price = 50
        self.sell_price = 100
        # State carried over from the previous row
        self.prev_qty = 0
        self.prev_buy_price = 0
        self.prev_sell_price = 0

    def __call__(self, x):
        x.Qty = self.prev_qty + self.prev_buy_price - self.prev_sell_price
        x.Buy = np.where(x.Price < self.buy_price, min(30 - x.Qty, 10), 0)
        x.Sell = np.where(x.Price > self.sell_price, min(x.Qty, 10), 0)
        x.Cost = x.Buy * x.Price
        x.Rev = x.Sell * x.Price
        # Remember this row's values for the next call
        self.prev_buy_price = x.Buy
        self.prev_qty = x.Qty
        self.prev_sell_price = x.Sell
        return x
and apply the predicate as
p = Predicate()
df.apply(p, axis=1)
which gives the following result
Time Price Qty Buy Sell Cost Rev
0 0.0 44.0 10.0 10.0 0.0 440.0 0.0
1 1.0 100.0 20.0 0.0 0.0 0.0 0.0
2 2.0 40.0 20.0 10.0 0.0 400.0 0.0
3 3.0 110.0 30.0 0.0 10.0 0.0 1100.0
4 4.0 77.0 20.0 0.0 0.0 0.0 0.0
5 5.0 109.0 20.0 0.0 10.0 0.0 1090.0
6 6.0 65.0 10.0 0.0 0.0 0.0 0.0
7 7.0 93.0 10.0 0.0 0.0 0.0 0.0
8 8.0 89.0 10.0 0.0 0.0 0.0 0.0
9 9.0 73.0 10.0 0.0 0.0 0.0 0.0
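A stateful callable like this relies on being invoked exactly once per row, in order. To my knowledge, older pandas releases could call the function passed to apply more than once on the first row, which would advance the stored state an extra step and may be why Qty starts at 10 rather than 0 in the result above. A minimal sketch of a variant that sidesteps this by feeding the prices to the callable directly; the class name Stepper and the parameters max_qty and lot are made up for illustration.

```python
import pandas as pd


class Stepper:
    """Hypothetical stateful helper: carries the quantity from row to row."""

    def __init__(self, buy_price=50, sell_price=100, max_qty=30, lot=10):
        self.buy_price = buy_price
        self.sell_price = sell_price
        self.max_qty = max_qty  # the question's hard-coded 30
        self.lot = lot          # the question's hard-coded 10
        self.qty = 0            # quantity held before the first row

    def __call__(self, price):
        qty = self.qty
        buy = min(self.max_qty - qty, self.lot) if price < self.buy_price else 0
        sell = min(qty, self.lot) if price > self.sell_price else 0
        self.qty = qty + buy - sell  # state for the next call
        return qty, buy, sell


df = pd.DataFrame({
    'Time': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'Price': [44, 100, 40, 110, 77, 109, 65, 93, 89, 73]})

step = Stepper()
# The list comprehension calls step exactly once per price, in order
rows = [step(p) for p in df['Price'].tolist()]
df = df.join(pd.DataFrame(rows, columns=['Qty', 'Buy', 'Sell'], index=df.index))
df['Cost'] = df['Buy'] * df['Price']
df['Rev'] = df['Sell'] * df['Price']
```

Keeping the state in a small object makes the dependence on t-1 explicit, and building the three columns in one pass avoids creating a Series for every row.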