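Computing the running sum up to the current row in pandas while iterating over each row of time-series data

Time: 2021-05-20 16:17:36

Tags: python pandas cumsum

Suppose I have the following code to work out how many products I can buy with a given budget: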

import math
import pandas as pd

data = [['2021-01-02', 5.5], ['2021-02-02', 10.5], ['2021-03-02', 15.0], ['2021-04-02', 20.0]]
df = pd.DataFrame(data, columns=['Date', 'Current_Price'])

df.Date = pd.to_datetime(df.Date)
mn = df.Date.min()
mx = df.Date.max()
dr = pd.date_range(mn - pd.tseries.offsets.MonthBegin(), mx + pd.tseries.offsets.MonthEnd(), name="Date")
df = df.set_index("Date").reindex(dr).reset_index()
df['Current_Price'] = df.groupby(
    pd.Grouper(key='Date', freq='1M'))['Current_Price'].ffill().bfill()

# The dataframe below shows the current price of the product
# I'd like to buy at the specific date_range
print(df)

# Create 'Day' column to know which day of the month
df['Day'] = pd.to_datetime(df['Date']).dt.day

# Create 'Deposit' column to record how much money is
# deposited in, say, my bank account to buy the product.
# 'Withdrawal' column is to record how much I spent in
# buying product(s) at the current price on a specific date.
# 'Num_of_Products_Bought' shows how many items I bought
# on that specific date.
#
# Please note that the calculation below takes into account
# the leftover money, which remains after I've purchased
# products, for future purchases. For example, if you observe
# the resulting dataframe at the end of this code, you'll
# notice that I was able to purchase 7 products on March 1, 2021
# although my deposit on that day was $100. That is because
# on the days leading up to March 1, 2021, I had been saving
# the spare change from previous product purchases, and that
# extra money allows me to buy an extra product on March 1, 2021
# even though my budget of $100 alone would only allow me to
# purchase 6 products.
df[['Deposit', 'Withdrawal', 'Num_of_Products_Bought']] = 0.0

# Suppose I save $100 at the beginning of every month in my bank account
df.loc[df['Day'] == 1, 'Deposit'] = 100.0

for index, row in df.iterrows():
    if df.loc[index, 'Day'] == 1:
        # num_prod_bought = (sum_of_deposit_so_far - sum_of_withdrawal)/current_price
        df.loc[index, 'Num_of_Products_Bought'] = math.floor(
            (sum(df.iloc[0:(index + 1)]['Deposit'])
             - sum(df.iloc[0:(index + 1)]['Withdrawal']))
            / df.loc[index, 'Current_Price'])
        # Record how much I spent buying the product on specific date
        df.loc[index, 'Withdrawal'] = df.loc[index, 'Num_of_Products_Bought'] * df.loc[index, 'Current_Price']

print(df)
# The code above works as intended,
# but how can I make it more efficient/pandas-like?
# In particular, I don't like the idea of having to
# iterate over the rows and recalculate
# the running (sum of) deposit amount and
# the running (sum of) withdrawal amount on every iteration.

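As mentioned in the comments in the code, I would like to know how to do the same thing without iterating row by row and, inside each iteration, re-summing every row up to the current one. (I read around StackOverflow and saw the cumsum() function, but I don't think cumsum has a notion of the current row within an iteration.)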
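To make the cumsum() point concrete, here is a minimal sketch rather than a definitive answer. It assumes a hypothetical miniature frame, deposit_days, that holds only the four deposit days with the prices the code above assigns to them. The names deposit_days, Deposit_So_Far, balance, bought and withdrawn are illustrative and not part of the original code. The sketch shows that cumsum() is inclusive of the current row, and that the leftover money can be carried in a single scalar instead of re-summing both columns on every iteration. The purchase step is still a loop, because each withdrawal depends on the change left over from earlier purchases.

```python
import math
import pandas as pd

# Hypothetical miniature of the question's frame: only the deposit days
# (the 1st of each month), with the prices the question's code gives them.
deposit_days = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2021-01-01", "2021-02-01", "2021-03-01", "2021-04-01"]),
    "Current_Price": [5.5, 10.5, 15.0, 20.0],
    "Deposit": [100.0, 100.0, 100.0, 100.0],
})

# cumsum() is inclusive: entry i is the sum of rows 0..i, the same quantity
# as sum(df.iloc[0:(index + 1)]['Deposit']) in the original loop.
deposit_days["Deposit_So_Far"] = deposit_days["Deposit"].cumsum()

# Each withdrawal depends on the change left over from earlier purchases,
# so this part stays sequential, but it only carries one running balance.
balance = 0.0
bought, withdrawn = [], []
for deposit, price in zip(deposit_days["Deposit"],
                          deposit_days["Current_Price"]):
    balance += deposit                 # money available on this deposit day
    n = math.floor(balance / price)    # whole products affordable today
    balance -= n * price               # spare change carried forward
    bought.append(float(n))
    withdrawn.append(n * price)

deposit_days["Num_of_Products_Bought"] = bought
deposit_days["Withdrawal"] = withdrawn
print(deposit_days)
```

If this approach fits, the two lists could presumably be written back to the full daily frame with a boolean mask such as df.loc[df['Day'] == 1, 'Num_of_Products_Bought'] = bought, since the mask selects exactly the deposit days in the same order.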

Thank you very much for any suggestions/answers!

0 answers