I have the following DataFrame df1:
Bank Rate_1Y%
Bank of America 2
Barclays 0.75
Nationalbanken 0.05
Deutsche Bank 0
UBS -0.75
I have the following DataFrame df2:
0
2010-12-31 2010-12-31
2011-12-31 2011-12-31
2012-12-31 2012-12-31
2013-12-31 2013-12-31
2014-12-31 2014-12-31
2015-12-31 2015-12-31
2016-12-31 2016-12-31
2017-12-31 2017-12-31
2018-12-31 2018-12-31
2019-12-31 2019-12-31
I have the following input values:
Input_Balance = 10000
Start_Date = '2010-01-01'
End_Date = '2020-01-01'
freq = '1Y'
I created the new df2 with a time column:
DatetimeIndex(['2010-12-31', '2011-12-31', '2012-12-31', '2013-12-31',
'2014-12-31', '2015-12-31', '2016-12-31', '2017-12-31',
'2018-12-31', '2019-12-31'],
dtype='datetime64[ns]
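Can anyone help me find a simple function that computes how Input_Balance changes over the period from Start_Date to End_Date?
I would like a new column in df2 that holds the end-of-period balance for a chosen bank; in this case I am using Bank of America.
Expected output: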
Date End Balance
2010-12-31 10200$
2011-12-31 10200$
2012-12-31 10200$
I need to fill in the End Balance column for the chosen bank over the selected period (start date to end date).
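Answer 0 (score: 4)
If I understand the OP's question correctly, and each row of df2 should hold the current balance at time t given the initial balance at Start_Date, then I would do it this way: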
from datetime import datetime, timedelta

import pandas as pd


def compute_balance(input_balance,
                    prev_date,
                    end_date,
                    time_interval,
                    rate_by_bank,
                    data=None,
                    ):
    """
    Recursively compute balance at time t given yearly rate
    :param input_balance: initial input balance (x0)
    :param prev_date: datetime.datetime object specifying starting date
    :param end_date: datetime.datetime object specifying ending date
    :param time_interval: time interval in days
    :param rate_by_bank: list of [bank_name, rate] pairs, rate in percent per year
    :param data: list of dictionaries (must not be set by user)
    :return pandas.DataFrame
    """
    if data is None:
        # first call: every bank starts from the same initial balance
        data = [{
            'time': prev_date,
            **{
                bank_name: input_balance
                for bank_name, _ in rate_by_bank
            }
        }]
    nb_days_per_year = 365.0
    normalized_time_interval = time_interval / nb_days_per_year
    cur_date = prev_date + timedelta(days=time_interval)
    # stop once the next step would reach or pass the end date
    if cur_date >= end_date:
        return pd.DataFrame(data).set_index('time')
    # new balance = previous balance + interest accrued over the interval
    balance_per_bank = {
        bank_name: (data[-1][bank_name]
                    + (rate / 100.0) * normalized_time_interval * data[-1][bank_name]
                    )
        for bank_name, rate in rate_by_bank
    }
    data.append({
        'time': cur_date,
        **balance_per_bank
    })
    # pass rate_by_bank along instead of relying on a global variable
    return compute_balance(input_balance, cur_date, end_date, time_interval, rate_by_bank, data)


# Input variables
Input_Balance = 10000
Start_Date = '2010-01-01'
End_Date = '2020-01-01'
# convert df1 to a list of [bank, rate] rows to get the rate per bank
rates = df1.to_dict(orient='split')['data']
# convert dates to datetime objects (the strings are year-month-day)
start_date = pd.Timestamp(datetime.strptime(Start_Date, '%Y-%m-%d'))
end_date = pd.Timestamp(datetime.strptime(End_Date, '%Y-%m-%d'))
df_2 = compute_balance(Input_Balance, start_date, end_date, 365, rates)
This should then output:
Bank of America Barclays Deutsche Bank NationalBanken \
time
2010-01-01 10000.0000 10000.000000 10000.0 10000.000000
2011-01-01 10200.0000 10075.000000 10000.0 10005.000000
2012-01-01 10404.0000 10150.562500 10000.0 10010.002500
2012-12-31 10612.0800 10226.691719 10000.0 10015.007501
2013-12-31 10824.3216 10303.391907 10000.0 10020.015005
UBS
time
2010-01-01 10000.000000
2011-01-01 9925.000000
2012-01-01 9850.562500
2012-12-31 9776.683281
2013-12-31 9703.358157
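For the single-bank "End Balance" column that the question asks for, the recurrence above has a closed form when the interval is exactly one year: the balance after n periods is Input_Balance * (1 + rate/100) ** n. Here is a minimal sketch under that assumption. The function name end_balance and the explicitly built year-end dates are illustrative only and do not come from the answer above.

```python
import pandas as pd


def end_balance(dates, input_balance, rate_pct):
    """Year-end balance with annual compounding at rate_pct percent (hypothetical helper)."""
    # period number 1, 2, ..., n for each year-end date
    periods = pd.Series(range(1, len(dates) + 1), index=dates)
    # closed form of the recurrence: balance_n = balance_0 * (1 + r) ** n
    return (input_balance * (1 + rate_pct / 100.0) ** periods).rename("End Balance")


# year-end dates as in df2 from the question, built explicitly
dates = pd.to_datetime([f"{year}-12-31" for year in range(2010, 2020)])
balances = end_balance(dates, 10000, 2.0)  # 2.0 is the Bank of America rate
print(balances.head(3))
```

The result is a Series indexed by date, so it can be attached to a frame with the same date index as a new column.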
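Answer 1 (score: 1)
IIUC, you need to recursively add the interest to the current value?
I assume df holds the interest rates and the banks, and df2 holds the start dates.
We can then take the Cartesian product to create a new df and loop over its rows.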
value = 10000  # initial balance (Input_Balance in the question)

# cartesian product.
df3 = (
    df.assign(key=1)
    .merge(df2.assign(key=1), on="key")
    .drop("key", axis=1)
)
# Get indices of first instance of each bank. Assuming your data is ordered by datetime.
indices = df3.drop_duplicates(subset='Bank', keep='first').index.tolist()
# calculate the first interest value.
df3.loc[indices, 'Value'] = value + (value * (df3['Rate_1Y%'] / 100))
# Calculate the rest of the data frame, skipping each bank's first row
# so that one bank's balance does not carry over into the next bank.
for i in range(1, len(df3)):
    if i in indices:
        continue
    df3.loc[i, 'Value'] = df3.loc[i-1, 'Value'] + (df3.loc[i-1, 'Value'] * (df3.loc[i, 'Rate_1Y%'] / 100))
print(df3)
Bank Rate_1Y% Date Value
0 Bank of America 2.00 2010-12-31 10200.000000
1 Bank of America 2.00 2011-12-31 10404.000000
2 Bank of America 2.00 2012-12-31 10612.080000
3 Bank of America 2.00 2013-12-31 10824.321600
4 Bank of America 2.00 2014-12-31 11040.808032
5 Bank of America 2.00 2015-12-31 11261.624193
6 Bank of America 2.00 2016-12-31 11486.856676
7 Bank of America 2.00 2017-12-31 11716.593810
8 Bank of America 2.00 2018-12-31 11950.925686
9 Bank of America 2.00 2019-12-31 12189.944200
As a function; feel free to edit it as needed.
def calc_interest(dataframe_1, dataframe_2, value, col_name='Rate_1Y%'):
    # value is the initial balance, e.g. Input_Balance
    df3 = (
        dataframe_1.assign(key=1)
        .merge(dataframe_2.assign(key=1), on="key")
        .drop("key", axis=1)
    )
    indices = df3.drop_duplicates(subset='Bank', keep='first').index.tolist()
    df3.loc[indices, 'Value'] = value + (value * (df3[col_name] / 100))
    for i in range(1, len(df3)):
        if i in indices:
            continue  # first row of each bank is already set
        df3.loc[i, 'Value'] = df3.loc[i-1, 'Value'] + (df3.loc[i-1, 'Value'] * (df3.loc[i, col_name] / 100))
    return df3
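The row-by-row loop can probably be avoided. Since each bank's balance is the initial value multiplied by a running product of (1 + rate/100), a grouped cumulative product gives the same numbers. Below is a sketch of that idea, not the answer's own method. The function name calc_interest_vectorized, the date_col parameter and the 'Date' column name are assumptions made for illustration.

```python
import pandas as pd


def calc_interest_vectorized(rates, dates, value, col_name="Rate_1Y%", date_col="Date"):
    """Compound `value` per bank over the given dates (hypothetical helper)."""
    # Cartesian product: every bank paired with every date
    df3 = (
        rates.assign(key=1)
        .merge(dates.assign(key=1), on="key")
        .drop("key", axis=1)
    )
    # order rows by date within each bank so the running product is chronological
    df3 = df3.sort_values(["Bank", date_col]).reset_index(drop=True)
    growth = 1 + df3[col_name] / 100.0
    # the running product restarts for every bank
    df3["Value"] = value * growth.groupby(df3["Bank"]).cumprod()
    return df3


# two banks from the question's df1 and three of its year-end dates
rates = pd.DataFrame({"Bank": ["Bank of America", "UBS"], "Rate_1Y%": [2.0, -0.75]})
dates = pd.DataFrame({"Date": pd.to_datetime(["2010-12-31", "2011-12-31", "2012-12-31"])})
result = calc_interest_vectorized(rates, dates, value=10000)
print(result)
```

Because the product is computed per group, no bank's balance can leak into the next bank's first row, which is the easiest thing to get wrong in the loop version.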
Answer 2 (score: 0)
If you need to create a new column in df2, just enter:
import pandas as pd

# turn the date index into a regular column named 'Start_Date'
df2 = df2.rename_axis('Start_Date').reset_index()
df2['End_Date'] = '2020-01-01' #or any required value
df2['Start_Date'] = pd.to_datetime(df2['Start_Date'])
df2['End_Date'] = pd.to_datetime(df2['End_Date'])
# note: this is the time remaining until End_Date (a Timedelta), not a monetary balance
df2['Input_Balance'] = df2['End_Date'] - df2['Start_Date']
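If you need to create a new column for a custom bank, the bank name has to be in df2 as well. Another way is to use groupby with an aggregation.
It would be better to have samples of df1 and df2 and, given df2, a clear expected result...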
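To make the groupby idea a bit more concrete: once there is a long-form table with one row per bank and date, an aggregation can pull out, say, the first and last balance per bank. This is only a sketch. The table long_df and its column names are hypothetical, and the sample balances are the 2% and -0.75% compounding of 10000 over two periods.

```python
import pandas as pd

# hypothetical long-form table: one row per (bank, date) with the computed balance
long_df = pd.DataFrame({
    "Bank": ["Bank of America", "Bank of America", "UBS", "UBS"],
    "Date": pd.to_datetime(["2010-12-31", "2011-12-31", "2010-12-31", "2011-12-31"]),
    "Value": [10200.0, 10404.0, 9925.0, 9850.5625],
})

# sort by date so that 'first' and 'last' mean earliest and latest balance
summary = (
    long_df.sort_values("Date")
    .groupby("Bank")["Value"]
    .agg(first_balance="first", last_balance="last")
)
print(summary)
```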