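I have an expense file that I am trying to read, and I want to create a daily log from it. The file spans several years; a small part of it, covering a few days in January 2015, is shown below.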
Date,Checking_Debit,Checking_Addition,Savings_Debit,Savings_Addition
2015-01-07,342.1,0.0,0.0,0.0
2015-01-07,981.0,0.0,0.0,0.0
2015-01-07,3185.0,0.0,0.0,0.0
2015-01-05,55.0,0.0,0.0,0.0
2015-01-05,75.0,0.0,0.0,0.0
2015-01-03,287.0,0.0,0.0,0.0
2015-01-02,64.8,0.0,0.0,0.0
2015-01-02,75.0,0.0,0.0,75.0
2015-01-02,1280.0,0.0,0.0,0.0
2015-01-02,245.0,0.0,0.0,0.0
2015-01-01,45.0,0.0,0.0,0.0
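In my code, I start with the variables check_start and savings_start, which hold the starting values of the checking and savings accounts. I want to give the code a start date and an end date and have it step through every day, check whether there are any expenses on that day, subtract the checking and savings debits, and add the checking and savings additions. If there are no expenses on a given day, the accounts should keep the same value as the day before. I am also trying to restrict the implementation to Pandas DataFrames. So far my code looks like this.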
import pandas as pd
from datetime import date
check_start = 8500.0
savings_start = 4000.0
start_date = date(2017, 1, 1)
end_date = date(2017, 1, 8)
df = pd.read_csv('file_name.csv', dtype={'Date': str, 'Checking_Debit': float,
                                         'Checking_Addition': float,
                                         'Savings_Debit': float,
                                         'Savings_Addition': float})
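Using Pandas in a Pythonic way, how can I go from the start date to the end date one day at a time, check whether there are any expenses on each date, and apply them to the checking and savings balances? In the end I should have an array with the checking account value for each date, and the same for the savings account.
The result should be written to another .csv file in the following format.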
Date,Checking,Savings
2017-01-07,1865.1,3925.0
2017-01-06,6373.2,3925.0
2017-01-05,6373.2,3925.0
2017-01-04,6503.2,3925.0
2017-01-03,6503.2,3925.0
2017-01-02,6790.2,3925.0
2017-01-01,8455.0,4000.0
Answer 0 (score: 1)
First, read the data you provided and have pandas parse the date column as dates
import pandas as pd
df = pd.read_csv(r"dat.csv", parse_dates=[0],dtype={'Checking_Debit': float,
'Checking_Addition': float,
'Savings_Debit': float,
'Savings_Addition': float})
Set the date as the index to make the data easier to work with.
df = df.set_index("Date")
Initialize all the variables for the loop
check_start = 8500.0
savings_start = 4000.0
start_date = pd.to_datetime('2015/1/1')
end_date = pd.to_datetime('2015/1/8')
delta = pd.Timedelta('1 days') # time that needs to be added to start date
Now group the expense data by date, summing all transactions that fall on the same day
grp_df = df.groupby('Date').sum()
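To see what the grouping step produces, here is a minimal sketch using three rows copied from the question. The inline CSV text, read through io.StringIO, is only a stand-in for dat.csv, and the names sample and daily are made up for this illustration.

```python
import io
import pandas as pd

# A few rows copied from the question; io.StringIO stands in for dat.csv
csv_text = """Date,Checking_Debit,Checking_Addition,Savings_Debit,Savings_Addition
2015-01-02,64.8,0.0,0.0,0.0
2015-01-02,75.0,0.0,0.0,75.0
2015-01-01,45.0,0.0,0.0,0.0
"""
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"]).set_index("Date")

# One row per date; each column holds that day's total
daily = sample.groupby("Date").sum()
print(daily)
```

The two rows for 2015-01-02 collapse into a single row, which is why the loop can look up one total per day.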
Now use a while loop to build the expense report one day at a time
expense_report = []
while start_date <= end_date:
    # Apply the day's totals only if the date has any transactions
    if start_date in grp_df.index:
        savings_start += (grp_df.loc[start_date, "Savings_Addition"] - grp_df.loc[start_date, "Savings_Debit"])
        check_start += (grp_df.loc[start_date, "Checking_Addition"] - grp_df.loc[start_date, "Checking_Debit"])
    # Record the balances for every day, whether or not they changed
    expense_report.append([start_date, check_start, savings_start])
    start_date += delta
Convert the expense_report list to a pandas DataFrame
df_exp_rpt = pd.DataFrame(expense_report,columns=["Date","Checking","Savings"])
print(df_exp_rpt)
        Date  Checking  Savings
0 2015-01-01    8455.0   4000.0
1 2015-01-02    6790.2   4075.0
2 2015-01-03    6503.2   4075.0
3 2015-01-04    6503.2   4075.0
4 2015-01-05    6373.2   4075.0
5 2015-01-06    6373.2   4075.0
6 2015-01-07    1865.1   4075.0
7 2015-01-08    1865.1   4075.0
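You can save it to a CSV file (index=False leaves out the row numbers, so the columns are just Date, Checking, Savings):
df_exp_rpt.to_csv("filename.csv", index=False)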
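The question's sample output lists the newest date first. A possible way to get that ordering is sketched below, assuming a report frame shaped like df_exp_rpt. The three-row frame here is a hypothetical stand-in, and the in-memory buffer is used only so the example runs without touching the disk.

```python
import io
import pandas as pd

# Hypothetical stand-in for the df_exp_rpt built above
df_exp_rpt = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-03"]),
    "Checking": [8455.0, 6790.2, 6503.2],
    "Savings": [4000.0, 4075.0, 4075.0],
})

# Newest day first, as in the question's sample output
newest_first = df_exp_rpt.sort_values("Date", ascending=False)

# Write to an in-memory buffer here; pass a file path instead to write to disk
buffer = io.StringIO()
newest_first.to_csv(buffer, index=False, date_format="%Y-%m-%d")
print(buffer.getvalue())
```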
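Note: the Savings column ends at 4075.0 rather than 3925.0 because the 75 in the original data is in the Savings_Addition column, not Savings_Debit, so it is added rather than subtracted.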
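As an alternative to the explicit loop, the same running balances can probably be computed without iterating, by reindexing the daily totals over the full date range and taking a cumulative sum. This is a sketch of a different technique, not the method used above. The inline CSV stands in for dat.csv, and the names all_days, daily, and report are made up for this example.

```python
import io
import pandas as pd

# Rows copied from the question; io.StringIO stands in for dat.csv
csv_text = """Date,Checking_Debit,Checking_Addition,Savings_Debit,Savings_Addition
2015-01-07,342.1,0.0,0.0,0.0
2015-01-07,981.0,0.0,0.0,0.0
2015-01-07,3185.0,0.0,0.0,0.0
2015-01-05,55.0,0.0,0.0,0.0
2015-01-05,75.0,0.0,0.0,0.0
2015-01-03,287.0,0.0,0.0,0.0
2015-01-02,64.8,0.0,0.0,0.0
2015-01-02,75.0,0.0,0.0,75.0
2015-01-02,1280.0,0.0,0.0,0.0
2015-01-02,245.0,0.0,0.0,0.0
2015-01-01,45.0,0.0,0.0,0.0
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

check_start = 8500.0
savings_start = 4000.0

# Sum each day's transactions, then add all-zero rows for days with none
all_days = pd.date_range("2015-01-01", "2015-01-08", freq="D")
daily = df.groupby("Date").sum().reindex(all_days, fill_value=0.0)

# Running balance = starting value + cumulative net change
report = pd.DataFrame({
    "Checking": check_start
    + (daily["Checking_Addition"] - daily["Checking_Debit"]).cumsum(),
    "Savings": savings_start
    + (daily["Savings_Addition"] - daily["Savings_Debit"]).cumsum(),
})
report.index.name = "Date"
print(report)
```

Like the loop, this only counts transactions that fall between the start and end dates; anything in the file outside that window is dropped by the reindex.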