Creating a daily account log in DataFrame format from an expense file with Pandas

Time: 2017-12-23 19:18:03

Tags: python-3.x pandas

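I have an expense file that I am trying to read, and I want to create a daily log from it. The file spans several years; a small portion of it, covering a few days in January 2015, is shown below.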

Date,Checking_Debit,Checking_Addition,Savings_Debit,Savings_Addition
2015-01-07,342.1,0.0,0.0,0.0
2015-01-07,981.0,0.0,0.0,0.0
2015-01-07,3185.0,0.0,0.0,0.0
2015-01-05,55.0,0.0,0.0,0.0
2015-01-05,75.0,0.0,0.0,0.0
2015-01-03,287.0,0.0,0.0,0.0
2015-01-02,64.8,0.0,0.0,0.0
2015-01-02,75.0,0.0,0.0,75.0
2015-01-02,1280.0,0.0,0.0,0.0
2015-01-02,245.0,0.0,0.0,0.0
2015-01-01,45.0,0.0,0.0,0.0

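In my code, I start with the variables check_start and savings_start, which hold the starting balances of the checking and savings accounts. I want to give the code a start date and an end date and have it step through each day, check whether there are any expenses on that day, subtract the checking and savings debits, and add the checking and savings additions. If there are no expenses on a given day, the accounts should keep the same values as the previous day. I am also trying to restrict myself to Pandas DataFrames in the implementation. So far, my code looks like this.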

import pandas as pd
from datetime import date
check_start = 8500.0
savings_start = 4000.0
start_date = date(2017, 1, 1)
end_date = date(2017, 1, 8)
df = pd.read_csv("file_name.csv", dtype={'Date': str, 'Checking_Debit': float,
                                         'Checking_Addition': float,
                                         'Savings_Debit': float,
                                         'Savings_Addition': float})

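Using idiomatic (Pythonic) Pandas, how can I go from the start date to the end date one day at a time, check whether there are any expenses on each date, and then apply them to the checking and savings balances? In the end, I should have an array with the checking account balance for each date, and likewise for the savings account.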

The result should then be written to another .csv file in the following format.

Date,Checking,Savings
2017-01-07,1865.1,3925.0
2017-01-06,6373.2,3925.0
2017-01-05,6373.2,3925.0
2017-01-04,6503.2,3925.0
2017-01-03,6503.2,3925.0
2017-01-02,6790.2,3925.0
2017-01-01,8455.0,4000.0

1 Answer:

Answer 0 (score: 1)

First, read the data you provided and have pandas parse the date column as dates.

import pandas as pd

df = pd.read_csv(r"dat.csv", parse_dates=[0],dtype={'Checking_Debit': float, 
                                                               'Checking_Addition': float, 
                                                               'Savings_Debit': float, 
                                                               'Savings_Addition': float})

Set the date as the index to make the data easier to work with.

df = df.set_index("Date")

Initialize all the variables for the loop.

check_start = 8500.0
savings_start = 4000.0
start_date = pd.to_datetime('2015/1/1')
end_date = pd.to_datetime('2015/1/8')
delta = pd.Timedelta('1 days') # time that needs to be added to start date

Now group the expense data by date, so that each date has a single row of totals.

grp_df = df.groupby('Date').sum()
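To see what the grouping step produces, here is a small self-contained sketch. It loads the January 2015 rows from the question out of an in-memory string instead of a file; the names `sample_csv` and `daily` are only illustrative and do not appear in the original code.

```python
import io

import pandas as pd

# The sample rows from the question, held in memory for illustration.
sample_csv = """Date,Checking_Debit,Checking_Addition,Savings_Debit,Savings_Addition
2015-01-07,342.1,0.0,0.0,0.0
2015-01-07,981.0,0.0,0.0,0.0
2015-01-07,3185.0,0.0,0.0,0.0
2015-01-05,55.0,0.0,0.0,0.0
2015-01-05,75.0,0.0,0.0,0.0
2015-01-03,287.0,0.0,0.0,0.0
2015-01-02,64.8,0.0,0.0,0.0
2015-01-02,75.0,0.0,0.0,75.0
2015-01-02,1280.0,0.0,0.0,0.0
2015-01-02,245.0,0.0,0.0,0.0
2015-01-01,45.0,0.0,0.0,0.0
"""

# Parse the Date column as datetimes and use it as the index.
df = pd.read_csv(io.StringIO(sample_csv), parse_dates=["Date"]).set_index("Date")

# Sum every column per date: one row per day that has transactions.
daily = df.groupby(level="Date").sum()
print(daily)
```

Days without any transactions (2015-01-04 and 2015-01-06 here) do not appear in the grouped frame, which is why the loop has to handle them separately.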

Now run a while loop that builds the expense report one day at a time.

expense_report = []
while start_date <= end_date:
    # Apply the day's net change only when that date has transactions
    if start_date in grp_df.index:
        savings_start += grp_df.loc[start_date, "Savings_Addition"] - grp_df.loc[start_date, "Savings_Debit"]
        check_start += grp_df.loc[start_date, "Checking_Addition"] - grp_df.loc[start_date, "Checking_Debit"]
    # Record the balances for every day, whether or not they changed
    expense_report.append([start_date, check_start, savings_start])

    start_date += delta

Convert the expense_report list into a pandas DataFrame.

df_exp_rpt = pd.DataFrame(expense_report,columns=["Date","Checking","Savings"])



print(df_exp_rpt)
        Date    Checking    Savings
0   2015-01-01  8455.0  4000.0
1   2015-01-02  6790.2  4075.0
2   2015-01-03  6503.2  4075.0
3   2015-01-04  6503.2  4075.0
4   2015-01-05  6373.2  4075.0
5   2015-01-06  6373.2  4075.0
6   2015-01-07  1865.1  4075.0
7   2015-01-08  1865.1  4075.0
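Since the question asks for an idiomatic Pandas approach, the same report can probably also be built without an explicit loop. The following is a minimal sketch, not the method used in the answer: it assumes the sample rows from the question (held in the illustrative `sample_csv` string), reindexes the daily totals onto a full calendar with `pd.date_range`, and takes a running sum of each day's net change. The names `days`, `daily`, and `report` are hypothetical.

```python
import io

import pandas as pd

# The sample rows from the question, held in memory for illustration.
sample_csv = """Date,Checking_Debit,Checking_Addition,Savings_Debit,Savings_Addition
2015-01-07,342.1,0.0,0.0,0.0
2015-01-07,981.0,0.0,0.0,0.0
2015-01-07,3185.0,0.0,0.0,0.0
2015-01-05,55.0,0.0,0.0,0.0
2015-01-05,75.0,0.0,0.0,0.0
2015-01-03,287.0,0.0,0.0,0.0
2015-01-02,64.8,0.0,0.0,0.0
2015-01-02,75.0,0.0,0.0,75.0
2015-01-02,1280.0,0.0,0.0,0.0
2015-01-02,245.0,0.0,0.0,0.0
2015-01-01,45.0,0.0,0.0,0.0
"""
df = pd.read_csv(io.StringIO(sample_csv), parse_dates=["Date"])

check_start = 8500.0
savings_start = 4000.0

# Every calendar day in the reporting window, including days with no activity.
days = pd.date_range("2015-01-01", "2015-01-08", freq="D")

# Daily totals; days without transactions are filled with zeros.
daily = df.groupby("Date").sum().reindex(days, fill_value=0.0)

# Running balance = starting balance + cumulative (additions - debits).
report = pd.DataFrame({
    "Checking": check_start + (daily["Checking_Addition"] - daily["Checking_Debit"]).cumsum(),
    "Savings": savings_start + (daily["Savings_Addition"] - daily["Savings_Debit"]).cumsum(),
})
report.index.name = "Date"
report = report.round(2)  # trim floating-point noise for display
print(report)
```

Like the loop, this only counts transactions that fall inside the window, so the starting balances are assumed to be the balances just before the start date.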

You can save it to CSV:
df_exp_rpt.to_csv("filename.csv")

Note: the Savings column comes out as 4075.0 rather than 3925.0 because the original data has a value of 75 in the Savings_Addition column.
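The target file in the question lists dates newest first and has no index column. One possible way to get that layout is sketched below, using a small hand-built frame (`df_demo`, a hypothetical stand-in for the report) and writing to a string rather than a file so the result can be inspected directly.

```python
import pandas as pd

# A hypothetical three-day report standing in for df_exp_rpt.
df_demo = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-03"]),
    "Checking": [8455.0, 6790.2, 6503.2],
    "Savings": [4000.0, 4075.0, 4075.0],
})

# Sort newest first, drop the row index, and force a date-only format.
csv_text = (
    df_demo.sort_values("Date", ascending=False)
           .to_csv(index=False, date_format="%Y-%m-%d")
)
print(csv_text)
```

Passing a file name as the first argument to `to_csv` writes the same text to disk instead of returning it.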