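This is a project I have been trying to complete for the past few days. We are looking for better ways to integrate our financial data into our dashboards, but the software we use exports the data in a nauseating layout that cannot be fed into any kind of program, because it is meant for a person to glance at and get a rough idea.
I was hoping to get advice on how to code this properly, or to hear whether I am going about it the wrong way entirely. This data has already been through a lot of cleaning, so please tell me if anything looks seriously wrong: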
Expense Categories Jan Actual Jan Budget Feb Actual \
3 5600 Direct Personnel Expenses 2521.73 0 -290.57
4 6000 Automobile Expense 909.33 1314 483.15
5 6160 Funeral Home Expense 1072 1800.02 0
6 6400 Lab Expense 0 0 65.18
9 6100 Marketing & Promotion 543.13 1850.01 1158.41
Also, while cleaning I pulled out variables such as:
department = "PR"
direct_indirect = {'5600 Direct Personnel Expenses' : 'Direct Expense', etc}
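My end goal is to include a budget summary in the dashboards I am designing for each department in Tableau, so I believe the best result would look like this: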
Expense Category Direct/Indirect Department Month-Year Actual Budget
6400 Lab Expense Direct Expense PR jan 2016 0 0
6400 Lab Expense Direct Expense PR feb 2016 0 0
6400 Lab Expense Direct Expense PR mar 2016 0 0
6400 Lab Expense Direct Expense PR apr 2016 0 0
6400 Lab Expense Direct Expense PR may 2016 0 0
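I am struggling with how to accomplish this. I am completely unsure how to create multiple rows in a new dataframe for each expense type, where every pair of columns is a new month. I feel like the only way is to use: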
for index, row in df1.iterrows():
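But then I get lost on how to iterate over each column and assign the values to a new dataframe.
If I have left out any details you need, please let me know. Thanks for your help.
Andy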
Answer 0: (score: 2)
Use melt and pivot_table:
df=df.melt('Expense Categories')
df[['Month','Type']]=df.variable.str.split(' ',expand=True)
df=pd.pivot_table(df,index=['Expense Categories','Month'],columns='Type',values='value').reset_index()
df
Out[1176]:
Type Expense Categories Month Actual Budget
0 5600 Direct Personnel Expenses Feb -290.57 NaN
1 5600 Direct Personnel Expenses Jan 2521.73 0.00
2 6000 Automobile Expense Feb 483.15 NaN
3 6000 Automobile Expense Jan 909.33 1314.00
4 6100 Marketing & Promotion Feb 1158.41 NaN
5 6100 Marketing & Promotion Jan 543.13 1850.01
6 6160 Funeral Home Expense Feb 0.00 NaN
7 6160 Funeral Home Expense Jan 1072.00 1800.02
8 6400 Lab Expense Feb 65.18 NaN
9 6400 Lab Expense Jan 0.00 0.00
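The melt and pivot_table steps above can be run end to end as the following minimal sketch. It assumes only the three value columns visible in the question (Jan Actual, Jan Budget, Feb Actual) and two of its sample rows; the names wide, long_df and tidy are made up for illustration.

```python
import pandas as pd

# Two sample rows copied from the question, limited to the columns shown there.
wide = pd.DataFrame({
    'Expense Categories': ['5600 Direct Personnel Expenses', '6000 Automobile Expense'],
    'Jan Actual': [2521.73, 909.33],
    'Jan Budget': [0.0, 1314.0],
    'Feb Actual': [-290.57, 483.15],
})

# Step 1: keep the category as an identifier and turn every other column
# into (variable, value) pairs, one row per category and original column.
long_df = wide.melt(id_vars='Expense Categories')

# Step 2: split labels such as 'Jan Actual' into a month part and a type part.
long_df[['Month', 'Type']] = long_df['variable'].str.split(' ', expand=True)

# Step 3: pivot the type back out so Actual and Budget sit side by side.
tidy = pd.pivot_table(
    long_df,
    index=['Expense Categories', 'Month'],
    columns='Type',
    values='value',
).reset_index()

print(tidy)
```

Budget is NaN for Feb simply because the sample has no Feb Budget column. Note also that pivot_table sorts the index, so the rows come back in alphabetical order rather than in the original order.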
We are almost there:
df['department']='PR'
df['Direct/Indirect'] = 'Direct Expense'
df['Month-Year'] = df['Month'] + str(2016)
df
Out[1182]:
Type Expense Categories Month Actual Budget department \
0 5600 Direct Personnel Expenses Feb -290.57 NaN PR
1 5600 Direct Personnel Expenses Jan 2521.73 0.00 PR
2 6000 Automobile Expense Feb 483.15 NaN PR
3 6000 Automobile Expense Jan 909.33 1314.00 PR
4 6100 Marketing & Promotion Feb 1158.41 NaN PR
5 6100 Marketing & Promotion Jan 543.13 1850.01 PR
6 6160 Funeral Home Expense Feb 0.00 NaN PR
7 6160 Funeral Home Expense Jan 1072.00 1800.02 PR
8 6400 Lab Expense Feb 65.18 NaN PR
9 6400 Lab Expense Jan 0.00 0.00 PR
Type Direct/Indirect Month-Year
0 Direct Expense Feb2016
1 Direct Expense Jan2016
2 Direct Expense Feb2016
3 Direct Expense Jan2016
4 Direct Expense Feb2016
5 Direct Expense Jan2016
6 Direct Expense Feb2016
7 Direct Expense Jan2016
8 Direct Expense Feb2016
9 Direct Expense Jan2016
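The snippet above assigns 'Direct Expense' to every row. If the direct_indirect dict from the question holds a different value per category, one possible approach is Series.map. This is only a sketch: the first dict entry comes from the question, the second entry is a hypothetical placeholder for the elided rest of the dict, and 6400 Lab Expense is left out on purpose to show what happens to unmapped labels.

```python
import pandas as pd

direct_indirect = {
    '5600 Direct Personnel Expenses': 'Direct Expense',   # from the question
    '6000 Automobile Expense': 'Indirect Expense',        # hypothetical value
}

df = pd.DataFrame({
    'Expense Categories': [
        '5600 Direct Personnel Expenses',
        '6000 Automobile Expense',
        '6400 Lab Expense',
    ],
    'Month': ['Jan', 'Jan', 'Jan'],
    'Actual': [2521.73, 909.33, 0.0],
})

# Look each category label up in the dict; labels missing from the dict become NaN.
df['Direct/Indirect'] = df['Expense Categories'].map(direct_indirect)

# List the categories that still need an entry in the dict.
unmapped = df.loc[df['Direct/Indirect'].isna(), 'Expense Categories'].unique()
print(unmapped)
```

Checking unmapped before exporting is a cheap way to catch categories that the cleaning step did not classify.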
Answer 1: (score: 1)
You can use df.columns.str.split and stack to reshape the dataframe:
import sys
import pandas as pd
df = pd.DataFrame({'Expense Categories': ['5600 Direct Personnel Expenses', '6000 Automobile Expense', '6160 Funeral Home Expense', '6400 Lab Expense', '6100 Marketing & Promotion'], 'Feb Actual': [-290.57, 483.15, 0.0, 65.18, 1158.41], 'Jan Actual': [2521.73, 909.33, 1072.0, 0.0, 543.13], 'Jan Budget': [0.0, 1314.0, 1800.02, 0.0, 1850.01]})
df = df.set_index('Expense Categories')
df.columns = df.columns.str.split(expand=True)
df.columns.names = ['Month-Year',None]
df = df.stack('Month-Year')
df = df.reset_index()
df['Direct/Indirect'] = 'Direct Expense'
df['Department'] = 'PR'
df['Month-Year'] = df['Month-Year'] + ' 2016'
with pd.option_context('display.width', sys.maxsize):
    print(df)
yields
Expense Categories Month-Year Actual Budget Direct/Indirect Department
0 5600 Direct Personnel Expenses Feb 2016 -290.57 NaN Direct Expense PR
1 5600 Direct Personnel Expenses Jan 2016 2521.73 0.00 Direct Expense PR
2 6000 Automobile Expense Feb 2016 483.15 NaN Direct Expense PR
3 6000 Automobile Expense Jan 2016 909.33 1314.00 Direct Expense PR
4 6160 Funeral Home Expense Feb 2016 0.00 NaN Direct Expense PR
5 6160 Funeral Home Expense Jan 2016 1072.00 1800.02 Direct Expense PR
6 6400 Lab Expense Feb 2016 65.18 NaN Direct Expense PR
7 6400 Lab Expense Jan 2016 0.00 0.00 Direct Expense PR
8 6100 Marketing & Promotion Feb 2016 1158.41 NaN Direct Expense PR
9 6100 Marketing & Promotion Jan 2016 543.13 1850.01 Direct Expense PR
Explanation:
df = df.set_index('Expense Categories')
df.columns = df.columns.str.split(expand=True)
df.columns.names = ['Month-Year',None]
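These lines create a MultiIndex for the column index. It separates the month from the Actual/Budget part of each column label. set_index is used here to hide the Expense Categories column from the str.split operation. At this point df looks like this: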
Month-Year Feb Jan
Actual Actual Budget
Expense Categories
5600 Direct Personnel Expenses -290.57 2521.73 0.00
6000 Automobile Expense 483.15 909.33 1314.00
6160 Funeral Home Expense 0.00 1072.00 1800.02
6400 Lab Expense 65.18 0.00 0.00
6100 Marketing & Promotion 1158.41 543.13 1850.01
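To see in isolation why the split produces a two-level column index, and why Expense Categories has to be moved out of the way first, here is a small sketch on a bare pd.Index. The variable names cols, multi and with_id are illustrative only.

```python
import pandas as pd

# The value columns as they look after set_index('Expense Categories').
cols = pd.Index(['Feb Actual', 'Jan Actual', 'Jan Budget'])

# Splitting on whitespace with expand=True returns a MultiIndex:
# level 0 holds the month, level 1 holds Actual/Budget.
multi = cols.str.split(expand=True)
print(multi.tolist())

# Without set_index, the identifier column would be split as well,
# giving a meaningless ('Expense', 'Categories') entry.
with_id = pd.Index(['Expense Categories', 'Jan Actual']).str.split(expand=True)
print(with_id.tolist())
```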
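Now we can use stack to move the Jan/Feb labels (or, more precisely, the Month-Year level of the column index) out of the columns and into the row index, from where reset_index turns them into an ordinary column: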
df = df.stack('Month-Year')
yields
Actual Budget
Expense Categories Month-Year
5600 Direct Personnel Expenses Feb -290.57 NaN
Jan 2521.73 0.00
6000 Automobile Expense Feb 483.15 NaN
Jan 909.33 1314.00
6160 Funeral Home Expense Feb 0.00 NaN
Jan 1072.00 1800.02
6400 Lab Expense Feb 65.18 NaN
Jan 0.00 0.00
6100 Marketing & Promotion Feb 1158.41 NaN
Jan 543.13 1850.01
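The remaining lines of the answer flatten this result and append the year. The sketch below rebuilds two of the stacked rows by hand and applies those steps. The last line goes beyond the answer: parsing the label into a real timestamp is an assumption about what a dashboard tool might prefer, and the column name Period is hypothetical.

```python
import pandas as pd

# Two rows of the stacked frame shown above, rebuilt by hand for the example.
stacked = pd.DataFrame(
    {'Actual': [-290.57, 2521.73], 'Budget': [float('nan'), 0.0]},
    index=pd.MultiIndex.from_tuples(
        [('5600 Direct Personnel Expenses', 'Feb'),
         ('5600 Direct Personnel Expenses', 'Jan')],
        names=['Expense Categories', 'Month-Year'],
    ),
)

# Turn both index levels back into ordinary columns.
flat = stacked.reset_index()

# Append the year, as the answer does.
flat['Month-Year'] = flat['Month-Year'] + ' 2016'

# Optional and not part of the answer: parse 'Feb 2016' into a timestamp
# (first day of the month) so the months sort chronologically.
flat['Period'] = pd.to_datetime(flat['Month-Year'], format='%b %Y')

print(flat)
```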