I have the following code:
import numpy as np
import pandas

data_df = pandas.read_csv(filename, parse_dates = True)
groupings = np.unique(data_df[['Ind']])
for group in groupings:
    data_df2 = data_df[data_df['Ind'] == group]
    table = pandas.pivot_table(data_df2, values='Rev', index=['Ind', 'Month'], columns=['Type'], aggfunc=sum)
    table = table.sort_index(ascending=[0, 0])
    print(table)
How can I sort the pivot table 'table' by month and year (for example, when I print 'table' I want Dec-14 to be the first row of output for each group)?
Here is a sample of the data in 'data_df':
Ind Type Month Rev
0 A Voice Dec-14 10.00
1 A Voice Jan-15 8.00
2 A Voice Feb-15 13.00
3 A Voice Mar-15 9.00
4 A Voice Apr-15 11.00
5 A Voice May-15 14.00
6 A Voice Jun-15 6.00
7 A Voice Jul-15 4.00
8 A Voice Aug-15 12.00
9 A Voice Sep-15 7.00
10 A Voice Oct-15 5.00
11 A Elec Dec-14 8.04
12 A Elec Jan-15 6.95
13 A Elec Feb-15 7.58
14 A Elec Mar-15 8.81
15 A Elec Apr-15 8.33
16 A Elec May-15 9.96
17 A Elec Jun-15 7.24
18 A Elec Jul-15 4.26
19 A Elec Aug-15 10.84
20 A Elec Sep-15 4.82
21 A Elec Oct-15 5.68
22 B Voice Dec-14 10.00
23 B Voice Jan-15 8.00
24 B Voice Feb-15 13.00
25 B Voice Mar-15 9.00
26 B Voice Apr-15 11.00
27 B Voice May-15 14.00
28 B Voice Jun-15 6.00
29 B Voice Jul-15 4.00
.. .. ... ... ...
The output is (I have been playing with ascending, but it only sorts alphabetically):
Type Elec Voice
Ind Month
A Sep-15 4.82 7
Oct-15 5.68 5
May-15 9.96 14
Mar-15 8.81 9
Jun-15 7.24 6
Jul-15 4.26 4
Jan-15 6.95 8
Feb-15 7.58 13
Dec-14 8.04 10
Aug-15 10.84 12
Apr-15 8.33 11
I would like the output sorted by date:
Type Elec Voice
Ind Month
A Dec-14 8.04 10
Jan-15 6.95 8
Feb-15 7.58 13
...
Answer 0 (score: 1)
You need to convert your 'Month' column to datetime after creating the DataFrame from the CSV file:
df['Month'] = pd.to_datetime(df['Month'], format="%b-%y")
because at the moment it is a string, so it sorts alphabetically rather than chronologically.
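To see why the conversion matters, here is a minimal sketch. The four month strings are a hypothetical sample in the question's format, not the asker's actual file, and the month abbreviations assume an English locale.

```python
import pandas as pd

# Hypothetical sample in the question's 'Mon-yy' format, deliberately shuffled.
months = pd.Series(['Mar-15', 'Dec-14', 'Feb-15', 'Jan-15'])

# As plain strings the values sort alphabetically.
as_text = sorted(months)

# Parsed with the '%b-%y' format they sort chronologically.
as_dates = pd.to_datetime(months, format='%b-%y').sort_values()

print(as_text)
print(as_dates.dt.strftime('%b-%y').tolist())
```

The first list starts with Dec-14 only by alphabetical accident and then puts Feb before Jan; the second is in calendar order.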
Or you can use the following trick (date_parser) to parse the dates directly in read_csv:
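from datetime import datetime
import pandas as pd
# Parse strings such as 'Dec-14'. pd.datetime is gone from recent pandas, so use the standard library.
dateparser = lambda x: datetime.strptime(x, '%b-%y')
# Note: date_parser is deprecated since pandas 2.0 in favour of date_format.
df = pd.read_csv('data.csv', delimiter=r'\s+', parse_dates=['Month'], date_parser=dateparser)
print(df.sort_values(['Month']))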
PS: I don't know your preferred output date format...
Answer 1 (score: 1)
I think you can first convert the Month column with to_datetime and then with to_period:
data_df['Month'] = pd.to_datetime(data_df['Month'], format='%b-%y').dt.to_period('M')
Ind Type Month Rev
0 A Voice 2014-12 10.00
1 A Voice 2015-01 8.00
2 A Voice 2015-02 13.00
3 A Voice 2015-03 9.00
4 A Voice 2015-04 11.00
5 A Voice 2015-05 14.00
6 A Voice 2015-06 6.00
7 A Voice 2015-07 4.00
8 A Voice 2015-08 12.00
9 A Voice 2015-09 7.00
10 A Voice 2015-10 5.00
11 A Elec 2014-12 8.04
12 A Elec 2015-01 6.95
13 A Elec 2015-02 7.58
14 A Elec 2015-03 8.81
15 A Elec 2015-04 8.33
16 A Elec 2015-05 9.96
17 A Elec 2015-06 7.24
18 A Elec 2015-07 4.26
19 A Elec 2015-08 10.84
20 A Elec 2015-09 4.82
21 A Elec 2015-10 5.68
22 B Voice 2014-12 10.00
23 B Voice 2015-01 8.00
24 B Voice 2015-02 13.00
25 B Voice 2015-03 9.00
26 B Voice 2015-04 11.00
27 B Voice 2015-05 14.00
28 B Voice 2015-06 6.00
29 B Voice 2015-07 4.00
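As a small illustration of what the to_period step changes, here is a sketch on a hypothetical three-row frame (not the asker's data). The timestamps carry a day component, while the monthly periods keep only year and month and still compare chronologically.

```python
import pandas as pd

# Hypothetical three-row sample in the question's format.
sample = pd.DataFrame({'Month': ['Dec-14', 'Jan-15', 'Oct-15']})

timestamps = pd.to_datetime(sample['Month'], format='%b-%y')
periods = timestamps.dt.to_period('M')

print(timestamps.astype(str).tolist())  # full dates, first day of each month
print(periods.astype(str).tolist())     # year-month only
print(periods.dtype)
```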
Then use pivot_table; no sorting is needed:
data_df = pd.pivot_table(data_df, values='Rev', index=['Ind', 'Month'], columns='Type', aggfunc='sum')
print(data_df)
Type Elec Voice
Ind Month
A 2014-12 8.04 10
2015-01 6.95 8
2015-02 7.58 13
2015-03 8.81 9
2015-04 8.33 11
2015-05 9.96 14
2015-06 7.24 6
2015-07 4.26 4
2015-08 10.84 12
2015-09 4.82 7
2015-10 5.68 5
B 2014-12 NaN 10
2015-01 NaN 8
2015-02 NaN 13
2015-03 NaN 9
2015-04 NaN 11
2015-05 NaN 14
2015-06 NaN 6
2015-07 NaN 4
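The two steps so far can be tried end to end with a self-contained sketch. The six rows below are a hypothetical miniature of the question's data, entered out of order on purpose, to show that the pivoted index comes back in calendar order without an explicit sort.

```python
import pandas as pd

# Hypothetical miniature of the question's data, deliberately out of order.
data_df = pd.DataFrame({
    'Ind':   ['A', 'A', 'A', 'A', 'A', 'A'],
    'Type':  ['Voice', 'Voice', 'Voice', 'Elec', 'Elec', 'Elec'],
    'Month': ['Feb-15', 'Dec-14', 'Jan-15', 'Jan-15', 'Feb-15', 'Dec-14'],
    'Rev':   [13.0, 10.0, 8.0, 6.95, 7.58, 8.04],
})

# Strings -> timestamps -> monthly periods, as in the answer above.
data_df['Month'] = pd.to_datetime(data_df['Month'], format='%b-%y').dt.to_period('M')

# pivot_table sorts its index keys, and periods sort chronologically.
table = pd.pivot_table(data_df, values='Rev', index=['Ind', 'Month'],
                       columns='Type', aggfunc='sum')
print(table)
```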
Finally, you can convert the date level (Month) of the MultiIndex back to strings with strftime:
# list() materialises the zip iterator under Python 3
new_index = list(zip(data_df.index.get_level_values('Ind'), data_df.index.get_level_values('Month').strftime('%b-%y')))
data_df.index = pd.MultiIndex.from_tuples(new_index, names = data_df.index.names)
print(data_df)
Type Elec Voice
Ind Month
A Dec-14 8.04 10
Jan-15 6.95 8
Feb-15 7.58 13
Mar-15 8.81 9
Apr-15 8.33 11
May-15 9.96 14
Jun-15 7.24 6
Jul-15 4.26 4
Aug-15 10.84 12
Sep-15 4.82 7
Oct-15 5.68 5
B Dec-14 NaN 10
Jan-15 NaN 8
Feb-15 NaN 13
Mar-15 NaN 9
Apr-15 NaN 11
May-15 NaN 14
Jun-15 NaN 6
Jul-15 NaN 4
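A possible variation on the same idea, not part of the answer above, is to relabel only the Month level with MultiIndex.set_levels rather than rebuilding the tuples. The sketch below assumes a hypothetical three-row sample and an English locale for the month abbreviations.

```python
import pandas as pd

# Hypothetical three-row sample, deliberately out of order.
data_df = pd.DataFrame({
    'Ind':   ['A', 'A', 'A'],
    'Type':  ['Voice', 'Voice', 'Voice'],
    'Month': ['Feb-15', 'Dec-14', 'Jan-15'],
    'Rev':   [13.0, 10.0, 8.0],
})
data_df['Month'] = pd.to_datetime(data_df['Month'], format='%b-%y').dt.to_period('M')
table = data_df.pivot_table(values='Rev', index=['Ind', 'Month'],
                            columns='Type', aggfunc='sum')

# Format the (already chronologically ordered) period level as 'Mon-yy' strings
# and swap it in; the row order is left untouched.
month_labels = table.index.levels[1].strftime('%b-%y')
table.index = table.index.set_levels(month_labels, level='Month')
print(table)
```

Either way, the formatting should be the last step: once the level holds strings again, a later sort_index would go back to alphabetical order.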
Or you can use reset_index, dt.strftime and set_index:
data_df = data_df.reset_index(level=1)
data_df['Month'] = data_df['Month'].dt.strftime('%b-%y')
data_df = data_df.set_index('Month', append=True)
print(data_df)
Type Elec Voice
Ind Month
A Dec-14 8.04 10
Jan-15 6.95 8
Feb-15 7.58 13
Mar-15 8.81 9
Apr-15 8.33 11
May-15 9.96 14
Jun-15 7.24 6
Jul-15 4.26 4
Aug-15 10.84 12
Sep-15 4.82 7
Oct-15 5.68 5
B Dec-14 NaN 10
Jan-15 NaN 8
Feb-15 NaN 13
Mar-15 NaN 9
Apr-15 NaN 11
May-15 NaN 14
Jun-15 NaN 6
Jul-15 NaN 4
Answer 2 (score: 0)
First reformat the Month column using @jezrael's solution, then do this to get the pivot table:
>>> df_data.pivot_table(values='Rev', index=['Ind', 'Month'], columns='Type')
Type Elec Voice
Ind Month
A 2014-12 8.04 10
2015-01 6.95 8
2015-02 7.58 13
2015-03 8.81 9
2015-04 8.33 11
2015-05 9.96 14
2015-06 7.24 6
2015-07 4.26 4
2015-08 10.84 12
2015-09 4.82 7
2015-10 5.68 5
B 2014-12 NaN 10
2015-01 NaN 8
2015-02 NaN 13
2015-03 NaN 9
2015-04 NaN 11
2015-05 NaN 14
2015-06 NaN 6
2015-07 NaN 4
Or use groupby with unstack:
df.groupby(['Ind', 'Month', 'Type']).Rev.sum().unstack('Type')
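For comparison, here is a small sketch that runs both forms on the same hypothetical four-row frame (made up for illustration, with Month already converted to monthly periods) so the two results can be inspected side by side.

```python
import pandas as pd

# Hypothetical four-row sample, deliberately out of order.
df = pd.DataFrame({
    'Ind':   ['A', 'A', 'A', 'A'],
    'Type':  ['Voice', 'Elec', 'Voice', 'Elec'],
    'Month': ['Jan-15', 'Jan-15', 'Dec-14', 'Dec-14'],
    'Rev':   [8.0, 6.95, 10.0, 8.04],
})
df['Month'] = pd.to_datetime(df['Month'], format='%b-%y').dt.to_period('M')

# groupby + unstack: sum per (Ind, Month, Type), then spread Type into columns.
by_group = df.groupby(['Ind', 'Month', 'Type'])['Rev'].sum().unstack('Type')

# The pivot_table form with an explicit sum for a like-for-like comparison.
by_pivot = df.pivot_table(values='Rev', index=['Ind', 'Month'],
                          columns='Type', aggfunc='sum')

print(by_group)
print(by_pivot)
```

Note that the pivot_table call in this answer omits aggfunc, so it averages by default; that only differs from a sum when a group has more than one row.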