pandas - pivot_table while preserving order fails

Time: 2017-10-20 11:45:10

Tags: python-2.7 pandas pivot-table pandas-groupby

I have the following DataFrame, where week is not the ISO week but the fiscal week (1 is the first week of July, 52 is the last week of June):

> df
     domain  week  count
0        A    43      5
1        A    45      1
2        A    50      1
3        A    51      4
4        A     1      3
5        A     3     12
6        B    43      1
7        B    44      1
8        B    45      4
9        B    50     11
10       B     2      3
11       B     3     12
12       C    51      6
13       C     1     14
14       C     5      1

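I want to pivot this table while preserving the order of the weeks, to get a new DataFrame like the one below, with count as the values and domain as the columns: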

> new_df
week   A      B     C
43      5     1   NaN
44    NaN     1   NaN
45      1     4   NaN      
50      1    11   NaN
51      4   NaN     6
1       3   NaN    14
2     NaN     3   NaN
3      12    12   NaN
5     NaN   NaN     1

I tried using groupby and unstack, but got this error:

> df = df.groupby(['week'], sort=False)['count'].unstack('domain')
AttributeError: Cannot access callable attribute 'unstack' of 'SeriesGroupBy' objects, try using the 'apply' method

2 answers:

Answer 0: (score: 1)

Option 1] You can use a custom-ordered weeks index as a helper, together with .loc

In [4810]: weeks = pd.Index(list(range(26, 52)) + list(range(26)))

In [4819]: dfp = df.groupby(['week','domain'])['count'].sum().unstack()

In [4820]: dfp.loc[weeks.intersection(dfp.index)]
Out[4820]:
domain     A     B     C
43       5.0   1.0   NaN
44       NaN   1.0   NaN
45       1.0   4.0   NaN
50       1.0  11.0   NaN
51       4.0   NaN   6.0
1        3.0   NaN  14.0
2        NaN   3.0   NaN
3       12.0  12.0   NaN
5        NaN   NaN   1.0

Option 2] Or, use pivot

In [4821]: dfp = df.pivot(index='week', columns='domain', values='count')

In [4822]: dfp.loc[weeks.intersection(dfp.index)]
Out[4822]:
domain     A     B     C
43       5.0   1.0   NaN
44       NaN   1.0   NaN
45       1.0   4.0   NaN
50       1.0  11.0   NaN
51       4.0   NaN   6.0
1        3.0   NaN  14.0
2        NaN   3.0   NaN
3       12.0  12.0   NaN
5        NaN   NaN   1.0
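
One difference between Options 1 and 2 that may matter: pivot raises if a (week, domain) pair occurs more than once, whereas groupby(...).sum() aggregates. Since the title mentions pivot_table, here is a minimal sketch of that variant, assuming a recent pandas version. The small frame with a duplicated pair is made up for illustration and is not the data from the question.

```python
import pandas as pd

# Hypothetical data: week 51 / domain A appears twice.
df = pd.DataFrame({
    "domain": ["A", "A", "B", "A"],
    "week": [51, 1, 51, 51],
    "count": [4, 3, 2, 6],
})

# pivot() cannot reshape when an index/column pair is duplicated ...
try:
    df.pivot(index="week", columns="domain", values="count")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# ... while pivot_table() aggregates the duplicates (summed here).
dfp = df.pivot_table(index="week", columns="domain", values="count", aggfunc="sum")

# Same helper as in the answer: weeks in the desired display order.
weeks = pd.Index(list(range(26, 52)) + list(range(26)))
result = dfp.reindex(weeks.intersection(dfp.index))
print(result)
```

If each (week, domain) pair is guaranteed to be unique, as in the question, pivot and pivot_table should give the same table.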

Option 3] Or, reindex instead of .loc

In [4830]: dfp.reindex(weeks.intersection(dfp.index))
Out[4830]:
domain     A     B     C
43       5.0   1.0   NaN
44       NaN   1.0   NaN
45       1.0   4.0   NaN
50       1.0  11.0   NaN
51       4.0   NaN   6.0
1        3.0   NaN  14.0
2        NaN   3.0   NaN
3       12.0  12.0   NaN
5        NaN   NaN   1.0
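
Because reindex tolerates labels that are missing from the index (it fills them with NaN), the intersection step can probably be skipped: reindex against the full week list, then drop the rows that are entirely empty. A minimal sketch under that assumption, rebuilding the question's data so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    "domain": list("AAAAAA") + list("BBBBBB") + list("CCC"),
    "week": [43, 45, 50, 51, 1, 3, 43, 44, 45, 50, 2, 3, 51, 1, 5],
    "count": [5, 1, 1, 4, 3, 12, 1, 1, 4, 11, 3, 12, 6, 14, 1],
})

dfp = df.pivot(index="week", columns="domain", values="count")

# Week labels in the desired display order (same list as in the answer).
weeks = list(range(26, 52)) + list(range(26))

# reindex inserts all-NaN rows for absent weeks; dropna removes them again.
result = dfp.reindex(weeks).dropna(how="all")
print(result)
```

One caveat: dropna(how="all") would also remove a week that is present in the data but has NaN in every column, so the intersection variant is the safer choice if that can happen.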

Details

In [4826]: weeks
Out[4826]:
Int64Index([26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
            43, 44, 45, 46, 47, 48, 49, 50, 51,  0,  1,  2,  3,  4,  5,  6,  7,
             8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
            25],
           dtype='int64')

In [4827]: weeks.intersection(dfp.index)
Out[4827]: Int64Index([43, 44, 45, 50, 51, 1, 2, 3, 5], dtype='int64')
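
The weeks helper is effectively a rotation of the week numbers that starts at 26. As an alternative to building that index, the same ordering can be expressed as a sort key. This is a different technique from the one in the answer, sketched here on the assumption that pandas 1.1 or later is available (sort_index gained its key argument in that release). FIRST_WEEK is a name introduced for this example only.

```python
import pandas as pd

df = pd.DataFrame({
    "domain": list("AAAAAA") + list("BBBBBB") + list("CCC"),
    "week": [43, 45, 50, 51, 1, 3, 43, 44, 45, 50, 2, 3, 51, 1, 5],
    "count": [5, 1, 1, 4, 3, 12, 1, 1, 4, 11, 3, 12, 6, 14, 1],
})

dfp = df.pivot(index="week", columns="domain", values="count")

# Hypothetical constant: the week that should appear first in the output.
FIRST_WEEK = 26

# Rotate the week numbers so FIRST_WEEK maps to 0, then sort on that value.
result = dfp.sort_index(key=lambda idx: (idx - FIRST_WEEK) % 52)
print(result)
```

With this key, week 52 sorts directly before week 1, which matches the 1 to 52 fiscal numbering described in the question.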

Answer 1: (score: 0)

You need a custom order for week, so convert it to an ordered categorical with a custom category order, and omit sort=False:

cats = list(range(26, 52)) + list(range(26))
print (cats)
[26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 
 47, 48, 49, 50, 51, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 
 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]

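# astype('category', ordered=..., categories=...) was removed from pandas; pass a CategoricalDtype instead
df['week'] = df['week'].astype(pd.CategoricalDtype(categories=cats, ordered=True))

# observed=True keeps only the weeks that actually occur in the data
df = df.groupby(['week','domain'], observed=True)['count'].sum().unstack()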
print (df)
domain     A     B     C
week                    
43       5.0   1.0   NaN
44       NaN   1.0   NaN
45       1.0   4.0   NaN
50       1.0  11.0   NaN
51       4.0   NaN   6.0
1        3.0   NaN  14.0
2        NaN   3.0   NaN
3       12.0  12.0   NaN
5        NaN   NaN   1.0
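
For reference, here is a self-contained sketch of the ordered-categorical approach that should run on current pandas. It assumes pd.CategoricalDtype (pandas 0.21+) and the observed argument of groupby (pandas 0.23+). The name week_type is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "domain": list("AAAAAA") + list("BBBBBB") + list("CCC"),
    "week": [43, 45, 50, 51, 1, 3, 43, 44, 45, 50, 2, 3, 51, 1, 5],
    "count": [5, 1, 1, 4, 3, 12, 1, 1, 4, 11, 3, 12, 6, 14, 1],
})

# Category order defines the sort order: 26..51 first, then 0..25.
cats = list(range(26, 52)) + list(range(26))
week_type = pd.CategoricalDtype(categories=cats, ordered=True)
df["week"] = df["week"].astype(week_type)

# Default sort=True now sorts by category order instead of numeric order;
# observed=True drops the categories that never occur in the data.
out = df.groupby(["week", "domain"], observed=True)["count"].sum().unstack()
print(out)
```

The resulting index is a CategoricalIndex, so later sorts on it keep following the category order as well.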