Pandas pivot_table with pd.Grouper and margins

Date: 2018-06-17 01:47:54

Tags: python pandas pivot-table

A pd.Grouper on a datetime column does not work as the columns argument of Pandas pivot_table when margins=True is set. This is my code, which works as expected:

p = df.pivot_table(values='Qty', index=['ItemCode', 'LineItem'], columns=pd.Grouper(key='Date', freq='W'), aggfunc=np.sum, fill_value=0)

But if I add margins=True to get subtotals, I get an error saying:

KeyError: "[TimeGrouper(key='in time', freq=, axis=0, sort=True, closed='left', label='left', how='mean', convention='e', base=0)] not in index"

1 Answer:

Answer 0: (score: 1)

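This looks strange! I don't know what causes pivot_table to use the TimeGrouper object itself as an index key. It looks like a bug, but I'm not sure. In any case, I don't think pivot_table can compute margins for sub-indexes, so here is a solution using groupby: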

Sample data

import pandas as pd
from random import randint, choice
from string import ascii_letters

# Say we have a dataframe with 500 rows and 20 different items
df_len = range(500)
item_codes = [''.join([choice(ascii_letters) for _ in range(10)]) for __ in range(20)]
df = pd.DataFrame({
    'ItemCode': [choice(item_codes) for __ in df_len],
    # pd.Timestamp.today() is used because pd.datetime is gone from newer pandas releases
    'Date': [pd.Timestamp.today() - pd.Timedelta(randint(0, 28), 'D') for _ in df_len],
    'Qty': [randint(1,10) for _ in df_len],
    'LineItem': [choice(('a', 'b', 'c')) for _ in df_len],
})

df.head()

     ItemCode                       Date  Qty LineItem
0  IFaEmWGHTJ 2020-05-21 13:29:56.687412    8        a
1  jvLqoLfBcd 2020-05-23 13:29:56.687509    6        a
2  GOPFJEoSUm 2020-05-13 13:29:56.687550    1        a
3  qJqzzgDTaa 2020-05-03 13:29:56.687575    5        a
4  BCvRrgcpFD 2020-05-24 13:29:56.690114    8        b
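To make the weekly bucketing easier to follow, here is a minimal sketch of how pd.Grouper(key='Date', freq='W') assigns dates to weeks. The three dates and the names demo and weekly are made up for illustration. With the default 'W' frequency each bin is labelled by the Sunday that ends the week, which is why the column labels in the result further down (2020-05-03, 2020-05-10, ...) are all Sundays.

```python
import pandas as pd

# Hypothetical toy data: a Monday, the Sunday of the same week, and the next Monday.
demo = pd.DataFrame({
    "Date": pd.to_datetime(["2020-05-04", "2020-05-10", "2020-05-11"]),
    "Qty": [1, 2, 4],
})

# freq="W" means weekly bins ending on Sunday; each bin is labelled by that Sunday.
weekly = demo.groupby(pd.Grouper(key="Date", freq="W"))["Qty"].sum()
print(weekly)
# 2020-05-10    3   (Monday 05-04 and Sunday 05-10 share a bin)
# 2020-05-17    4   (Monday 05-11 starts the next bin)
```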

Solution

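# Count rows per item, line item and week (weekly bins end on Sunday).
# Note: .count() counts rows; use .sum() instead to total Qty as in the question.
res = (df.groupby(['ItemCode', 'LineItem', pd.Grouper(key='Date', freq='W')])['Qty']
       .count()
       .unstack()
       .fillna(0))
# Append a total row, then a total column.
res.loc[('column_total', ''), :] = res.sum(axis=0)
res.loc[:, 'row_total'] = res.sum(axis=1)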

Result

|                      |   2020-05-03 |   2020-05-10 |   2020-05-17 |   2020-05-24 |   2020-05-31 |   row_total |
|:---------------------|-------------:|-------------:|-------------:|-------------:|-------------:|------------:|
| ('CtdClujjRF', 'a')  |            1 |            2 |            2 |            0 |            0 |           5 |
| ('CtdClujjRF', 'b')  |            0 |            3 |            1 |            1 |            1 |           6 |
| ('CtdClujjRF', 'c')  |            1 |            1 |            2 |            2 |            1 |           7 |
| ('DnQcEbHoVL', 'a')  |            0 |            2 |            1 |            1 |            1 |           5 |
| ('DnQcEbHoVL', 'b')  |            1 |            1 |            1 |            2 |            2 |           7 |
| ...                  |          ... |          ... |          ... |          ... |          ... |         ... |
| ('sxFnkCcSJu', 'c')  |            0 |            2 |            2 |            3 |            0 |           7 |
| ('vOaWNHgOgm', 'a')  |            0 |            5 |            1 |            7 |            1 |          14 |
| ('vOaWNHgOgm', 'b')  |            1 |            0 |            1 |            3 |            4 |           9 |
| ('vOaWNHgOgm', 'c')  |            1 |            2 |            2 |            5 |            1 |          11 |
| ('column_total', '') |           64 |          128 |          115 |          127 |           66 |         500 |
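
The table above counts rows, while the question aggregates with a sum. As a rough sketch under my own assumptions, the code below shows two variations that total Qty instead. The function names groupby_totals, pivot_totals and add_week_label, and the Week column, are hypothetical helpers introduced here and are not part of pandas. The first function follows the groupby approach above. The second is a possible workaround that avoids passing pd.Grouper to pivot_table at all: it precomputes the week as a plain string column, so margins=True only has to deal with ordinary labels. I have not verified that it sidesteps the KeyError on every pandas version.

```python
import pandas as pd


def add_week_label(df, date_col="Date"):
    """Return a copy with a string 'Week' column holding the week-ending Sunday."""
    out = df.copy()
    week_end = out[date_col].dt.to_period("W").dt.end_time
    out["Week"] = week_end.dt.strftime("%Y-%m-%d")
    return out


def groupby_totals(df):
    """Weekly Qty sums per (ItemCode, LineItem) with a total row and column."""
    res = (df.groupby(["ItemCode", "LineItem", pd.Grouper(key="Date", freq="W")])["Qty"]
           .sum()
           .unstack(fill_value=0))
    # Use string labels so a non-date column name can be added safely.
    res.columns = res.columns.strftime("%Y-%m-%d")
    total_row = res.sum(axis=0).to_frame().T
    total_row.index = pd.MultiIndex.from_tuples(
        [("column_total", "")], names=res.index.names)
    res = pd.concat([res, total_row])
    res["row_total"] = res.sum(axis=1)
    return res


def pivot_totals(df):
    """Same totals via pivot_table, grouping on the precomputed 'Week' column."""
    return add_week_label(df).pivot_table(
        values="Qty", index=["ItemCode", "LineItem"], columns="Week",
        aggfunc="sum", fill_value=0, margins=True)
```

A small usage example with made-up data: both functions should report the same grand total, with pivot_table naming its margins 'All' by default.

```python
toy = pd.DataFrame({
    "ItemCode": ["A", "A", "A", "B"],
    "LineItem": ["a", "a", "b", "a"],
    "Date": pd.to_datetime(["2020-05-04", "2020-05-12", "2020-05-05", "2020-05-13"]),
    "Qty": [2, 3, 4, 5],
})
print(groupby_totals(toy))
print(pivot_totals(toy))
```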