Pivoting a pandas DataFrameGroupBy object

Time: 2018-02-24 06:02:28

Tags: python pandas pandas-groupby

I have a DataFrameGroupBy object named 'grouped' that looks like this:

for key, item in grouped:
    print('key: {0}, value: {1}'.format(key, grouped.get_group(key)))

key: 9909, value:                    date  quantity
0  2018-01-28 00:00:00+00:00       2.3
1  2018-01-29 00:00:00+00:00       3.0
key: 1151, value:                  date_period  quantity
2 2018-01-28 00:00:00+00:00       5.0
3 2018-01-29 00:00:00+00:00       9.0

I am trying to convert it into a DataFrame that looks like this:

id,   day1,   day2
9909  2.3     3.0
1151  5.0     9.0

When I run the following:

item_df = grouped.to_frame()

I get the following error:

'DataFrameGroupBy' object has no attribute 'to_frame'

When I run the following:

df = grouped.reset_index(inplace=True) 

I get the following:

Cannot access callable attribute 'reset_index' of 'DataFrameGroupBy' objects, try using the 'apply' method.

The DataFrameGroupBy object I am trying to reshape was created from a DataFrame like this:

import pandas as pd

data = {'created_date': ['2018-01-22 12:40:03', '2018-01-22 13:40:03', '2018-01-23 15:00:05', '2018-01-26 14:30:04'],
        'quantity': [11, 21, 23, 12], 'id': ['543', '543', '842', '543']}
df = pd.DataFrame(data, columns=['created_date', 'quantity', 'id'])
# use the creation timestamp as a DatetimeIndex so that resample() works
df.index = df['created_date']
df.index = pd.to_datetime(df.index)
# daily totals per id; the result is a Series with an (id, date) MultiIndex
g = df.groupby('id').resample('D')['quantity'].sum()
df = g.to_frame()
# build every (id, day) combination over the full date range, filling gaps with 0
dates = pd.date_range(df.index.levels[1].min(), df.index.levels[1].max())
idx = pd.MultiIndex.from_product([df.index.levels[0], dates])
df = df.reindex(idx, fill_value=0)
df = df.fillna(0)
df.reset_index(inplace=True)
df.rename(columns={'level_0': 'id', 'level_1': 'date'}, inplace=True)
grouped = df.groupby('id')
# now to pivot this groupby object...

1 answer:

Answer 0 (score: 1)

It is possible with apply:

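def f(x):
    # pivot() takes keyword-only arguments in pandas 2.0 and later;
    # this relies on 'id' still being a column of each group passed to apply
    return x.pivot(index='id', columns='date', values='quantity')

grouped = df.groupby('id', group_keys=False).apply(f)
print(grouped)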
date  2018-01-22  2018-01-23  2018-01-24  2018-01-25  2018-01-26
id                                                              
543         32.0         0.0         0.0         0.0        12.0
842          0.0        23.0         0.0         0.0         0.0

But the output is the same as calling pivot directly on the DataFrame, without any groupby:

print(df.pivot(index='id', columns='date', values='quantity'))
date  2018-01-22  2018-01-23  2018-01-24  2018-01-25  2018-01-26
id                                                              
543         32.0         0.0         0.0         0.0        12.0
842          0.0        23.0         0.0         0.0         0.0
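
Since the groupby step does not change the result, the whole reshaping can probably be done without it. The following is a minimal sketch rather than the answerer's code: it starts from the question's raw sample data, assumes that summing quantities per id and calendar day is what is wanted, and uses `pivot_table` with `aggfunc='sum'` in place of the resample and reindex steps. The names `wide` and `all_days` and the `day1..dayN` column labels are illustrative assumptions chosen to match the layout asked for in the question.

```python
import pandas as pd

# Same sample data as in the question.
data = {
    'created_date': ['2018-01-22 12:40:03', '2018-01-22 13:40:03',
                     '2018-01-23 15:00:05', '2018-01-26 14:30:04'],
    'quantity': [11, 21, 23, 12],
    'id': ['543', '543', '842', '543'],
}
df = pd.DataFrame(data)

# Truncate each timestamp to midnight so rows from the same day share a key.
df['date'] = pd.to_datetime(df['created_date']).dt.normalize()

# One row per id, one column per day; several rows on the same day are summed.
wide = df.pivot_table(index='id', columns='date', values='quantity',
                      aggfunc='sum', fill_value=0)

# Add the calendar days on which no id had any rows at all.
all_days = pd.date_range(wide.columns.min(), wide.columns.max(), freq='D')
wide = wide.reindex(columns=all_days, fill_value=0)

# Hypothetical day1..dayN labels, as in the layout the question asks for.
wide.columns = ['day{}'.format(i) for i in range(1, len(wide.columns) + 1)]
wide = wide.reset_index()
print(wide)
```

Unlike plain `pivot`, `pivot_table` does not raise an error when an (id, date) pair occurs more than once, which is why the separate daily resampling step can be dropped here. If the original dates are needed as column labels, the renaming line can simply be left out.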