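Groupby in Pandas produces a GroupBy object instead of a DataFrame

Date: 2016-05-10 18:40:08

Tags: python python-2.7 pandas

I have a Pandas DataFrame showing how much money people spent in January and February. I want to use the groupby function to group by person, but my code produces a DataFrameGroupBy object instead of an actual DataFrame. I also have a Gender column, which I just want to keep as it is.

Code: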

import pandas as pd
df = pd.DataFrame(data=[['Person A', 5, 21, 'Male'], ['Person B', 15, 3, 'Female']], columns=['Names', 'Jan', 'Feb', 'Gender'])
print df.groupby(['Names', 'Jan', 'Feb'])

Output:

<pandas.core.groupby.DataFrameGroupBy object at 0x020D4470>

Starting DataFrame:

      Names  Jan  Feb  Gender
0  Person A    5   21    Male
1  Person B   15    3  Female

Desired output:

            Names  Value  Gender
0  Person A - Jan      5    Male
1  Person A - Feb     21    Male
2  Person B - Jan     15  Female
3  Person B - Feb      3  Female
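
For context, here is a minimal sketch of why the call in the question prints an object rather than a table. It is written for Python 3 (print as a function), unlike the Python 2.7 snippets in this thread. groupby on its own only describes how the rows are split; a DataFrame appears once an aggregation is applied. The choice of sum and the names grouped and totals are assumptions made for illustration. Note that an aggregation summarizes rows but does not reshape them, so it would not produce the desired output by itself.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['Person A', 5, 21, 'Male'], ['Person B', 15, 3, 'Female']],
    columns=['Names', 'Jan', 'Feb', 'Gender'],
)

# groupby alone only records how to split the rows; nothing is computed yet.
grouped = df.groupby('Names')

# Applying an aggregation to the grouped object yields a regular DataFrame.
totals = grouped[['Jan', 'Feb']].sum()
print(type(totals).__name__)  # DataFrame
print(totals)
```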

2 answers:

Answer 0: (score: 3)

You can use melt with sort_values, then append variable to the Names column and drop the variable column:

df1 = pd.melt(df, id_vars='Names').sort_values('Names')
df1['Names'] = df1['Names'] + '- ' + df1['variable']
df1 = df1.drop('variable', axis=1)
print df1
           Names  value
0  Person A- Jan      5
2  Person A- Feb     21
1  Person B- Jan     15
3  Person B- Feb      3
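
To make the intermediate step visible, the following sketch (Python 3, and using a frame without the Gender column, which is an assumption chosen to match the output above) prints what melt returns before any sorting or renaming. The name long_df is illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['Person A', 5, 21], ['Person B', 15, 3]],
    columns=['Names', 'Jan', 'Feb'],
)

# melt keeps 'Names' as an identifier and unpivots every other column
# into a pair of columns named 'variable' and 'value'.
# Rows come out column by column: all Jan rows first, then all Feb rows.
long_df = pd.melt(df, id_vars='Names')
print(long_df)
```

Because the rows arrive grouped by month, the sort_values('Names') call in the answer is what brings each person's rows together.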

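Another one-line solution using assign:

print (pd.melt(df, id_vars='Names').sort_values('Names')
         .assign(Names = lambda x: x['Names'] + '- ' + x['variable'])
         .drop('variable', axis=1))
           Names  value
0  Person A- Jan      5
2  Person A- Feb     21
1  Person B- Jan     15
3  Person B- Feb      3

EDIT:

You can add the new column to the id_vars parameter so that melt keeps it as an identifier instead of unpivoting it. If you need to reorder the columns, select them explicitly as in the code below (a one-line solution can use reindex_axis instead):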

df1 = pd.melt(df, id_vars=['Names', 'Gender']).sort_values('Names')
df1['Names'] = df1['Names'] + '- ' + df1['variable']
df1 = df1.drop('variable', axis=1)
df1 = df1[['Names','value','Gender']]
print df1
           Names  value  Gender
0  Person A- Jan      5    Male
2  Person A- Feb     21    Male
1  Person B- Jan     15  Female
3  Person B- Feb      3  Female
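
The one-line variant mentioned in this answer is not shown in the thread, so the following is only a sketch of what it might look like, not the answerer's original code. It assumes Python 3 and a recent pandas, where reindex_axis is no longer available (it was deprecated and later removed), so reindex(columns=...) is swapped in. The ' - ' separator, the mergesort option, and the final reset_index are my additions to match the desired output in the question.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['Person A', 5, 21, 'Male'], ['Person B', 15, 3, 'Female']],
    columns=['Names', 'Jan', 'Feb', 'Gender'],
)

result = (
    pd.melt(df, id_vars=['Names', 'Gender'])
    # mergesort is stable, so Jan stays ahead of Feb within each person
    .sort_values('Names', kind='mergesort')
    .assign(Names=lambda x: x['Names'] + ' - ' + x['variable'])
    .drop('variable', axis=1)
    # reindex(columns=...) stands in for the older reindex_axis call
    .reindex(columns=['Names', 'value', 'Gender'])
    .reset_index(drop=True)
)
print(result)
```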

Answer 1: (score: 2)

Another solution using stack.

df_out = df.set_index(['Names']).stack().to_frame().reset_index()
df_out.columns = ['Names','month','value']

Edit

This should also work:

stack_df = df.set_index(['Names', 'Gender']).stack().to_frame().reset_index()
stack_df.columns = ['Names','Gender','Month', 'Value']
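
The stack version stops with separate Names and Month columns. As a rough sketch (Python 3), the last two steps below combine them into the label format from the question; those two steps and the ' - ' separator are my assumptions and not part of the answer.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['Person A', 5, 21, 'Male'], ['Person B', 15, 3, 'Female']],
    columns=['Names', 'Jan', 'Feb', 'Gender'],
)

# With Names and Gender in the index, stack moves the Jan/Feb column
# labels into a new innermost index level, one row per person and month.
stack_df = df.set_index(['Names', 'Gender']).stack().to_frame().reset_index()
stack_df.columns = ['Names', 'Gender', 'Month', 'Value']

# Assumed follow-up: build the "Person - Month" label and reorder columns.
stack_df['Names'] = stack_df['Names'] + ' - ' + stack_df['Month']
stack_df = stack_df[['Names', 'Value', 'Gender']]
print(stack_df)
```

Since stack walks the frame row by row, each person's months already sit next to each other, so no sort is needed here.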