Groupby to create new columns

Time: 2018-12-13 14:13:48

Tags: python pandas group-by pandas-groupby

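I want to build a new DataFrame from an existing one, adding a new column whenever an index value has already been seen, but I don't know how many columns I will end up creating.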

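Starting from:

pd.DataFrame([["John","guitar"],["Michael","football"],["Andrew","running"],["John","dancing"],["Andrew","cars"]])

I want one row per name, with that name's values spread across columns, but I don't know at the outset how many columns need to be created.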

3 answers:

Answer 0 (score: 6)

df = pd.DataFrame([["John","guitar"],["Michael","football"],["Andrew","running"],["John","dancing"],["Andrew","cars"]], columns = ['person','hobby'])

You can group by person and take the unique values of hobby for each group. Then use .apply(pd.Series) to expand each group's array of values into columns:

df.groupby('person').hobby.unique().apply(pd.Series).reset_index()
    person         0        1
0   Andrew   running     cars
1     John    guitar  dancing
2  Michael  football      NaN

If the DataFrame is large, try this more efficient alternative:

df = df.groupby('person').hobby.unique()
df = pd.DataFrame(df.values.tolist(), index=df.index).reset_index()

This does essentially the same thing, but it avoids the row-by-row loop that applying pd.Series incurs.
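
As a rough side-by-side of the two variants, here is a minimal sketch on the same sample data. The names `hobbies`, `wide_apply` and `wide_ctor` are mine, not from the answer, and no timing claim is made here.

```python
import pandas as pd

df = pd.DataFrame(
    [["John", "guitar"], ["Michael", "football"], ["Andrew", "running"],
     ["John", "dancing"], ["Andrew", "cars"]],
    columns=["person", "hobby"],
)

# One array of unique hobbies per person, indexed by person.
hobbies = df.groupby("person")["hobby"].unique()

# Variant 1: expand each array with pd.Series (one Python-level call per row).
wide_apply = hobbies.apply(pd.Series).reset_index()

# Variant 2: hand all arrays to the DataFrame constructor in a single call.
wide_ctor = pd.DataFrame(hobbies.values.tolist(), index=hobbies.index).reset_index()

# Both results have one row per person and the same column labels.
# Missing cells may appear as NaN in one and None in the other.
print(wide_apply.shape, wide_ctor.shape)
print(wide_apply.columns.tolist() == wide_ctor.columns.tolist())
```

Because the missing-value marker can differ between the two, compare them with `pd.isna` rather than with plain equality.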

Answer 1 (score: 1)

Use GroupBy.cumcount to get a per-group counter, then reshape with unstack:

df1 = pd.DataFrame([["John","guitar"],
                    ["Michael","football"],
                    ["Andrew","running"],
                    ["John","dancing"],
                    ["Andrew","cars"]], columns=['a','b'])

         a         b
0     John    guitar
1  Michael  football
2   Andrew   running
3     John   dancing
4   Andrew      cars


df = (df1.set_index(['a', df1.groupby('a').cumcount()])['b']
         .unstack()
         .rename_axis(-1)
         .reset_index()
         .rename(columns=lambda x: x+1))
print(df)

         0         1        2
0   Andrew   running     cars
1     John    guitar  dancing
2  Michael  football      NaN
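
To see why this works, it can help to look at the counter on its own. The following is a minimal sketch on the same data; the variable names `counter` and `wide` are illustrative and not part of the answer.

```python
import pandas as pd

df1 = pd.DataFrame(
    [["John", "guitar"], ["Michael", "football"], ["Andrew", "running"],
     ["John", "dancing"], ["Andrew", "cars"]],
    columns=["a", "b"],
)

# cumcount numbers the rows within each group of 'a', starting at 0.
counter = df1.groupby("a").cumcount()
print(counter.tolist())  # [0, 0, 0, 1, 1]

# Indexing by (a, counter) makes every row's index unique, so unstack
# can pivot the counter level into columns 0, 1, ...
wide = df1.set_index(["a", counter])["b"].unstack()
print(wide)
```

The number of output columns is simply the size of the largest group, so it never has to be known in advance.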

Or aggregate each group into a list and build a new DataFrame with the constructor:

s = df1.groupby('a')['b'].agg(list)
df = pd.DataFrame(s.values.tolist(), index=s.index).reset_index()
print(df)
         a         0        1
0   Andrew   running     cars
1     John    guitar  dancing
2  Michael  football     None

Answer 2 (score: 0)

Assuming the column names are ['person', 'activity'], you can do:

df_out = df.groupby('person').agg(list).reset_index()
df_out = pd.concat([df_out, pd.DataFrame(df_out['activity'].values.tolist())], axis=1)
# Pass the axis by keyword: a positional axis argument is rejected by pandas 2.0 and later.
df_out = df_out.drop(columns='activity')

which gives you

    person         0        1
0   Andrew   running     cars
1     John    guitar  dancing
2  Michael  football     None
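
The same idea can be wrapped in a small reusable function. This is only a sketch: `spread_column` is a hypothetical helper name, and the column names follow the assumption made in this answer.

```python
import pandas as pd


def spread_column(df, key, value):
    """Hypothetical helper: one row per `key`, with the entries of
    `value` spread over numbered columns (0, 1, ...)."""
    # Collect each group's values into a Python list.
    lists = df.groupby(key)[value].agg(list)
    # The constructor pads shorter lists with None, so the column
    # count is set by the longest group.
    wide = pd.DataFrame(lists.tolist(), index=lists.index)
    return wide.reset_index()


df = pd.DataFrame(
    [["John", "guitar"], ["Michael", "football"], ["Andrew", "running"],
     ["John", "dancing"], ["Andrew", "cars"]],
    columns=["person", "activity"],
)

df_out = spread_column(df, "person", "activity")
print(df_out)
```

Since the width comes from the longest list, nothing about the number of columns needs to be decided beforehand.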