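I want to create a DataFrame with new columns from an existing DataFrame whenever an index value has already been found, but I don't know how many columns I will end up creating.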
I have:
pd.DataFrame([["John","guitar"],["Michael","football"],["Andrew","running"],["John","dancing"],["Andrew","cars"]])
I don't know up front how many columns should be created.
Answer 0 (score: 6)
import pandas as pd
df = pd.DataFrame([["John","guitar"],["Michael","football"],["Andrew","running"],["John","dancing"],["Andrew","cars"]], columns = ['person','hobby'])
You can group by person and take the unique values of hobby, then use .apply(pd.Series) to expand each list into columns:
df.groupby('person').hobby.unique().apply(pd.Series).reset_index()
    person         0        1
0   Andrew   running     cars
1     John    guitar  dancing
2  Michael  football      NaN
If the DataFrame is large, try this more efficient alternative:
df = df.groupby('person').hobby.unique()
df = pd.DataFrame(df.values.tolist(), index=df.index).reset_index()
This is essentially the same, but it avoids the row-by-row loop that .apply(pd.Series) performs.
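The faster variant can be wrapped in a small helper. This is only a minimal sketch, assuming the 'person' and 'hobby' column names used above; the function name spread_values and its parameters are hypothetical and not part of pandas.

```python
import pandas as pd


def spread_values(frame, key="person", value="hobby"):
    """Return one row per key, with that key's unique values spread across columns.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # One array of unique values per key; the arrays can differ in length.
    grouped = frame.groupby(key)[value].unique()
    # Build the wide frame in one step; shorter rows are padded with a missing value.
    wide = pd.DataFrame(grouped.values.tolist(), index=grouped.index)
    return wide.reset_index()


df = pd.DataFrame(
    [["John", "guitar"], ["Michael", "football"], ["Andrew", "running"],
     ["John", "dancing"], ["Andrew", "cars"]],
    columns=["person", "hobby"],
)

wide = spread_values(df)
print(wide)
```

The number of value columns is whatever the longest group needs, so nothing has to be known up front.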
Answer 1 (score: 1)
Use GroupBy.cumcount to get a counter, then reshape with unstack:
df1 = pd.DataFrame([["John","guitar"],
                    ["Michael","football"],
                    ["Andrew","running"],
                    ["John","dancing"],
                    ["Andrew","cars"]], columns=['a','b'])
         a         b
0     John    guitar
1  Michael  football
2   Andrew   running
3     John   dancing
4   Andrew      cars
df = (df1.set_index(['a', df1.groupby('a').cumcount()])['b']
         .unstack()
         .rename_axis(-1)
         .reset_index()
         .rename(columns=lambda x: x+1))
print (df)
         0         1        2
0   Andrew   running     cars
1     John    guitar  dancing
2  Michael  football      NaN
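To see why this works, it may help to look at the counter on its own. The sketch below reuses the same data and column names 'a' and 'b'; the variable names counter and wide are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame(
    [["John", "guitar"], ["Michael", "football"], ["Andrew", "running"],
     ["John", "dancing"], ["Andrew", "cars"]],
    columns=["a", "b"],
)

# cumcount numbers the rows inside each group, starting at 0,
# so the second "John" row and the second "Andrew" row both get 1.
counter = df1.groupby("a").cumcount()
print(counter.tolist())

# Using (a, counter) as a two-level index makes every row unique;
# unstack then moves the counter level into the columns.
wide = df1.set_index(["a", counter])["b"].unstack()
print(wide)
```

The counter becomes the column label, so the widest group decides how many columns appear.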
Or aggregate into lists and create a new DataFrame with the constructor:
s = df1.groupby('a')['b'].agg(list)
df = pd.DataFrame(s.values.tolist(), index=s.index).reset_index()
print (df)
         a         0        1
0   Andrew   running     cars
1     John    guitar  dancing
2  Michael  football     None
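Depending on the approach, the padding shows up as NaN or as None. As a small sketch on the same data (the variable names wide and counts are illustrative), isna and notna treat both as missing, so downstream code does not need to care which one it got:

```python
import pandas as pd

df1 = pd.DataFrame(
    [["John", "guitar"], ["Michael", "football"], ["Andrew", "running"],
     ["John", "dancing"], ["Andrew", "cars"]],
    columns=["a", "b"],
)

s = df1.groupby("a")["b"].agg(list)
wide = pd.DataFrame(s.values.tolist(), index=s.index).reset_index()

# True wherever a row was padded because its group had fewer values.
print(wide.isna())

# Number of real values per row, ignoring the padding.
counts = wide.drop(columns="a").notna().sum(axis=1)
print(counts.tolist())
```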
Answer 2 (score: 0)
Assuming the column names are ['person', 'activity'], you can do this:
df_out = df.groupby('person').agg(list).reset_index()
df_out = pd.concat([df_out, pd.DataFrame(df_out['activity'].values.tolist())], axis=1)
df_out = df_out.drop(columns='activity')  # positional axis argument was removed in pandas 2.0
which gives you
    person         0        1
0   Andrew   running     cars
1     John    guitar  dancing
2  Michael  football     None