I currently have a dataframe (df) like this:
name info
alpha foo,bar
alpha bar,foo
beta foo,bar
beta bar,foo
beta baz,qux
I would like to create a dataframe like this:
name info
alpha (foo,bar),(bar,foo)
beta (foo,bar),(bar,foo),(baz,qux)
I got close with groupby.apply(list). For example:
new_df=df.groupby('name')['info'].apply(list)
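However, I can't seem to figure out how to get the output back in the original dataframe format (i.e. with two columns, as in the example). I think I need reset_index and unstack? Any help is appreciated!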
Answer 0 (score: 1)
IIUC
df.assign(info='('+df['info']+')').groupby('name')['info'].apply(','.join).to_frame('info')
Out[267]:
info
name
alpha (foo,bar),(bar,foo)
beta (foo,bar),(bar,foo),(baz,qux)
# df.assign(info='('+df['info']+')'): wraps each info string in ( and ) so it matches the desired output
# groupby('name'): groups by name, since the info values under the same name need to be merged
# apply(','.join).to_frame('info'): joins the info strings within each group into one string
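The one-liner above leaves name as the index. If you want name back as a regular column, as the question asks, here is a minimal sketch that swaps to_frame('info') for reset_index(). The sample data is rebuilt from the question, assuming info holds plain comma-separated strings:

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: info holds plain strings).
df = pd.DataFrame({
    'name': ['alpha', 'alpha', 'beta', 'beta', 'beta'],
    'info': ['foo,bar', 'bar,foo', 'foo,bar', 'bar,foo', 'baz,qux'],
})

# Wrap each info string in parentheses, join the strings per name, then
# use reset_index() to turn the group key back into an ordinary column.
new_df = (
    df.assign(info='(' + df['info'] + ')')
      .groupby('name')['info']
      .apply(','.join)
      .reset_index()
)
print(new_df)
```

No unstack should be needed here, since the grouped result has only one index level.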
Answer 1 (score: 1)
Try using a for loop:
import pandas as pd

uniqnames = df.name.unique()            # get unique names
newdata = []                            # data list for output dataframe
for u in uniqnames:                     # for each unique name
    subdf = df[df.name == u]            # get rows with this unique name
    s = ""
    for i in subdf['info']:
        s += "(" + i + "),"             # join all info cells for that name
    newdata.append([u, s[:-1]])         # remove trailing comma from infos & add row to data list
newdf = pd.DataFrame(data=newdata, columns=['name', 'info'])
print(newdf)
The output is exactly as required:
name info
0 alpha (foo,bar),(bar,foo)
1 beta (foo,bar),(bar,foo),(baz,qux)
Answer 2 (score: 0)
IIUC:
df = pd.DataFrame({'name': ['alpha']*2 + ['beta']*3,
                   'info': [['foo', 'bar'], ['bar', 'foo'],
                            ['foo', 'bar'], ['bar', 'foo'],
                            ['baz', 'qux']]})
print(df)
Input:
info name
0 [foo, bar] alpha
1 [bar, foo] alpha
2 [foo, bar] beta
3 [bar, foo] beta
4 [baz, qux] beta
Now, use groupby and then apply reset_index() to get a dataframe back:
new_df = df.groupby('name')['info'].apply(list)
new_df = new_df.reset_index()
print(new_df)
Output:
name info
0 alpha [[foo, bar], [bar, foo]]
1 beta [[foo, bar], [bar, foo], [baz, qux]]
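Note that this answer builds info as Python lists, while the question's sample shows comma-separated strings. If your column really holds strings, a minimal sketch (assuming the question's data as shown) is to split them first with str.split and then group the same way:

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: info holds plain strings).
df = pd.DataFrame({
    'name': ['alpha', 'alpha', 'beta', 'beta', 'beta'],
    'info': ['foo,bar', 'bar,foo', 'foo,bar', 'bar,foo', 'baz,qux'],
})

# Turn each comma-separated string into a list, e.g. 'foo,bar' -> ['foo', 'bar'].
df['info'] = df['info'].str.split(',')

# Collect the per-row lists into one list per name, then restore name as a column.
new_df = df.groupby('name')['info'].apply(list).reset_index()
print(new_df)
```

This gives nested lists rather than the parenthesized string from the question, so it fits better if you plan to process the pairs further rather than just display them.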