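I have the following dataframe, and I want to convert it to a new format that separates owners and approvers according to the categorical values in 'approver_type'. Rows would be merged on the 'gid' value, with each group's owners and approvers placed in their own columns.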
Starting dataframe:
>>> sourcedf
   gid group_name approver_type approver_name
0    5        foo         owner           joe
1    6        bar      approver          john
2    7        baz         owner          jill
3    7        baz      approver          bill
4    5        foo      approver           bob
5    7        baz      approver         jimmy
Desired dataframe:
>>> df
   gid group_name owners   approvers
0    5        foo    joe         bob
1    6        bar    NaN        john
2    7        baz   jill  bill,jimmy
Dict to reproduce the source dataframe:
{'gid': {0: 5, 1: 6, 2: 7, 3: 7, 4: 5, 5: 7}, 'group_name': {0: 'foo', 1: 'bar', 2: 'baz', 3: 'baz', 4: 'foo', 5: 'baz'}, 'approver_type': {0: 'owner', 1: 'approver', 2: 'owner', 3: 'approver', 4: 'approver', 5: 'approver'}, 'approver_name': {0: 'joe', 1: 'john', 2: 'jill', 3: 'bill', 4: 'bob', 5: 'jimmy'}}
Answer 0 (score: 4)
Use pivot_table with a custom aggfunc, ','.join:
sourcedf.pivot_table(index=['gid', 'group_name'], columns='approver_type', values='approver_name', aggfunc=','.join)
Out[36]:
approver_type     approver owner
gid group_name
5   foo                bob   joe
6   bar               john  None
7   baz         bill,jimmy  jill
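
To get from that pivoted result to the exact layout asked for in the question (flat columns named owners and approvers), something along these lines should work. This is only a sketch: it rebuilds the source frame from the dict in the question, and the rename, rename_axis and reset_index steps are my own additions rather than part of the original answer. The name `wide` is just an illustrative intermediate.

```python
import pandas as pd

# Source data, copied from the dict given in the question.
data = {
    'gid': {0: 5, 1: 6, 2: 7, 3: 7, 4: 5, 5: 7},
    'group_name': {0: 'foo', 1: 'bar', 2: 'baz', 3: 'baz', 4: 'foo', 5: 'baz'},
    'approver_type': {0: 'owner', 1: 'approver', 2: 'owner',
                      3: 'approver', 4: 'approver', 5: 'approver'},
    'approver_name': {0: 'joe', 1: 'john', 2: 'jill',
                      3: 'bill', 4: 'bob', 5: 'jimmy'},
}
sourcedf = pd.DataFrame(data)

# Pivot as in the answer: one row per (gid, group_name), one column per
# approver_type, with the names in each cell joined by commas.
wide = sourcedf.pivot_table(
    index=['gid', 'group_name'],
    columns='approver_type',
    values='approver_name',
    aggfunc=','.join,
)

# Assumed post-processing (not in the original answer): rename the columns,
# drop the 'approver_type' label on the column axis, and flatten the index.
df = (
    wide.rename(columns={'owner': 'owners', 'approver': 'approvers'})
        .rename_axis(columns=None)
        .reset_index()
        [['gid', 'group_name', 'owners', 'approvers']]
)

print(df)
```

Groups with no owner (gid 6 here) end up with a missing value in the owners column, which matches the NaN in the desired output.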