Question

I have a DataFrame like this:

import pandas as pd

df = pd.DataFrame({'ID':[1,1,2,2,3,4],'Name':['John Doe','Jane Doe','John Smith','Jane Smith','Jack Hill','Jill Hill']})

    ID  Name
0   1   John Doe
1   1   Jane Doe
2   2   John Smith
3   2   Jane Smith
4   3   Jack Hill
5   4   Jill Hill

Then I added another column by grouping on ID and taking the unique values of Name:

df['Multi Name'] = df.groupby('ID')['Name'].transform('unique')

    ID  Name    Multi Name
0   1   John Doe    [John Doe, Jane Doe]
1   1   Jane Doe    [John Doe, Jane Doe]
2   2   John Smith  [John Smith, Jane Smith]
3   2   Jane Smith  [John Smith, Jane Smith]
4   3   Jack Hill   [Jack Hill]
5   4   Jill Hill   [Jill Hill]

How can I remove the brackets from Multi Name?

I tried:

df['Multi Name'] = df['Multi Name'].str.strip('[]')


    ID  Name    Multi Name
0   1   John Doe    NaN
1   1   Jane Doe    NaN
2   2   John Smith  NaN
3   2   Jane Smith  NaN
4   3   Jack Hill   NaN
5   4   Jill Hill   NaN

Desired output:

    ID  Name    Multi Name
0   1   John Doe    John Doe, Jane Doe
1   1   Jane Doe    John Doe, Jane Doe
2   2   John Smith  John Smith, Jane Smith
3   2   Jane Smith  John Smith, Jane Smith
4   3   Jack Hill   Jack Hill
5   4   Jill Hill   Jill Hill

Answer 1

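unique looks like the wrong choice here. I would suggest a custom lambda function that uses str.join. Note that set does not preserve order, so the names within a group may come out in either order: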

df['Multi Name'] = df.groupby('ID')['Name'].transform(lambda x: ', '.join(set(x)))

df
   ID        Name              Multi Name
0   1    John Doe      John Doe, Jane Doe
1   1    Jane Doe      John Doe, Jane Doe
2   2  John Smith  Jane Smith, John Smith
3   2  Jane Smith  Jane Smith, John Smith
4   3   Jack Hill               Jack Hill
5   4   Jill Hill               Jill Hill
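If the first-seen order of names matters, one option is to deduplicate with dict.fromkeys rather than set. This is a minimal sketch of that substitution, not part of the answer above; it assumes the same df as in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 2, 2, 3, 4],
    'Name': ['John Doe', 'Jane Doe', 'John Smith',
             'Jane Smith', 'Jack Hill', 'Jill Hill'],
})

# dict.fromkeys drops duplicates but keeps the order in which
# each name first appears, which set() does not guarantee.
df['Multi Name'] = df.groupby('ID')['Name'].transform(
    lambda names: ', '.join(dict.fromkeys(names))
)

print(df)
```

The lambda returns one string per group, and transform broadcasts it back to every row of that group.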

Answer 2

`transform`

df.join(df.groupby('ID').Name.transform('unique').rename('Multi Name'))

   ID        Name                Multi Name
0   1    John Doe      [John Doe, Jane Doe]
1   1    Jane Doe      [John Doe, Jane Doe]
2   2  John Smith  [John Smith, Jane Smith]
3   2  Jane Smith  [John Smith, Jane Smith]
4   3   Jack Hill               [Jack Hill]
5   4   Jill Hill               [Jill Hill]

df.join(df.groupby('ID').Name.transform('unique').str.join(', ').rename('Multi Name'))

   ID        Name              Multi Name
0   1    John Doe      John Doe, Jane Doe
1   1    Jane Doe      John Doe, Jane Doe
2   2  John Smith  John Smith, Jane Smith
3   2  Jane Smith  John Smith, Jane Smith
4   3   Jack Hill               Jack Hill
5   4   Jill Hill               Jill Hill

`map`

df.join(df.ID.map(df.groupby('ID').Name.unique().str.join(', ')).rename('Multi Name'))

   ID        Name              Multi Name
0   1    John Doe      John Doe, Jane Doe
1   1    Jane Doe      John Doe, Jane Doe
2   2  John Smith  John Smith, Jane Smith
3   2  Jane Smith  John Smith, Jane Smith
4   3   Jack Hill               Jack Hill
5   4   Jill Hill               Jill Hill
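The map approach may be easier to follow when split into two steps: build one string per ID, then look each row's ID up in that result. The sketch below is a variation on the one-liner above, using agg with a lambda; the names lookup and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 2, 2, 3, 4],
    'Name': ['John Doe', 'Jane Doe', 'John Smith',
             'Jane Smith', 'Jack Hill', 'Jill Hill'],
})

# Step 1: one entry per ID, holding the distinct names joined into a string.
lookup = df.groupby('ID')['Name'].agg(lambda s: ', '.join(s.unique()))

# Step 2: map every row's ID to its string, broadcasting it back to the rows.
result = df.assign(**{'Multi Name': df['ID'].map(lookup)})

print(result)
```

Because the strings are built once per ID rather than once per row, this form also makes the intermediate lookup easy to inspect.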

`itertools.groupby`

from itertools import groupby

d = {
    k: ', '.join(x[1] for x in v)
    for k, v in groupby(sorted(set(zip(df.ID, df.Name))), key=lambda x: x[0])
}

df.join(df.ID.map(d).rename('Multi Name'))

   ID        Name              Multi Name
0   1    John Doe      Jane Doe, John Doe
1   1    Jane Doe      Jane Doe, John Doe
2   2  John Smith  Jane Smith, John Smith
3   2  Jane Smith  Jane Smith, John Smith
4   3   Jack Hill               Jack Hill
5   4   Jill Hill               Jill Hill
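itertools.groupby only groups consecutive items, which is why the pairs are sorted first; sorting the (ID, Name) tuples also puts the names in alphabetical order, so this output differs from the desired one. If the original order is wanted, a plain dict loop is one alternative. This is a sketch under that assumption, and names_by_id is a hypothetical helper name.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 2, 2, 3, 4],
    'Name': ['John Doe', 'Jane Doe', 'John Smith',
             'Jane Smith', 'Jack Hill', 'Jill Hill'],
})

# Collect the names per ID in first-seen order, skipping repeats.
names_by_id = {}
for id_, name in zip(df['ID'], df['Name']):
    seen = names_by_id.setdefault(id_, [])
    if name not in seen:
        seen.append(name)

# Join each list into one comma-separated string.
d = {k: ', '.join(v) for k, v in names_by_id.items()}

result = df.join(df['ID'].map(d).rename('Multi Name'))
print(result)
```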

Answer 3

Use transform and map with join:

df['Multi Name'] = df.groupby('ID')['Name'].transform('unique').map(', '.join)
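It may help to see why the str.strip attempt in the question returned NaN while joining works. The sketch below rebuilds a column of per-ID arrays with groupby(...).unique() and map, which is my own way of reproducing the column and an assumption about how it compares to the question's; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 2, 2, 3, 4],
    'Name': ['John Doe', 'Jane Doe', 'John Smith',
             'Jane Smith', 'Jack Hill', 'Jill Hill'],
})

# A column whose cells are arrays of names, one array per ID.
arrays = df['ID'].map(df.groupby('ID')['Name'].unique())

# Each cell is an array object, not the text "[John Doe, Jane Doe]";
# the brackets only appear when the array is printed.
print(type(arrays.iloc[0]).__name__)

# .str.strip only handles string cells, so every non-string cell is missing.
stripped = arrays.str.strip('[]')
print(stripped.isna().all())

# Joining the elements of each cell gives the plain string.
joined = arrays.map(', '.join)
print(joined.tolist())
```

In other words, there are no bracket characters to strip: the cells have to be converted to strings, and joining their elements does that.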

Pandas transform('unique'): output a comma-separated string instead of a list
