I have a DataFrame that looks like this:
import pandas as pd
df = pd.DataFrame({'ID':[1,1,2,2,3,4],'Name':['John Doe','Jane Doe','John Smith','Jane Smith','Jack Hill','Jill Hill']})
   ID        Name
0   1    John Doe
1   1    Jane Doe
2   2  John Smith
3   2  Jane Smith
4   3   Jack Hill
5   4   Jill Hill
Then I added another column by grouping on ID and taking the unique values of Name:
df['Multi Name'] = df.groupby('ID')['Name'].transform('unique')
   ID        Name                Multi Name
0   1    John Doe      [John Doe, Jane Doe]
1   1    Jane Doe      [John Doe, Jane Doe]
2   2  John Smith  [John Smith, Jane Smith]
3   2  Jane Smith  [John Smith, Jane Smith]
4   3   Jack Hill               [Jack Hill]
5   4   Jill Hill               [Jill Hill]
How can I remove the brackets from Multi Name?
I tried:
df['Multi Name'] = df['Multi Name'].str.strip('[]')
   ID        Name  Multi Name
0   1    John Doe         NaN
1   1    Jane Doe         NaN
2   2  John Smith         NaN
3   2  Jane Smith         NaN
4   3   Jack Hill         NaN
5   4   Jill Hill         NaN
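The NaN values are expected. The brackets are only how pandas prints each cell; the cells actually hold array objects, not strings, and the .str accessor returns NaN for any element that is not a string. A minimal check on the question's data (it rebuilds the column with groupby().unique() and Series.map instead of transform('unique'), an assumption made so the sketch does not depend on which pandas versions accept that string):

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 2, 2, 3, 4],
                   'Name': ['John Doe', 'Jane Doe', 'John Smith',
                            'Jane Smith', 'Jack Hill', 'Jill Hill']})

# One array of unique names per ID, looked up for every row of that ID.
unique_per_id = df.groupby('ID')['Name'].unique()
multi = df['ID'].map(unique_per_id)

# The cell is an array object; "[...]" is just its printed form.
print(isinstance(multi.iloc[0], str))  # False

# .str methods only act on strings, so every non-string cell becomes NaN.
stripped = multi.str.strip('[]')
print(stripped.isna().all())  # True
```

To get a plain string, each array has to be joined, not stripped.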
Desired output:
   ID        Name              Multi Name
0   1    John Doe      John Doe, Jane Doe
1   1    Jane Doe      John Doe, Jane Doe
2   2  John Smith  John Smith, Jane Smith
3   2  Jane Smith  John Smith, Jane Smith
4   3   Jack Hill               Jack Hill
5   4   Jill Hill               Jill Hill
Answer 0 (score: 5)
It looks like unique is the wrong tool here. I'd suggest using str.join:
df['Multi Name'] = df.groupby('ID')['Name'].transform(lambda x: ', '.join(set(x)))
df
   ID        Name              Multi Name
0   1    John Doe      John Doe, Jane Doe
1   1    Jane Doe      John Doe, Jane Doe
2   2  John Smith  Jane Smith, John Smith
3   2  Jane Smith  Jane Smith, John Smith
4   3   Jack Hill               Jack Hill
5   4   Jill Hill               Jill Hill
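Because set() has no defined iteration order, the names inside a group can come out in any order, which is why ID 2 above reads "Jane Smith, John Smith" rather than the requested "John Smith, Jane Smith". A minimal variant, assuming first-appearance order is what you want, swaps set() for dict.fromkeys(), whose keys keep insertion order in Python 3.7+ while still dropping duplicates:

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 2, 2, 3, 4],
                   'Name': ['John Doe', 'Jane Doe', 'John Smith',
                            'Jane Smith', 'Jack Hill', 'Jill Hill']})

# dict.fromkeys removes duplicate names but keeps the order they first
# appear in; transform repeats the joined string on every row of the group.
df['Multi Name'] = df.groupby('ID')['Name'].transform(
    lambda x: ', '.join(dict.fromkeys(x)))
print(df)
```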
Answer 1 (score: 5)
transform
df.join(df.groupby('ID').Name.transform('unique').rename('Multi Name'))
   ID        Name                Multi Name
0   1    John Doe      [John Doe, Jane Doe]
1   1    Jane Doe      [John Doe, Jane Doe]
2   2  John Smith  [John Smith, Jane Smith]
3   2  Jane Smith  [John Smith, Jane Smith]
4   3   Jack Hill               [Jack Hill]
5   4   Jill Hill               [Jill Hill]
df.join(df.groupby('ID').Name.transform('unique').str.join(', ').rename('Multi Name'))
   ID        Name              Multi Name
0   1    John Doe      John Doe, Jane Doe
1   1    Jane Doe      John Doe, Jane Doe
2   2  John Smith  John Smith, Jane Smith
3   2  Jane Smith  John Smith, Jane Smith
4   3   Jack Hill               Jack Hill
5   4   Jill Hill               Jill Hill
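In more recent pandas releases, transform('unique') may be rejected with a ValueError, since 'unique' is not among the method names transform accepts as a string. A sketch of the same transform-then-join idea that should not depend on that passes a callable instead: each group is reduced to one joined string, and transform broadcasts it back to the group's rows.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 2, 2, 3, 4],
                   'Name': ['John Doe', 'Jane Doe', 'John Smith',
                            'Jane Smith', 'Jack Hill', 'Jill Hill']})

# Series.unique keeps first-appearance order; the joined string is
# repeated by transform on every row belonging to the same ID.
multi = df.groupby('ID')['Name'].transform(lambda s: ', '.join(s.unique()))
out = df.join(multi.rename('Multi Name'))
print(out)
```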
map
df.join(df.ID.map(df.groupby('ID').Name.unique().str.join(', ')).rename('Multi Name'))
   ID        Name              Multi Name
0   1    John Doe      John Doe, Jane Doe
1   1    Jane Doe      John Doe, Jane Doe
2   2  John Smith  John Smith, Jane Smith
3   2  Jane Smith  John Smith, Jane Smith
4   3   Jack Hill               Jack Hill
5   4   Jill Hill               Jill Hill
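The map version does the reduction once per ID and then looks each row's ID up in the result. A sketch that makes the intermediate lookup table explicit; it uses .map(', '.join) on the grouped arrays instead of .str.join(', '), which is an equivalent choice rather than the answer's exact code:

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 2, 2, 3, 4],
                   'Name': ['John Doe', 'Jane Doe', 'John Smith',
                            'Jane Smith', 'Jack Hill', 'Jill Hill']})

# One entry per ID: that ID's unique names joined into a single string.
lookup = df.groupby('ID')['Name'].unique().map(', '.join)
print(lookup.to_dict())
# {1: 'John Doe, Jane Doe', 2: 'John Smith, Jane Smith', 3: 'Jack Hill', 4: 'Jill Hill'}

# Series.map with a Series matches each ID against lookup's index.
df['Multi Name'] = df['ID'].map(lookup)
```

Because the join runs once per group rather than once per row, this tends to be a natural fit when IDs repeat many times.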
itertools.groupby
from itertools import groupby
d = {
k: ', '.join(x[1] for x in v)
for k, v in groupby(sorted(set(zip(df.ID, df.Name))), key=lambda x: x[0])
}
df.join(df.ID.map(d).rename('Multi Name'))
   ID        Name              Multi Name
0   1    John Doe      Jane Doe, John Doe
1   1    Jane Doe      Jane Doe, John Doe
2   2  John Smith  Jane Smith, John Smith
3   2  Jane Smith  Jane Smith, John Smith
4   3   Jack Hill               Jack Hill
5   4   Jill Hill               Jill Hill
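itertools.groupby only groups consecutive items with equal keys, which is why the answer sorts the (ID, Name) pairs first; sorting also puts the names in alphabetical order, hence "Jane Doe, John Doe". A small pure-Python illustration of what goes wrong without the sort (the pairs list is hypothetical sample data in which the ID 1 rows are not adjacent):

```python
from itertools import groupby

# Hypothetical (ID, Name) pairs; the two ID 1 rows are separated.
pairs = [(1, 'John Doe'), (2, 'John Smith'), (1, 'Jane Doe')]

# Unsorted input: ID 1 shows up as two separate runs.
unsorted_keys = [k for k, _ in groupby(pairs, key=lambda p: p[0])]
print(unsorted_keys)  # [1, 2, 1]

# Building a dict from those runs silently overwrites the first ID 1 group.
broken = {k: ', '.join(name for _, name in g)
          for k, g in groupby(pairs, key=lambda p: p[0])}
print(broken)  # {1: 'Jane Doe', 2: 'John Smith'}

# Sorting (after deduplicating with set) makes equal IDs adjacent.
grouped = {k: ', '.join(name for _, name in g)
           for k, g in groupby(sorted(set(pairs)), key=lambda p: p[0])}
print(grouped)  # {1: 'Jane Doe, John Doe', 2: 'John Smith'}
```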
Answer 2 (score: 2)
Use map with ', '.join:
df['Multi Name'] = df.groupby('ID')['Name'].transform('unique').map(', '.join)
The result matches the desired output in the question.
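This works because ', '.join is an ordinary bound method of the string ', ', so Series.map can call it on each cell like any one-argument function. A tiny illustration on a hand-built Series of name lists (hypothetical data, not the grouped column itself):

```python
import pandas as pd

# A bound str method is a normal callable taking one iterable of strings.
joiner = ', '.join
print(joiner(['John Doe', 'Jane Doe']))  # John Doe, Jane Doe

# Series.map calls it once per cell.
cells = pd.Series([['John Doe', 'Jane Doe'], ['Jack Hill']])
print(cells.map(joiner).tolist())  # ['John Doe, Jane Doe', 'Jack Hill']
```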