I have a dataset:
>>> all_transcripts
ID Type Name
1 Guest Hugo
1 Guest Hugo
1 Boss Boss
1 Boss Boss
2 Boss Boss
2 Guest Calvin
2 Guest Calvin
3 Guest Klein
3 Boss Boss
Now I want to create a column named nameGuest that holds, on every row, the name of the guest for that row's ID. My expected output looks like this:
>>> all_transcripts
ID Type Name nameGuest
1 Guest Hugo Hugo
1 Guest Hugo Hugo
1 Boss Boss Hugo
1 Boss Boss Hugo
2 Boss Boss Calvin
2 Guest Calvin Calvin
2 Guest Calvin Calvin
3 Guest Klein Klein
3 Boss Boss Klein
How can I do this?
Answer 0 (score: 2)
Use Series.map with a helper Series, built with boolean indexing and DataFrame.drop_duplicates, to get the first Guest value for each ID:
s = df[df['Type'] == 'Guest'].drop_duplicates('ID').set_index('ID')['Name']
df['nameGuest'] = df['ID'].map(s)
print(df)
ID Type Name nameGuest
0 1 Guest Hugo Hugo
1 1 Guest Hugo Hugo
2 1 Boss Boss Hugo
3 1 Boss Boss Hugo
4 2 Boss Boss Calvin
5 2 Guest Calvin Calvin
6 2 Guest Calvin Calvin
7 3 Guest Klein Klein
8 3 Boss Boss Klein
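For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the frame is called df, as in the answer (the question calls it all_transcripts), and rebuilds the sample data by hand. The last step uses a made-up ID 4 that is not in the question, only to illustrate what map does when an ID has no Guest row.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID":   [1, 1, 1, 1, 2, 2, 2, 3, 3],
    "Type": ["Guest", "Guest", "Boss", "Boss", "Boss",
             "Guest", "Guest", "Guest", "Boss"],
    "Name": ["Hugo", "Hugo", "Boss", "Boss", "Boss",
             "Calvin", "Calvin", "Klein", "Boss"],
})

# Helper Series indexed by ID: keep only Guest rows, then the first row per ID.
s = df[df["Type"] == "Guest"].drop_duplicates("ID").set_index("ID")["Name"]

# Look up every row's ID in the helper Series.
df["nameGuest"] = df["ID"].map(s)
print(df)

# Hypothetical case (ID 4 is not in the question): an ID without any
# Guest row has no entry in `s`, so map() returns NaN for it.
no_guest = pd.Series([1, 4]).map(s)
print(no_guest)
```

If IDs without a guest can occur in the real data, the resulting NaN values could be filled afterwards, for example with fillna, depending on what makes sense for the use case.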
Answer 1 (score: 1)
Groupby.first
You can filter on Type == 'Guest', then group by ID and take the first name in each group when aggregating.
This gives us one name per ID. We can then map it back onto our dataframe to create the new column:
names = df[df['Type'] == 'Guest'].groupby('ID')['Name'].first()
df['nameGuest'] = df['ID'].map(names)
print(df)
ID Type Name nameGuest
0 1 Guest Hugo Hugo
1 1 Guest Hugo Hugo
2 1 Boss Boss Hugo
3 1 Boss Boss Hugo
4 2 Boss Boss Calvin
5 2 Guest Calvin Calvin
6 2 Guest Calvin Calvin
7 3 Guest Klein Klein
8 3 Boss Boss Klein
Output of names:
print(names)
ID
1 Hugo
2 Calvin
3 Klein
Name: Name, dtype: object
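A possible variation, not part of either answer above, skips the intermediate lookup Series and broadcasts the guest name within each ID group directly using transform. This is only a sketch under the same assumptions (frame named df, sample data rebuilt by hand). It relies on the groupby 'first' aggregation returning the first non-null value in each group.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID":   [1, 1, 1, 1, 2, 2, 2, 3, 3],
    "Type": ["Guest", "Guest", "Boss", "Boss", "Boss",
             "Guest", "Guest", "Guest", "Boss"],
    "Name": ["Hugo", "Hugo", "Boss", "Boss", "Boss",
             "Calvin", "Calvin", "Klein", "Boss"],
})

# Keep Name only on Guest rows; every other row becomes NaN.
guest_only = df["Name"].where(df["Type"] == "Guest")

# Within each ID, take the first non-null guest name and repeat it on
# every row of that group, so the result aligns with df's index.
df["nameGuest"] = guest_only.groupby(df["ID"]).transform("first")
print(df)
```

The map-based versions make the ID-to-name lookup available as a separate object, which can be handy for inspection. The transform version is shorter when only the new column is needed.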