Conditional naming across multiple columns

Time: 2019-04-30 08:41:15

Tags: python pandas conditional

I have a dataset:

>>> all_transcripts

ID  Type    Name
1   Guest   Hugo
1   Guest   Hugo   
1   Boss    Boss
1   Boss    Boss
2   Boss    Boss
2   Guest   Calvin
2   Guest   Calvin             
3   Guest   Klein
3   Boss    Boss

Now I want to create a column called nameGuest that holds, on every row, the guest's name for that row's ID. My expected output looks like this:

>>> all_transcripts

ID  Type    Name     nameGuest
1   Guest   Hugo     Hugo
1   Guest   Hugo     Hugo   
1   Boss    Boss     Hugo
1   Boss    Boss     Hugo
2   Boss    Boss     Calvin
2   Guest   Calvin   Calvin
2   Guest   Calvin   Calvin    
3   Guest   Klein    Klein
3   Boss    Boss     Klein

How can I do this?

2 answers:

Answer 0 (score: 2)

Use Series.map with a helper Series, built with boolean indexing and DataFrame.drop_duplicates, to get the first Guest value for each group:

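# Helper Series: the first Guest name per ID, indexed by ID
s = df[df['Type'] == 'Guest'].drop_duplicates('ID').set_index('ID')['Name']
# Look up each row's ID in the helper Series
df['nameGuest'] = df['ID'].map(s)
print(df)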
   ID   Type    Name nameGuest
0   1  Guest    Hugo      Hugo
1   1  Guest    Hugo      Hugo
2   1   Boss    Boss      Hugo
3   1   Boss    Boss      Hugo
4   2   Boss    Boss    Calvin
5   2  Guest  Calvin    Calvin
6   2  Guest  Calvin    Calvin
7   3  Guest   Klein     Klein
8   3   Boss    Boss     Klein
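
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question; the dictionary-based constructor is my own choice, not something the question shows.

```python
import pandas as pd

# Sample data copied from the question's table (constructed by hand here).
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2, 3, 3],
    "Type": ["Guest", "Guest", "Boss", "Boss", "Boss",
             "Guest", "Guest", "Guest", "Boss"],
    "Name": ["Hugo", "Hugo", "Boss", "Boss", "Boss",
             "Calvin", "Calvin", "Klein", "Boss"],
})

# Keep only Guest rows, keep the first row per ID, and index the names by ID.
s = df[df["Type"] == "Guest"].drop_duplicates("ID").set_index("ID")["Name"]

# Map every row's ID to the guest name stored in the helper Series.
df["nameGuest"] = df["ID"].map(s)

print(df)
```

Note that drop_duplicates('ID') keeps the first Guest row per ID by default, so if one ID had several different guest names only the first would be used.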

Answer 1 (score: 1)

Groupby.first

You can filter on Type == 'Guest', then group by ID and take the first Name in each group.

This gives us the name for each ID, which we can map back onto the DataFrame to create the new column:


names = df[df['Type'] == 'Guest'].groupby('ID')['Name'].first()

df['nameGuest'] = df['ID'].map(names)

print(df)
   ID   Type    Name nameGuest
0   1  Guest    Hugo      Hugo
1   1  Guest    Hugo      Hugo
2   1   Boss    Boss      Hugo
3   1   Boss    Boss      Hugo
4   2   Boss    Boss    Calvin
5   2  Guest  Calvin    Calvin
6   2  Guest  Calvin    Calvin
7   3  Guest   Klein     Klein
8   3   Boss    Boss     Klein

Output of names

print(names)
ID
1      Hugo
2    Calvin
3     Klein
Name: Name, dtype: object
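
One edge case worth knowing about: an ID that has no Guest row never appears in names, so map leaves that row as missing (NaN). The sketch below illustrates this with a shortened copy of the data plus a hypothetical ID 4 that only has a Boss row; that extra row is an assumption added for illustration and is not part of the question's data.

```python
import pandas as pd

# Shortened sample data; ID 4 is a hypothetical group with no Guest row.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3, 3, 4],
    "Type": ["Guest", "Boss", "Boss", "Guest", "Guest", "Boss", "Boss"],
    "Name": ["Hugo", "Boss", "Boss", "Calvin", "Klein", "Boss", "Boss"],
})

# Filter to Guest rows, then take the first Name within each ID group.
names = df[df["Type"] == "Guest"].groupby("ID")["Name"].first()

# IDs missing from names (here, ID 4) map to a missing value.
df["nameGuest"] = df["ID"].map(names)

print(df)
```

If a placeholder is preferable to a missing value, chaining .fillna(...) after the map call is one option.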