I have the following DataFrame:
id occupations
111 teacher
111 student
222 analyst
333 cook
111 driver
444 lawyer
I created a new column containing a list of all occupations for each id:
new_df['occupation_list'] = df['id'].map(df.groupby('id')['occupations'].agg(list))
How can I include only the teacher and student values in occupation_list?
Answer 0 (score: 1)
You can filter before the groupby:
to_map = (df[df['occupations'].isin(['teacher', 'student'])]
.groupby('id')['occupations'].agg(list)
)
df['occupation_list'] = df['id'].map(to_map)
Output:
id occupations occupation_list
0 111 teacher [teacher, student]
1 111 student [teacher, student]
2 222 analyst NaN
3 333 cook NaN
4 111 driver [teacher, student]
5 444 lawyer NaN
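For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same filter-then-group idea. The sample frame is rebuilt from the question, the name `wanted` is only an illustrative choice, and the last step (swapping NaN for an empty list) is an optional extra that the original answer does not include.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "id": [111, 111, 222, 333, 111, 444],
    "occupations": ["teacher", "student", "analyst", "cook", "driver", "lawyer"],
})

wanted = ["teacher", "student"]  # illustrative name, not from the original answer

# Keep only the wanted rows, then collect one list per id
to_map = (
    df[df["occupations"].isin(wanted)]
    .groupby("id")["occupations"]
    .agg(list)
)

# Map the per-id lists back onto every row; ids with no match get NaN
df["occupation_list"] = df["id"].map(to_map)
result = df["occupation_list"].tolist()

# Optional (assumption: empty lists are preferred over NaN)
no_nan = df["occupation_list"].apply(lambda v: v if isinstance(v, list) else []).tolist()
```

Note that the driver row for id 111 still receives the list, because the mapping is keyed on `id` and not on the row's own occupation.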
Answer 1 (score: 0)
You could also do:
df.groupby('id')['occupations'].transform(' '.join).str.split()
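As far as I can tell, that one-liner collects every occupation per id and does not filter anything, and because it joins and splits on whitespace it assumes no occupation contains a space. A rough sketch of how it could be combined with a filter follows; `wanted` and the list comprehension are my own additions, not part of the answer.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "id": [111, 111, 222, 333, 111, 444],
    "occupations": ["teacher", "student", "analyst", "cook", "driver", "lawyer"],
})

# Join each group's occupations into one string, broadcast it to every row,
# then split it back into a list (assumes no whitespace inside an occupation)
all_lists = df.groupby("id")["occupations"].transform(" ".join).str.split()

wanted = {"teacher", "student"}  # illustrative name

# Filter each row's list down to the wanted occupations
filtered = all_lists.apply(lambda occs: [o for o in occs if o in wanted])
```

With this approach, ids that have no matching occupation end up with an empty list instead of NaN.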
Answer 2 (score: 0)
You can simply do a groupby and aggregate the column into lists:
df.groupby('id',as_index=False).agg({'occupations':lambda x: x.tolist()})
Out:
>>> df
id occupations
0 111 teacher
1 111 student
2 222 analyst
3 333 cook
4 111 driver
5 444 lawyer
>>> df.groupby('id',as_index=False).agg({'occupations':lambda x: x.tolist()})
id occupations
0 111 [teacher, student, driver]
1 222 [analyst]
2 333 [cook]
3 444 [lawyer]
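The aggregation above keeps every occupation and returns one row per id. If only teacher and student should remain, one possible follow-up is to filter each list after aggregating. This is a sketch under that assumption; `wanted` and `grouped` are illustrative names.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "id": [111, 111, 222, 333, 111, 444],
    "occupations": ["teacher", "student", "analyst", "cook", "driver", "lawyer"],
})

# One row per id, with all occupations collected into a list
grouped = df.groupby("id", as_index=False).agg({"occupations": lambda x: x.tolist()})

wanted = {"teacher", "student"}  # illustrative name

# Drop everything except the wanted occupations from each list
grouped["occupations"] = grouped["occupations"].apply(
    lambda occs: [o for o in occs if o in wanted]
)
```

Unlike the `map` approach in the first answer, this yields a frame with one row per id, so it fits when a per-id summary is wanted and not a new column on the original rows.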