I have a dataframe that looks like this:
main_df:
student name | program_ids
-----------------------------
Alex | [1,2,7]
Tim | [37]
May | [17,1,11]
Gloria | NaN
James | [37,42]
Nina | []
prog_df:
prog_id | program
-------------------------
1 | Arts
2 | Music
37 | Languages
11 | Physics
17 | Chemistry
42 | Math
7 | Dance
I want to match main_df with prog_df on the program ID column so that I get a dataframe like this:
student name | program
-----------------------
Alex | Arts, Music, Dance
Tim | Languages
May | Chemistry, Arts, Physics
Gloria | NaN
James | Languages, Math
Nina | NaN
Is it possible to match the list elements in a pandas column against the values of a column in another dataframe?
Thanks
Answer 0 (score: 3)
You can use:
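# Replace NaN with an empty list so every row can be iterated
# (assigning [[]] through .loc is fragile and breaks when more than one row is NaN)
df1['program_ids'] = [x if isinstance(x, list) else [] for x in df1['program_ids']]
# Lookup table: program id -> program name
d = dict(zip(df2.prog_id, df2.program))
df1['New'] = [','.join(d[y] for y in x) for x in df1.program_ids]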
df1
Out[15]:
studentname program_ids New
0 Alex [1, 2, 7] Arts,Music,Dance
1 Tim [37] Languages
2 May [17, 1, 11] Chemistry,Arts,Physics
3 Gloria []
4 James [37, 42] Languages,Math
5 Nina []
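A self-contained sketch of the same dict-lookup idea, rebuilding the sample frames from the question. It returns NaN instead of an empty string for students without programs (Gloria and Nina above), to match the desired output. The names `lookup` and `names_for` are hypothetical, introduced only for this example.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question
main_df = pd.DataFrame({
    "student name": ["Alex", "Tim", "May", "Gloria", "James", "Nina"],
    "program_ids": [[1, 2, 7], [37], [17, 1, 11], np.nan, [37, 42], []],
})
prog_df = pd.DataFrame({
    "prog_id": [1, 2, 37, 11, 17, 42, 7],
    "program": ["Arts", "Music", "Languages", "Physics", "Chemistry", "Math", "Dance"],
})

# Hypothetical helper names: id -> program-name lookup
lookup = dict(zip(prog_df["prog_id"], prog_df["program"]))

def names_for(ids):
    """Join the program names for a list of ids; NaN if the cell is missing or empty."""
    if not isinstance(ids, list):
        return np.nan
    names = [lookup[i] for i in ids if i in lookup]  # unknown ids are skipped
    return ", ".join(names) if names else np.nan

result = pd.DataFrame({
    "student name": main_df["student name"],
    "program": [names_for(ids) for ids in main_df["program_ids"]],
})
print(result)
```

Skipping unknown ids is a design choice made here; drop the `if i in lookup` filter if an unknown id should raise a KeyError instead.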
Answer 1 (score: 3)
First, some preprocessing:
df['program_ids'] = df['program_ids'].map(lambda x: x if isinstance(x, list) else [])
df
student name program_ids
0 Alex [1, 2, 7]
1 Tim [37]
2 May [17, 1, 11]
3 Gloria []
4 James [37, 42]
5 Nina []
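Why check the type rather than use pd.isnull inside the lambda: on a list, pd.isnull works element-wise and returns a boolean array, so `if pd.isnull(x)` raises "truth value of an array ... is ambiguous" for any list with more than one element. A minimal sketch on a toy Series (values made up for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series([[1, 2, 7], np.nan, [], [37]])

# pd.isnull on a list is element-wise: this prints an array of three booleans,
# not a single True/False, so it cannot be used directly in an `if`.
print(pd.isnull([1, 2, 7]))

# Keep real lists, turn anything else (here: NaN) into an empty list
cleaned = s.map(lambda x: x if isinstance(x, list) else [])
print(cleaned.tolist())
```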
Next, create a mapping from program IDs to program names:
mapping = dict(prog_df.values)
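dict(prog_df.values) works here because prog_df has exactly two columns in (id, name) order; it would raise if a third column were added, and would silently invert the mapping if the columns were reordered. A sketch comparing it with an explicit column selection via set_index(...).to_dict(), rebuilding prog_df from the question:

```python
import pandas as pd

prog_df = pd.DataFrame({
    "prog_id": [1, 2, 37, 11, 17, 42, 7],
    "program": ["Arts", "Music", "Languages", "Physics", "Chemistry", "Math", "Dance"],
})

# Relies on column order: each row (prog_id, program) becomes a key/value pair
mapping = dict(prog_df.values)

# Names the key and value columns explicitly, independent of column order
mapping_explicit = prog_df.set_index("prog_id")["program"].to_dict()

print(mapping == mapping_explicit)
```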
Use it to map the IDs to program names with a list comprehension (for performance):
df['program_ids'] = [[mapping.get(x) for x in l] for l in df['program_ids']]
df
student name program_ids
0 Alex [Arts, Music, Dance]
1 Tim [Languages]
2 May [Chemistry, Arts, Physics]
3 Gloria []
4 James [Languages, Math]
5 Nina []
Finally, as an optional step, join the lists with str.join:
import numpy as np
df['program_ids'].str.join(',').replace('', np.nan)
0 Arts,Music,Dance
1 Languages
2 Chemistry,Arts,Physics
3 NaN
4 Languages,Math
5 NaN
Name: program_ids, dtype: object
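A different technique, not used in either answer: explode the lists into one row per id, merge against prog_df, then aggregate back per student. This is a sketch assuming pandas 1.1 or later (for explode(..., ignore_index=True)) and unique student names, since it groups on the name column.

```python
import numpy as np
import pandas as pd

main_df = pd.DataFrame({
    "student name": ["Alex", "Tim", "May", "Gloria", "James", "Nina"],
    "program_ids": [[1, 2, 7], [37], [17, 1, 11], np.nan, [37, 42], []],
})
prog_df = pd.DataFrame({
    "prog_id": [1, 2, 37, 11, 17, 42, 7],
    "program": ["Arts", "Music", "Languages", "Physics", "Chemistry", "Math", "Dance"],
})

# One row per (student, id); empty lists and NaN each become a single NaN row
exploded = main_df.explode("program_ids", ignore_index=True)
# The exploded column is object dtype; make it numeric so it merges cleanly
exploded["program_ids"] = pd.to_numeric(exploded["program_ids"])

# Left merge keeps students without ids (their program is NaN) and preserves row order
merged = exploded.merge(prog_df, how="left", left_on="program_ids", right_on="prog_id")

# Re-aggregate per student; sort=False keeps first-appearance order of students
result = (
    merged.groupby("student name", sort=False)["program"]
    .agg(lambda names: ", ".join(names.dropna()) or np.nan)
    .reset_index()
)
print(result)
```

Students whose ids all fail to match end up with NaN, because the joined string is empty and `"" or np.nan` falls through to NaN.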