How do I match a list column against a column in another DataFrame in pandas?

Time: 2019-06-07 14:47:01

Tags: python pandas merge match

I have a DataFrame that looks like this:

main_df:

student name | program_ids
-----------------------------
Alex         | [1,2,7]
Tim          | [37]
May          | [17,1,11]
Gloria       | NaN
James        | [37,42]
Nina         | []

prog_df:

 prog_id    | program
 -------------------------
 1          | Arts
 2          | Music
 37         | Languages
 11         | Physics
 17         | Chemistry
 42         | Math
 7          | Dance

I want to match the program_ids column of main_df against the prog_id column of prog_df, so that I get a DataFrame like this:

student name | program
-----------------------
Alex         | Arts, Music, Dance
Tim          | Languages
May          | Chemistry, Arts, Physics
Gloria       | NaN
James        | Languages, Math
Nina         | NaN

Is it possible to match the list elements of a pandas column against the column values of another DataFrame?

Thanks

2 Answers:

Answer 0 (score: 3)

You can use

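# df1 and df2 correspond to main_df and prog_df from the question
# replace missing values with an empty list so every row can be iterated
df1.loc[df1.program_ids.isnull(),'program_ids']=[[]]
# lookup table: prog_id -> program name
d=dict(zip(df2.prog_id,df2.program))
df1['New']=[','.join([d.get(y) for y in x] )for x in df1.program_ids]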
df1
Out[15]: 
  studentname  program_ids                     New
0        Alex    [1, 2, 7]        Arts,Music,Dance
1         Tim         [37]               Languages
2         May  [17, 1, 11]  Chemistry,Arts,Physics
3      Gloria           []                        
4       James     [37, 42]          Languages,Math
5        Nina           []                        
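
For reference, here is a self-contained sketch of the same dict-lookup idea. The frame construction and the helper name `ids_to_names` are assumptions made for illustration, not part of the answer. Unlike the one-liner above, it skips IDs that are missing from the lookup (`','.join` raises a `TypeError` on `None`) and returns NaN instead of an empty string, which matches the output requested in the question.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumed construction).
main_df = pd.DataFrame({
    "student name": ["Alex", "Tim", "May", "Gloria", "James", "Nina"],
    "program_ids": [[1, 2, 7], [37], [17, 1, 11], np.nan, [37, 42], []],
})
prog_df = pd.DataFrame({
    "prog_id": [1, 2, 37, 11, 17, 42, 7],
    "program": ["Arts", "Music", "Languages", "Physics",
                "Chemistry", "Math", "Dance"],
})

# Lookup table: prog_id -> program name.
lookup = dict(zip(prog_df["prog_id"], prog_df["program"]))


def ids_to_names(ids):
    """Hypothetical helper: join the program names for one row's ID list."""
    if not isinstance(ids, list):
        return np.nan  # missing value such as NaN
    names = [lookup[i] for i in ids if i in lookup]  # skip unknown IDs
    return ", ".join(names) if names else np.nan


main_df["program"] = main_df["program_ids"].apply(ids_to_names)
print(main_df[["student name", "program"]])
```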

Answer 1 (score: 3)

First, do some preprocessing:

import numpy as np
import pandas as pd
df['program_ids'] = df['program_ids'].map(lambda x: [] if pd.isnull(x) else x)
df

  student name  program_ids
0         Alex    [1, 2, 7]
1          Tim         [37]
2          May  [17, 1, 11]
3       Gloria           []
4        James     [37, 42]
5         Nina           []

Next, create a mapping from program ID to program name:

mapping = dict(prog_df.values)
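
Note that `dict(prog_df.values)` only works when prog_df has exactly two columns, with the ID first. If that cannot be guaranteed, a more explicit way to build the same mapping might look like the sketch below (the frame construction is an assumption based on the question's sample data):

```python
import pandas as pd

# Sample data rebuilt from the question (assumed construction).
prog_df = pd.DataFrame({
    "prog_id": [1, 2, 37, 11, 17, 42, 7],
    "program": ["Arts", "Music", "Languages", "Physics",
                "Chemistry", "Math", "Dance"],
})

# Name the key and value columns explicitly instead of relying on column order.
mapping = prog_df.set_index("prog_id")["program"].to_dict()
print(mapping)
```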

Use this mapping to map the IDs to programs with a list comprehension (for performance):

df['program_ids']  = [[mapping.get(x) for x in  l] for l in df['program_ids']]
df

  student name                 program_ids
0         Alex        [Arts, Music, Dance]
1          Tim                 [Languages]
2          May  [Chemistry, Arts, Physics]
3       Gloria                          []
4        James           [Languages, Math]
5         Nina                          []

Finally, as an optional step, join the lists using str.join:

df['program_ids'].str.join(',').replace('', np.nan)

0          Arts,Music,Dance
1                 Languages
2    Chemistry,Arts,Physics
3                       NaN
4            Languages,Math
5                       NaN
Name: program_ids, dtype: object
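
Since the question is tagged merge, the same result can also be sketched with an actual merge. This is an alternative that neither answer uses, and it assumes pandas 0.25 or later for `DataFrame.explode`. The frame construction and the intermediate names `long_df`, `merged` and `joined` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumed construction).
main_df = pd.DataFrame({
    "student name": ["Alex", "Tim", "May", "Gloria", "James", "Nina"],
    "program_ids": [[1, 2, 7], [37], [17, 1, 11], np.nan, [37, 42], []],
})
prog_df = pd.DataFrame({
    "prog_id": [1, 2, 37, 11, 17, 42, 7],
    "program": ["Arts", "Music", "Languages", "Physics",
                "Chemistry", "Math", "Dance"],
})

# One row per (student, program ID). reset_index keeps the original row
# label in an "index" column; NaN and [] rows become NaN and are dropped.
long_df = (
    main_df.reset_index()
    .explode("program_ids")
    .dropna(subset=["program_ids"])
    .astype({"program_ids": "int64"})  # match the dtype of prog_id
)

# A left merge keeps the left row order, so each student's IDs stay in order.
merged = long_df.merge(prog_df, left_on="program_ids",
                       right_on="prog_id", how="left")

# Drop IDs with no match, then collapse back to one string per original row.
joined = (
    merged.dropna(subset=["program"])
    .groupby("index")["program"]
    .agg(", ".join)
)

# Index alignment leaves NaN for students with no programs.
main_df["program"] = joined
print(main_df[["student name", "program"]])
```

For small frames the dict lookup shown in the answers is simpler. The merge version is mainly useful when more columns from prog_df need to be carried along.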