Create a column in a pyspark dataframe based on values from another dataframe

Time: 2021-06-03 10:26:39

Tags: dataframe pyspark google-cloud-dataproc

I have two pyspark dataframes:

df1:

person_id   Name  serialNo  Maritalstatus  Location_name

 01         abc      10        M              America    
 02         xyz      13        S              London    
 03         def      14        M              Europe    
 04         qwe      15        M              Australia
 05         asd      16        M              Europe
 06         fgh      17        M              London
 07         aka      18        M              Australia
 08         fgi      19        M              London
 09         aba      20        M              Australia

df2:

Code   Location_Name    Location_Id

111        Australia          AUS    
112        America            USA    
123        London             UK    
124        Europe             EU

I want to add a column Location_Id to df1 that takes the matching ID from df2, like this:

person_id   Name  serialNo  Maritalstatus  Location_name   Location_Id

 01         abc      10        M              America        USA
 02         xyz      13        S              London         UK 
 03         def      14        M              Europe         EU
 04         qwe      15        M              Australia      AUS
 05         asd      16        M              Europe         EU
 06         fgh      17        M              London         UK
 07         aka      18        M              Australia      AUS
 08         fgi      19        M              London         UK
 09         aba      20        M              Australia      AUS

How can I do this?

1 answer:

Answer 0 (score: 3)

Just join on Location_name:

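df2_ids = df2.select(df2['Location_Name'].alias('Location_name'), 'Location_Id')  # keep only the join key and the ID, so Code is not added to df1
df1.join(df2_ids, on='Location_name', how='left')  # left join keeps every df1 row; unmatched locations get a null Location_Id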