I have two PySpark DataFrames.
df1:
person_id Name serialNo Maritalstatus Location_name
01 abc 10 M America
02 xyz 13 S London
03 def 14 M Europe
04 qwe 15 M Australia
05 asd 16 M Europe
06 fgh 17 M London
07 aka 18 M Australia
08 fgi 19 M London
09 aba 20 M Australia
df2:
Code Location_Name Location_Id
111 Australia AUS
112 America USA
123 London UK
124 Europe EU
I want to add a Location_Id column to df1 that takes the matching ID from df2, like this:
person_id Name serialNo Maritalstatus Location_name Location_Id
01 abc 10 M America USA
02 xyz 13 S London UK
03 def 14 M Europe EU
04 qwe 15 M Australia AUS
05 asd 16 M Europe EU
06 fgh 17 M London UK
07 aka 18 M Australia AUS
08 fgi 19 M London UK
09 aba 20 M Australia AUS
How can I do this?
Answer (score: 3)
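Join the two DataFrames on the location name. The key is spelled Location_name in df1 but Location_Name in df2, so rename it on the df2 side. The join then works whatever spark.sql.caseSensitive is set to. Selecting only the two lookup columns keeps Code out of the result. A left join keeps every row of df1, with a null Location_Id for any location missing from df2:
df1.join(df2.select('Location_Name', 'Location_Id').withColumnRenamed('Location_Name', 'Location_name'),
         on='Location_name', how='left')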