Pyspark: How to join two dataframes on multiple columns?

Time: 2020-09-25 12:16:15

Tags: pyspark pyspark-dataframes

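I have two PySpark dataframes, df1 and df2, shown below.

I want to join the two dataframes so that each of the id columns in df1 (id1, id2, id3) is matched against id in df2.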

df1
       id1   id2    id3    x    y
        0     1      2    0.5  0.4
        2     1      0    0.3  0.2
        3     0      2    0.8  0.9 
        2     1      3    0.2  0.1

df2
       id     name
        0      A 
        1      B
        2      C
        3      D

1 answer:

Answer 0 (score: 1)

This needs multiple joins, one per id column.

# Left-join df2 once per id column; after each join, drop the lookup key and rename 'name'.
df1.join(df2, df1['id1'] == df2['id'], 'left').drop('id').withColumnRenamed('name', 'n1') \
   .join(df2, df1['id2'] == df2['id'], 'left').drop('id').withColumnRenamed('name', 'n2') \
   .join(df2, df1['id3'] == df2['id'], 'left').drop('id').withColumnRenamed('name', 'n3') \
   .show()

+---+---+---+---+---+---+---+---+
|id1|id2|id3|  x|  y| n1| n2| n3|
+---+---+---+---+---+---+---+---+
|  0|  1|  2|0.5|0.4|  A|  B|  C|
|  2|  1|  0|0.3|0.2|  C|  B|  A|
|  3|  0|  2|0.8|0.9|  D|  A|  C|
|  2|  1|  3|0.2|0.1|  C|  B|  D|
+---+---+---+---+---+---+---+---+