Add columns to df1 that exist in df2 but not in df1

Time: 2018-10-22 08:42:28

Tags: apache-spark dataframe pyspark

I want to add to dataframe1 (df1) the columns that it lacks but dataframe2 (df2) has, taking their values from df2. For example:

df1:

A |B |C |
---------
ad|bd|cd|
ss|tt|yy|


df2: (only 1 row)
A|B|C|D|E|F|G|
--------------
a|b|c|d|e|f|g|

I want this:

df3:
A |B |C |D|E|F|G|
-----------------
ad|bd|cd|d|e|f|g|
ss|tt|yy|d|e|f|g|

How can I do this quickly?

Thanks

1 answer:

Answer 0: (score: 2)

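Assuming df2 has exactly one row, you can use crossJoin as shown below. df2.drop(*df1.columns) keeps only the columns that df1 lacks, and cross joining with that single row appends its values to every row of df1.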

>>> df1.show()
+---+---+---+
|  A|  B|  C|
+---+---+---+
| ad| bd| cd|
| ss| tt| yy|
+---+---+---+

>>> df2.show()
+---+---+---+---+---+---+---+
|  A|  B|  C|  D|  E|  F|  G|
+---+---+---+---+---+---+---+
|  a|  b|  c|  d|  e|  f|  g|
+---+---+---+---+---+---+---+

>>> df3 = df1.crossJoin(df2.drop(*df1.columns))
>>> df3.show()
+---+---+---+---+---+---+---+
|  A|  B|  C|  D|  E|  F|  G|
+---+---+---+---+---+---+---+
| ad| bd| cd|  d|  e|  f|  g|
| ss| tt| yy|  d|  e|  f|  g|
+---+---+---+---+---+---+---+
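
If this is needed in more than one place, the same idea can be wrapped in a small helper. The following is only a sketch: the function names missing_columns and add_missing_columns are made up for illustration, and the one-row check is an added assumption based on the question, not part of the answer above. It uses only DataFrame methods (columns, limit, count, select, crossJoin), so it needs no imports of its own.

```python
def missing_columns(left_cols, right_cols):
    """Return the names in right_cols that are absent from left_cols,
    preserving the order of right_cols."""
    present = set(left_cols)
    return [c for c in right_cols if c not in present]


def add_missing_columns(df1, df2):
    """Append to df1 the columns that only df2 has (hypothetical helper).

    Assumes df2 holds exactly one row, as in the question.
    """
    extra = missing_columns(df1.columns, df2.columns)
    if not extra:
        return df1  # df2 has nothing that df1 lacks
    # Guard for the one-row assumption; limit(2) avoids counting a large df2 in full.
    if df2.limit(2).count() != 1:
        raise ValueError("df2 must contain exactly one row")
    # Cross joining with a single row repeats its values on every row of df1.
    return df1.crossJoin(df2.select(*extra))
```

With the DataFrames from the question, df3 = add_missing_columns(df1, df2) should give the same result as the crossJoin one-liner. The explicit check matters because a df2 with more than one row would multiply the rows of df1.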