Question

I have two DataFrames. The first looks like this:

+--+-----------+
|id|some_string|
+--+-----------+
| a|        foo|
| b|        bar|
| c|        egg|
| d|        fog|
+--+-----------+

and the second like this:

+--+-----------+
|id|some_string|
+--+-----------+
| a|        hoi|
| b|        hei|
| c|        hai|
| e|        hui|
+--+-----------+

I want to join them so that the result looks like this:

+--+-----------+
|id|some_string|
+--+-----------+
| a|     foohoi|
| b|     barhei|
| c|     egghai|
| d|        fog|
| e|        hui|
+--+-----------+

So the some_string column of the first DataFrame is concatenated with the some_string column of the second DataFrame wherever the ids match. If I use

df_join = df1.join(df2,on='id',how='outer')

it returns

+--+-----------+-----------+
|id|some_string|some_string|
+--+-----------+-----------+
| a|        foo|        hoi|
| b|        bar|        hei|
| c|        egg|        hai|
| d|        fog|       null|
| e|       null|        hui|
+--+-----------+-----------+

Is there a way to get the result I want?

Answer 1

You need to check whether either of the two columns is null, and then perform the concatenation accordingly.


Answer 2

Since you want to perform an outer join, you can try the following:


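(Note that some_string1 and some_string2 refer to the some_string columns of the df1 and df2 DataFrames. I suggest giving them different names rather than the shared name some_string, so that you can refer to each one unambiguously.)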

How to join two Spark DataFrames and manipulate their shared column?
