I have these 3 dataframes:
df1
+---+----+--------+
| id|file|  status|
+---+----+--------+
|  1| df2|employee|
|  2| df3|employee|
|  3| df2| trainee|
|  4| df3| trainee|
|  5| df3| trainee|
+---+----+--------+
df2
+---+------+----------+
| id|salary|entry_date|
+---+------+----------+
|  1|  4000|06-01-2017|
|  2|  7000|05-03-2015|
|  3|  1500|01-05-2019|
|  4|  1500|01-05-2019|
+---+------+----------+
df3
+---+------+----------+
| id|salary|entry_date|
+---+------+----------+
|  1|  4500|09-01-2016|
|  2|  7000|01-01-2016|
|  3|  1500|05-09-2019|
|  4|  1500|05-04-2019|
|  5|  1300|10-04-2019|
+---+------+----------+
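I want to join these dataframes and keep only the right values: for each row, the "file" column of df1 says which dataframe (df2 or df3) the salary and entry_date columns must be taken from.
The expected result is: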
+---+----+--------+------+----------+
| id|file|  status|salary|entry_date|
+---+----+--------+------+----------+
|  1| df2|employee|  4000|06-01-2017|
|  2| df3|employee|  7000|01-01-2016|
|  3| df2| trainee|  1500|01-05-2019|
|  4| df3| trainee|  1500|05-04-2019|
|  5| df3| trainee|  1300|10-04-2019|
+---+----+--------+------+----------+
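A minimal Scala/Spark sketch of one possible approach (not the only one), assuming Spark 2.3+ with spark-sql on the classpath and that every candidate dataframe has exactly the same columns: tag each candidate with a literal "file" value, union them, and join df1 once on (id, file), so each df1 row only meets the rows of the dataframe it names. The object UnionThenJoin, the helper pickByFile and the sources map are hypothetical names; the sample data is copied from the tables above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

object UnionThenJoin {
  // Tags every candidate DataFrame with its name in a "file" column, stacks them,
  // and joins once on (id, file): each base row matches only the source it names.
  // Assumption: all sources share the same column names (unionByName requires it here).
  def pickByFile(base: DataFrame, sources: Map[String, DataFrame]): DataFrame = {
    val tagged = sources
      .map { case (name, df) => df.withColumn("file", lit(name)) }
      .reduce((a, b) => a.unionByName(b))
    // Left join keeps base rows even if the named source has no matching id.
    base.join(tagged, Seq("id", "file"), "left")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("union-then-join").getOrCreate()
    import spark.implicits._

    val df1 = Seq(
      (1, "df2", "employee"),
      (2, "df3", "employee"),
      (3, "df2", "trainee"),
      (4, "df3", "trainee"),
      (5, "df3", "trainee")
    ).toDF("id", "file", "status")

    val df2 = Seq(
      (1, 4000, "06-01-2017"),
      (2, 7000, "05-03-2015"),
      (3, 1500, "01-05-2019"),
      (4, 1500, "01-05-2019")
    ).toDF("id", "salary", "entry_date")

    val df3 = Seq(
      (1, 4500, "09-01-2016"),
      (2, 7000, "01-01-2016"),
      (3, 1500, "05-09-2019"),
      (4, 1500, "05-04-2019"),
      (5, 1300, "10-04-2019")
    ).toDF("id", "salary", "entry_date")

    // Hypothetical mapping: the strings stored in df1.file -> the DataFrames they refer to.
    val sources = Map("df2" -> df2, "df3" -> df3)
    pickByFile(df1, sources).orderBy("id").show()
    spark.stop()
  }
}
```

With this layout, supporting another dataframe only means adding an entry to the sources map: there is still a single join and no per-column conditional logic. Because the join uses the column list Seq("id", "file"), the output columns come out as id, file, status, salary, entry_date.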
I have considered something like withColumn("salary", when('file === "df2", df2("salary")).otherwise(df3("salary"))), but with many dataframes this would become a mess.
Do you know a more elegant way to get the same result?
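The withColumn/when idea can also be generated instead of written by hand. A hedged Scala/Spark sketch, assuming spark-sql on the classpath, that each candidate dataframe contains id plus all of the listed value columns, and that no other column names collide: join every candidate onto df1 with its value columns prefixed by its name, then build coalesce(when(file === name, name_col), ...) for each value column. WhenChainByFile, pickByFile and the sources map are hypothetical names; the sample data comes from the tables above.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col, when}

object WhenChainByFile {
  // Joins every source onto base by id, renaming its value columns to "<name>_<col>",
  // then picks each value column from the source named in base's "file" column.
  // Assumption: every source has "id" and all valueCols, and nothing else collides.
  def pickByFile(base: DataFrame, sources: Map[String, DataFrame], valueCols: Seq[String]): DataFrame = {
    val joined = sources.foldLeft(base) { case (acc, (name, df)) =>
      val prefixed = valueCols.foldLeft(df)((d, c) => d.withColumnRenamed(c, s"${name}_$c"))
      acc.join(prefixed, Seq("id"), "left")
    }
    // when() without otherwise() yields null, so coalesce keeps the one branch
    // whose source name equals the row's "file" value.
    val picked: Seq[Column] = valueCols.map { c =>
      coalesce(sources.keys.toSeq.map(n => when(col("file") === n, col(s"${n}_$c"))): _*).as(c)
    }
    joined.select((base.columns.toSeq.map(c => col(c)) ++ picked): _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("when-chain").getOrCreate()
    import spark.implicits._

    val df1 = Seq(
      (1, "df2", "employee"),
      (2, "df3", "employee"),
      (3, "df2", "trainee"),
      (4, "df3", "trainee"),
      (5, "df3", "trainee")
    ).toDF("id", "file", "status")

    val df2 = Seq(
      (1, 4000, "06-01-2017"),
      (2, 7000, "05-03-2015"),
      (3, 1500, "01-05-2019"),
      (4, 1500, "01-05-2019")
    ).toDF("id", "salary", "entry_date")

    val df3 = Seq(
      (1, 4500, "09-01-2016"),
      (2, 7000, "01-01-2016"),
      (3, 1500, "05-09-2019"),
      (4, 1500, "05-04-2019"),
      (5, 1300, "10-04-2019")
    ).toDF("id", "salary", "entry_date")

    // Hypothetical mapping from the values in df1.file to the DataFrames they name.
    val sources = Map("df2" -> df2, "df3" -> df3)
    pickByFile(df1, sources, Seq("salary", "entry_date")).orderBy("id").show()
    spark.stop()
  }
}
```

This still performs one join per candidate dataframe; it only removes the hand-written when/otherwise chain, so the code size no longer grows with the number of dataframes or value columns.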
Thanks