How to select dynamic columns during a Spark join?

Time: 2019-02-07 22:17:47

Tags: scala apache-spark

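I am trying to join two DataFrames. From the first DataFrame I need to select a dynamic number of columns and then join it with the second one. The complication is that one of the output columns is a CASE expression over the first DataFrame's Notes column. I can get the desired output by creating temporary views, but I cannot get the same output through the Spark DataFrame API.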

Below is the snippet I tried with temporary views; it works as expected.

// Sample DF1
val studentDF = Seq(
  (1, "Peter", "M", 15, "Tution Received"),
  (2, "Merry", "F", 14, null),
  (3, "Sam", "M", 16, "Tution Received"),
  (4, "Kat", "O", 16, null),
  (5, "Keivn", "M", 18, null)
).toDF("Enrollment", "Name", "Gender", "Age", "Notes")

// Sample DF2
val studentFees = Seq(
  (1, "$500", "Deposit"),
  (2, "$800", "Deposit"),
  (3, "$200", "Deposit"),
  (4, "$100", "Deposit")
).toDF("Enrollment", "Fees", "Notes")

studentDF.createOrReplaceTempView("STUDENT")
studentFees.createOrReplaceTempView("FEES")

val displayColumns = List("Enrollment","Name","Gender").map("a."+_).reduce(_+","+_)
val queryStr = spark.sql(s"select $displayColumns, case when a.Notes is null then b.Notes else a.Notes end as Notes, b.Fees from STUDENT a join FEES b on a.Enrollment=b.Enrollment")
queryStr.show()

+----------+-----+------+---------------+----+
|Enrollment| Name|Gender|          Notes|Fees|
+----------+-----+------+---------------+----+
|         1|Peter|     M|Tution Received|$500|
|         2|Merry|     F|        Deposit|$800|
|         3|  Sam|     M|Tution Received|$200|
|         4|  Kat|     O|        Deposit|$100|
+----------+-----+------+---------------+----+

// Below is not giving the desired output
val displayColumns = List("Enrollment","Name","Gender","Notes")
val queryStr = studentDF.select(displayColumns.head, displayColumns.tail: _*).alias("a").join(studentFees.as("b"),Seq("Enrollment"),"inner").withColumn("Notes",when($"a.Notes".isNull,$"b.Notes").otherwise($"a.Notes"))
queryStr.show()


+----------+-----+------+---------------+----+---------------+
|Enrollment| Name|Gender|          Notes|Fees|          Notes|
+----------+-----+------+---------------+----+---------------+
|         1|Peter|     M|Tution Received|$500|Tution Received|
|         2|Merry|     F|        Deposit|$800|        Deposit|
|         3|  Sam|     M|Tution Received|$200|Tution Received|
|         4|  Kat|     O|        Deposit|$100|        Deposit|
+----------+-----+------+---------------+----+---------------+

// Expected output:
+----------+-----+------+---------------+----+
|Enrollment| Name|Gender|          Notes|Fees|
+----------+-----+------+---------------+----+
|         1|Peter|     M|Tution Received|$500|
|         2|Merry|     F|        Deposit|$800|
|         3|  Sam|     M|Tution Received|$200|
|         4|  Kat|     O|        Deposit|$100|
+----------+-----+------+---------------+----+

Is there a better way to handle this scenario than creating temporary tables/views?

1 Answer:

Answer 0 (score: 0)

Thanks to everyone who read my post! I was able to find a solution to my problem.
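For readers facing the same problem, here is a minimal sketch of one way to do this with the DataFrame API alone. It is not necessarily the approach the poster ended up with. The duplicated column in the question is likely explained by `withColumn("Notes", ...)` replacing every existing column named Notes after the join instead of merging them, so the sketch builds the full column list up front and uses a single `select`. The object name `DynamicJoinSketch` and the helper `joinWithDynamicColumns` are hypothetical names introduced only for illustration; `coalesce` stands in for the CASE WHEN expression from the SQL version.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col}

object DynamicJoinSketch {

  // Hypothetical helper (not from the original post): joins the two frames and
  // keeps the requested student columns plus a merged Notes column and Fees.
  def joinWithDynamicColumns(
      student: DataFrame,
      fees: DataFrame,
      displayColumns: Seq[String]): DataFrame = {
    val a = student.as("a")
    val b = fees.as("b")

    // Qualify each requested column with alias "a", as the SQL version did.
    val dynamicCols: Seq[Column] = displayColumns.map(c => col(s"a.$c"))

    // coalesce returns its first non-null argument, mirroring
    // "case when a.Notes is null then b.Notes else a.Notes end".
    val notes = coalesce(col("a.Notes"), col("b.Notes")).as("Notes")

    a.join(b, col("a.Enrollment") === col("b.Enrollment"), "inner")
      .select(dynamicCols ++ Seq(notes, col("b.Fees")): _*)
  }

  def main(args: Array[String]): Unit = {
    // Local session for the sketch; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("dynamic-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same sample data as in the question.
    val studentDF = Seq[(Int, String, String, Int, String)](
      (1, "Peter", "M", 15, "Tution Received"),
      (2, "Merry", "F", 14, null),
      (3, "Sam", "M", 16, "Tution Received"),
      (4, "Kat", "O", 16, null),
      (5, "Keivn", "M", 18, null)
    ).toDF("Enrollment", "Name", "Gender", "Age", "Notes")

    val studentFees = Seq(
      (1, "$500", "Deposit"),
      (2, "$800", "Deposit"),
      (3, "$200", "Deposit"),
      (4, "$100", "Deposit")
    ).toDF("Enrollment", "Fees", "Notes")

    val result = joinWithDynamicColumns(
      studentDF, studentFees, List("Enrollment", "Name", "Gender"))
    result.show()

    spark.stop()
  }
}
```

Because the output columns are assembled in one `select`, the result should carry a single Notes column in the order Enrollment, Name, Gender, Notes, Fees, and changing the `displayColumns` list is enough to change which student columns appear. Row order in `show()` is not guaranteed without an explicit `orderBy`.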
