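How to combine two Spark DataFrames in sorted order

Time: 2017-03-14 23:53:11

Tags: scala sorting apache-spark apache-spark-sql

I want to merge two DataFrames, a and b, into a single DataFrame c that is sorted on a column.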

val a = Seq(("a", 1), ("c", 2), ("e", 3)).toDF("char", "num")
val b = Seq(("b", 4), ("d", 5)).toDF("char", "num")
val c = // how do I sort on char column?

This is the result I want:

 a.show()     b.show()      c.show()
+----+---+   +----+---+    +----+---+
|char|num|   |char|num|    |char|num|
+----+---+   +----+---+    +----+---+
|   a|  1|   |   b|  4|    |   a|  1|
|   c|  2|   |   d|  5|    |   b|  4|
|   e|  3|   +----+---+    |   c|  2|
+----+---+                 |   d|  5|
                           |   e|  3|
                           +----+---+

2 Answers:

Answer 0 (score: 2)

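Simply put, you can combine the two DataFrames with union() and then call sort() on the result. The final sort() on the union is what determines the row order of c, so sorting each input beforehand is optional.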

val a = Seq(("a", 1), ("c", 2), ("e", 3)).toDF("char", "num").sort($"char")
val b = Seq(("b", 4), ("d", 5)).toDF("char", "num").sort($"char")

val c = a.union(b).sort($"char")
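For reference, here is a minimal self-contained sketch of the same approach, written in spark-shell or script style. The local SparkSession setup and the app name are assumptions added only so the snippet can run on its own; in spark-shell, `spark` already exists and `getOrCreate()` simply returns it. Note that union() matches columns by position, not by name, so both inputs are assumed to share the same column order.

```scala
import org.apache.spark.sql.SparkSession

// Assumed setup: a local session so the snippet is self-contained.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("union-sorted-sketch") // hypothetical app name
  .getOrCreate()
import spark.implicits._

val a = Seq(("a", 1), ("c", 2), ("e", 3)).toDF("char", "num")
val b = Seq(("b", 4), ("d", 5)).toDF("char", "num")

// union() appends the rows of b to a (columns matched by position);
// orderBy() is an alias of sort() and orders the combined rows by "char".
val c = a.union(b).orderBy($"char")
c.show()
```

Only the sort applied after the union is needed here, since it orders the combined rows regardless of how the inputs were arranged.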

Answer 1 (score: 0)

If you want to union more than two DataFrames, you can try something like this.

val df1 = sc.parallelize(List(
  (50, 2, "arjun"),
  (34, 4, "bob")
)).toDF("age", "children", "name")

val df2 = sc.parallelize(List(
  (51, 3, "jane"),
  (35, 5, "bob")
)).toDF("age", "children", "name")

val df3 = sc.parallelize(List(
  (50, 2, "arjun"),
  (34, 4, "bob")
)).toDF("age", "children", "name")

val result = Seq(df1, df2, df3)
val res_union = result.reduce(_ union _).sort($"age", $"name", $"children")
res_union.show()
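One thing worth noting about the example above: df1 and df3 contain the same rows, and union() keeps duplicates (it behaves like SQL UNION ALL), so the combined result has six rows. If duplicates are not wanted, a distinct() before the sort is one option. The sketch below assumes a local SparkSession and builds the inputs with Seq(...).toDF instead of sc.parallelize purely for brevity; the names `combined` and `deduped` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Assumed setup: a local session so the snippet is self-contained.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("multi-union-sketch") // hypothetical app name
  .getOrCreate()
import spark.implicits._

val df1 = Seq((50, 2, "arjun"), (34, 4, "bob")).toDF("age", "children", "name")
val df2 = Seq((51, 3, "jane"), (35, 5, "bob")).toDF("age", "children", "name")
val df3 = Seq((50, 2, "arjun"), (34, 4, "bob")).toDF("age", "children", "name")

// Fold any number of DataFrames into one; duplicates are kept at this point.
val combined = Seq(df1, df2, df3).reduce(_ union _)

// Drop exact duplicate rows, then sort on the same keys as the answer.
val deduped = combined.distinct().sort($"age", $"name", $"children")
deduped.show()
```

With these inputs, `combined` holds six rows and `deduped` holds four.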