I'm new to Spark and Scala. Please help me with this.
I have the output below, and I need to produce a new DataFrame with all the features combined into one DataFrame rather than as separate lists. I also need to append this DataFrame to another one. How can I do this in Scala?
val tab = inter.map(_.groupBy().sum())
tab.map(_.show())
tab: Array[org.apache.spark.sql.DataFrame] = Array([sum(vec_0): double, sum(vec_1): double ... 2 more fields], [sum(vec_0): double, sum(vec_1): double ... 2 more fields])
+------------------+------------------+------------------+------------------+
| sum(vec_0)| sum(vec_1)| sum(vec_2)| sum(vec_3)|
+------------------+------------------+------------------+------------------+
|2.5046410000000003|2.1487149999999997|1.0884870000000002|3.5877090000000003|
+------------------+------------------+------------------+------------------+
+------------------+------------------+----------+------------------+
| sum(vec_0)| sum(vec_1)|sum(vec_2)| sum(vec_3)|
+------------------+------------------+----------+------------------+
|0.9558040000000001|0.9843780000000002| 0.545025|0.9979860000000002|
+------------------+------------------+----------+------------------+
res325: Array[Unit] = Array((), ())
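As a side note on what `groupBy().sum()` with no grouping keys computes: it reduces the whole table to a single row of per-column sums. A plain-Scala sketch of that semantics (illustrative only, not Spark code; the `rows` values are made up):

```scala
// Rows of a small table, each row a Seq of column values.
val rows: Seq[Seq[Double]] = Seq(
  Seq(1.0, 0.5, 0.2, 1.1),
  Seq(1.504641, 1.648715, 0.888487, 2.487709)
)

// Column-wise sums over all rows -- the no-key groupBy().sum() semantics.
val colSums: Seq[Double] = rows.transpose.map(_.sum)
```

Here `colSums` is one "row" holding `sum(vec_0)` through `sum(vec_3)`, matching the shape of the single-row DataFrames shown above.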
FINISHED
val temp = tab.map(_.alias("t").select(array("t.*") as "List"))
temp.map(_.toDF().show(false))
temp: Array[org.apache.spark.sql.DataFrame] = Array([List: array<double>], [List: array<double>])
+--------------------------------------------------------------------------------+
|List |
+--------------------------------------------------------------------------------+
|[2.5046410000000003, 2.1487149999999997, 1.0884870000000002, 3.5877090000000003]|
+--------------------------------------------------------------------------------+
+----------------------------------------------------------------------+
|List |
+----------------------------------------------------------------------+
|[0.9558040000000001, 0.9843780000000002, 0.545025, 0.9979860000000002]|
+----------------------------------------------------------------------+
res443: Array[Unit] = Array((), ())
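The `select(array("t.*") as "List")` step collapses all of a row's named columns into a single array column. A minimal plain-Scala analog of that collapse (the `sumRow` map and its values are illustrative, not from Spark):

```scala
// A one-row result as column-name -> value pairs.
val sumRow: Map[String, Double] = Map(
  "sum(vec_0)" -> 2.504641, "sum(vec_1)" -> 2.148715,
  "sum(vec_2)" -> 1.088487, "sum(vec_3)" -> 3.587709
)

// Collapse the named columns into one ordered list,
// sorting by column name so the order matches vec_0..vec_3.
val features: Seq[Double] = sumRow.toSeq.sortBy(_._1).map(_._2)
```

Note that in real Spark the array follows the DataFrame's column order; the sort here is just to make the sketch deterministic.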
val newtable = temp.map(_.toDF("features"))
newtable.map(_.show(false))
newtable: Array[org.apache.spark.sql.DataFrame] = Array([features: array<double>], [features: array<double>])
+--------------------------------------------------------------------------------+
|features |
+--------------------------------------------------------------------------------+
|[2.5046410000000003, 2.1487149999999997, 1.0884870000000002, 3.5877090000000003]|
+--------------------------------------------------------------------------------+
+----------------------------------------------------------------------+
|features |
+----------------------------------------------------------------------+
|[0.9558040000000001, 0.9843780000000002, 0.545025, 0.9979860000000002]|
+----------------------------------------------------------------------+
res328: Array[Unit] = Array((), ())
Expected output:
+--------------------------------------------------------------------------------+
|features |
+--------------------------------------------------------------------------------+
|[2.5046410000000003, 2.1487149999999997, 1.0884870000000002, 3.5877090000000003]|
|[0.9558040000000001, 0.9843780000000002, 0.545025, 0.9979860000000002]          |
+--------------------------------------------------------------------------------+
Answer 0 (score: 0)
This solves it:
val fList = newtable.reduce(_.union(_))
fList.show(false)
fList: org.apache.spark.sql.DataFrame = [features: array<double>]
+--------------------------------------------------------------------------------+
|features |
+--------------------------------------------------------------------------------+
|[2.5046410000000003, 2.1487149999999997, 1.0884870000000002, 3.5877090000000003]|
|[0.9558040000000001, 0.9843780000000002, 0.545025, 0.9979860000000002] |
+--------------------------------------------------------------------------------+
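The key idea in the answer is `reduce(_.union(_))`: pairwise-unioning an array of same-schema DataFrames stacks their rows into one DataFrame. A plain-Scala sketch of that reduction, modeling each single-row DataFrame as a `Seq` of rows (illustrative values, not Spark itself):

```scala
// Two one-row "tables", each row a list of feature sums.
val tables: Array[Seq[Seq[Double]]] = Array(
  Seq(Seq(2.504641, 2.148715, 1.088487, 3.587709)),
  Seq(Seq(0.955804, 0.984378, 0.545025, 0.997986))
)

// Pairwise concatenation plays the role of DataFrame.union:
// rows from every table end up stacked in one collection.
val combined: Seq[Seq[Double]] = tables.reduce(_ ++ _)
```

One caveat for the real Spark version: `union` resolves columns by position, not by name, so every DataFrame in the array must share the same schema (which holds here, since each has a single `features: array<double>` column).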